Open maggimars opened 3 years ago
Answers:
MMETSP - cleaned ones from Richter - clean data for proteins
pep AND contigs/transcripts .. not CDS
jgi - filtered model transcripts, filtered model proteins
outgroup - ehux 1516 genome + diatom genome
Is this what you meant by the cleaned MMETSP? https://figshare.com/articles/Data_from_The_evolution_of_silicon_transport_in_eukaryotes/12410606/1 @halexand ?
Ah, yes that looks similar. They are also buried on the iMicrobe site. Sarah did some work figuring out how to bulk download them (https://github.com/shu251/download-cleaned-mmetsp). I am not sure if they are the same, but that's fine as long as we can cite the dataset!
OK - these are the ones they used for EukProt (links are in the datasheet: https://figshare.com/articles/EukProt_a_database_of_genome-scale_predicted_proteins_across_the_diversity_of_eukaryotic_life/12417881/2 )
If it seems right - I can just grab the Phaeo ones from here without downloading the whole dataset
Ah, great! Yes, that seems right then! Makes sense to just download what you need :)
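For reference, a minimal sketch of picking out only the needed entries from a dataset's file listing before downloading anything. It assumes "Phaeo" means the genus Phaeocystis and that a tab-separated listing with `species` and `filename` columns is available; both column names and the sample rows are hypothetical, not the actual EukProt or figshare layout.

```python
import csv
import io


def select_by_genus(table_text, genus, name_column="species", file_column="filename"):
    """Return file names whose species name begins with the given genus.

    The column names are hypothetical placeholders; adjust them to the
    real metadata table.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    wanted = []
    for row in reader:
        # Compare the whole first word so that "Phaeocystis" does not
        # accidentally match "Phaeodactylum".
        words = row[name_column].replace("_", " ").split()
        if words and words[0].lower() == genus.lower():
            wanted.append(row[file_column])
    return wanted


# Made-up listing, for illustration only.
example = (
    "species\tfilename\n"
    "Phaeocystis antarctica\tsample_A.pep.fa\n"
    "Phaeodactylum tricornutum\tsample_B.pep.fa\n"
    "Phaeocystis_globosa\tsample_C.pep.fa\n"
)
print(select_by_genus(example, "Phaeocystis"))
```

Matching on the full genus rather than the "Phaeo" prefix avoids pulling in the diatom Phaeodactylum by mistake.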
Regarding outgroups: JGI was down this week so I couldn't download the Ehux or diatom genomes. JGI will be back up tomorrow (Dec. 18). However, I got the feeling from the OrthoFinder tutorial (https://davidemms.github.io/orthofinder_tutorials/orthofinder-best-practices.html) that I might not want to include outgroups -- or I could at least compare results with and without outgroups. Therefore, I went ahead and tried running OrthoFinder without outgroups.
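One possible way to do the with/without-outgroup comparison, as a rough sketch: read the `Orthogroups.tsv` table from each run, keep only the ingroup genes, and count how many gene groupings are identical between the two runs. This assumes the usual layout of one column per species with comma-separated gene IDs; the species names (`PhaeoA`, `PhaeoB`, `Outgroup1`) and gene IDs below are invented for illustration.

```python
import csv
import io


def read_orthogroups(tsv_text, ingroup):
    """Parse Orthogroups.tsv-style text, keeping only genes from ingroup species."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = set()
    for row in reader:
        genes = set()
        for species in ingroup:
            cell = row.get(species) or ""
            genes.update(g.strip() for g in cell.split(",") if g.strip())
        if genes:
            # A frozenset lets identical groupings compare equal across runs.
            groups.add(frozenset(genes))
    return groups


def compare_runs(with_outgroups, without_outgroups, ingroup):
    """Count ingroup gene groupings shared by, or unique to, each run."""
    a = read_orthogroups(with_outgroups, ingroup)
    b = read_orthogroups(without_outgroups, ingroup)
    return {"shared": len(a & b), "only_with": len(a - b), "only_without": len(b - a)}


# Invented toy tables standing in for the two runs' Orthogroups.tsv files.
run_with = (
    "Orthogroup\tPhaeoA\tPhaeoB\tOutgroup1\n"
    "OG0000000\ta1, a2\tb1\to1\n"
    "OG0000001\ta3\tb2\t\n"
    "OG0000002\ta4\tb4\to2\n"
)
run_without = (
    "Orthogroup\tPhaeoA\tPhaeoB\n"
    "OG0000000\ta1\tb1\n"
    "OG0000001\ta2, a3\tb2\n"
    "OG0000002\ta4\tb4\n"
)
print(compare_runs(run_with, run_without, ["PhaeoA", "PhaeoB"]))
```

Exact set equality is a strict criterion; a softer overlap measure might be more informative on real data, but this gives a quick first look at how much the outgroups change the ingroup groupings.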
Download Ehux and Chrysochromulina genomes and transcriptomes for comparison to Phaeo in mapping results. @rondorice
Gather data into the `data` directory on Poseidon: