maggimars / Tara-Phaeo


Gather data (available phaeo assemblies for reference) #1

Open maggimars opened 3 years ago

maggimars commented 3 years ago

Gather data into data directory on Poseidon:

maggimars commented 3 years ago

Answers:

MMETSP - cleaned ones from Richter - clean data for proteins

pep AND contigs/transcripts, not CDS

JGI - filtered model transcripts, filtered model proteins

outgroup - Ehux 1516 genome + diatom genome

maggimars commented 3 years ago

Is this what you meant by the cleaned MMETSP? https://figshare.com/articles/Data_from_The_evolution_of_silicon_transport_in_eukaryotes/12410606/1 @halexand ?

halexand commented 3 years ago

Ah, yes, that looks similar. They are also buried on the iMicrobe site. Sarah did some work figuring out how to bulk download them: https://github.com/shu251/download-cleaned-mmetsp I am not sure if they are the same, but that's fine as long as we can cite the dataset!

maggimars commented 3 years ago

OK - these are the ones they used for EukProt (links are in the datasheet: https://figshare.com/articles/EukProt_a_database_of_genome-scale_predicted_proteins_across_the_diversity_of_eukaryotic_life/12417881/2 )

If it seems right - I can just grab the Phaeo ones from here without downloading the whole dataset
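
A minimal sketch of picking out only the Phaeo entries from a datasheet, so that only those files need downloading. It assumes the datasheet has been saved as tab-separated text; the column names `name` and `url`, the sample rows, and the example.org links are all hypothetical placeholders, not the real datasheet contents.

```python
import csv
import io

# Hypothetical stand-in for the datasheet: names and URLs are placeholders,
# not real entries.
SAMPLE_TSV = """name\turl
Phaeocystis_antarctica\thttps://example.org/a.fasta
Emiliania_huxleyi\thttps://example.org/b.fasta
Phaeocystis_globosa\thttps://example.org/c.fasta
"""


def select_genus(tsv_text, genus, name_col="name"):
    """Return the rows whose name column starts with the given genus."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    prefix = genus.lower()
    return [row for row in reader if row[name_col].lower().startswith(prefix)]


phaeo_rows = select_genus(SAMPLE_TSV, "Phaeocystis")
for row in phaeo_rows:
    # Each selected row carries the link to fetch for that entry.
    print(row["name"], row["url"])
```

With the real datasheet, `name_col` would need to be set to whichever column holds the species name.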

halexand commented 3 years ago

Ah, great! Yes, that seems right then! Makes sense to just download what you need :)

maggimars commented 3 years ago

Regarding outgroups: JGI was down this week so I couldn't download the Ehux or diatom genomes. JGI will be back up tomorrow (Dec. 18). However, I got the feeling from the OrthoFinder tutorial (https://davidemms.github.io/orthofinder_tutorials/orthofinder-best-practices.html) that I might not want to include outgroups, or that I could at least compare results with and without them. So I went ahead and tried running OrthoFinder without outgroups.
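
One way to set up the with/without-outgroup comparison is to prepare two input directories and run OrthoFinder on each. The sketch below is an illustration only: the file names, the `build_input_dirs` helper, and the directory names are hypothetical, and the demo uses placeholder FASTA files in a temporary directory. The only OrthoFinder call it assumes is `orthofinder -f <directory>`, which it prints rather than runs.

```python
import shutil
import tempfile
from pathlib import Path


def build_input_dirs(proteomes, outgroups, workdir):
    """Copy proteome FASTA files into two input directories:
    one with every species, one with the outgroups left out."""
    with_dir = Path(workdir) / "with_outgroups"
    without_dir = Path(workdir) / "without_outgroups"
    for d in (with_dir, without_dir):
        d.mkdir(parents=True, exist_ok=True)
    for fasta in map(Path, proteomes):
        shutil.copy(fasta, with_dir / fasta.name)
        # Outgroups are identified by file stem (hypothetical convention).
        if fasta.stem not in outgroups:
            shutil.copy(fasta, without_dir / fasta.name)
    return with_dir, without_dir


# Demo with placeholder files; real runs would point at the actual proteomes.
demo_root = Path(tempfile.mkdtemp())
source_dir = demo_root / "proteomes"
source_dir.mkdir()
files = []
for name in ["phaeo_a", "phaeo_b", "ehux", "diatom"]:
    path = source_dir / f"{name}.fa"
    path.write_text(">placeholder\nM\n")
    files.append(path)

with_dir, without_dir = build_input_dirs(files, {"ehux", "diatom"}, demo_root)
for d in (with_dir, without_dir):
    # Print the command for each run instead of executing it.
    print(f"orthofinder -f {d}")
```

Keeping the two runs in separate directories means their results can be compared side by side without either run overwriting the other.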

maggimars commented 3 years ago

Download Ehux and Chrysochromulina genomes and transcriptomes for comparison to Phaeo in mapping results. @rondorice

maggimars commented 3 years ago

Micromonas