maggimars / Tara-Phaeo


To read: #4

Closed maggimars closed 3 years ago

maggimars commented 3 years ago

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0097801

maggimars commented 3 years ago

Turns out I've read this one before (multiple times :)). It was really good to be reminded of it in this context, though. I liked Figure 1, so I tried some similar pie charts (excluding the tiny P. cordata transcriptome). I think the "unique" category from the paper includes both the unassigned and species-specific categories in my plot (unassigned and species-specific would be the single-copy and multi-copy unique groups). However, the core still makes up a smaller proportion of these transcriptomes/genomes than of the transcriptomes in the paper.

[Image: orthopie]

maggimars commented 3 years ago

Next step, I think, is to try to annotate the orthogroups from the Phaeo transcriptome/genomes ...

halexand commented 3 years ago

I thought you probably would have read that paper (I think it was one of the first MMETSP papers). The core issue is interesting... and pretty dramatic. To clarify, do the pies above drop the rcc1383 transcriptome from the categorization? Because I guess from your figure in #2 I would expect there to be more core genes (but I guess ~1k out of 30k genes is about what these pies are showing).

Another broad (and likely obvious) thought is that as you add more species, the size of the core gene set will decrease, especially if we are adding partially complete transcriptomes to the set.
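A minimal sketch of that shrinking-core effect, using made-up orthogroup presence sets rather than anything from the actual analysis. The species names, orthogroup IDs, and the `core_size_as_species_added` helper are all hypothetical; the point is only that intersecting with each additional species can never grow the core, and that a very incomplete transcriptome can cut it sharply.

```python
# Toy data (hypothetical): which orthogroups each species has at least one gene in.
presence = {
    "sp_A": {"OG1", "OG2", "OG3", "OG5"},
    "sp_B": {"OG1", "OG2", "OG4", "OG5"},
    "sp_C": {"OG1", "OG2", "OG4"},
    "partial_D": {"OG1"},  # stands in for a very incomplete transcriptome
}


def core_size_as_species_added(presence, order):
    """Return the core size after each species in `order` is added.

    The core is the set of orthogroups present in every species seen so far,
    so it is a running intersection and can only stay the same or shrink.
    """
    core = None
    sizes = []
    for species in order:
        if core is None:
            core = set(presence[species])
        else:
            core = core & presence[species]
        sizes.append(len(core))
    return sizes


sizes = core_size_as_species_added(presence, ["sp_A", "sp_B", "sp_C", "partial_D"])
print(sizes)  # [4, 3, 2, 1]
```

With real data, the order in which species are added changes the intermediate values but not the final core size.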

maggimars commented 3 years ago

I think you are right that I made a mistake here: there are 1984 core orthogroups when rcc cordata is excluded, and I used that value for each species, but the rest of the values are numbers of genes, not orthogroups. I will fix it and post the new plot!
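To illustrate the unit mismatch, here is a rough sketch with a toy gene-count table. The table layout (one row per orthogroup, one gene count per species) and every name in it are assumptions for illustration, not the actual data or script behind the plot. The number of core orthogroups is a single value shared by all species, while the number of genes inside those core orthogroups differs per species, and the latter is what is comparable to the other gene counts in the pies.

```python
# Toy gene counts per orthogroup and species (hypothetical values).
counts = {
    "OG1": {"sp_A": 2, "sp_B": 1, "sp_C": 1},
    "OG2": {"sp_A": 1, "sp_B": 1, "sp_C": 3},
    "OG3": {"sp_A": 4, "sp_B": 0, "sp_C": 0},  # species-specific to sp_A
    "OG4": {"sp_A": 0, "sp_B": 2, "sp_C": 1},  # shared, but not core
}


def core_orthogroups(counts):
    """Orthogroups with at least one gene in every species."""
    return [
        og for og, per_species in counts.items()
        if all(n > 0 for n in per_species.values())
    ]


def core_genes_per_species(counts):
    """For each species, sum its gene counts over the core orthogroups."""
    core = core_orthogroups(counts)
    species = list(next(iter(counts.values())).keys())
    return {sp: sum(counts[og][sp] for og in core) for sp in species}


print(len(core_orthogroups(counts)))   # 2 core orthogroups (same for every species)
print(core_genes_per_species(counts))  # {'sp_A': 3, 'sp_B': 2, 'sp_C': 4}
```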

maggimars commented 3 years ago

Also, I do agree with your second point, especially with regard to the transcriptomes not being very complete.