maggimars closed this issue 3 years ago
Turns out I've read this one before (multiple times :)). It was really good to be reminded of it in this context, though. I liked Figure 1, so I tried some similar pie charts (excluding the tiny P. cordata transcriptome). I think the "unique" category from the paper includes both the unassigned and species-specific categories in my plot (unassigned and species-specific would be the single-copy and multi-copy unique groups). However, the core still makes up a smaller proportion of these transcriptomes/genomes than it does of the transcriptomes in the paper.
Next step, I think, is to try to annotate the orthogroups from the Phaeo transcriptome/genomes ...
I thought you probably would have read that paper (I think it was one of the first MMETSP papers). The core issue is interesting... and pretty dramatic. To clarify, do the pies above drop the rcc1383 transcriptome from the categorization? Because I guess from your figure in #2 I would expect there to be more core genes (but I guess ~1k out of 30k genes is about what these pies are showing).
Another broad (and likely obvious) thought is that as you add more species the size of the core gene set will decrease. Especially if we are adding partially complete transcriptomes to the set.
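To make the point about a shrinking core concrete, here is a minimal sketch. It assumes the core is defined as the set of orthogroups present in every species, so adding a species can only keep the core the same size or shrink it. The species names and orthogroup IDs are made up for illustration and are not taken from the actual analysis.

```python
from itertools import accumulate

# Hypothetical presence data: species -> set of orthogroup IDs found in it.
# None of these names or IDs come from the real dataset.
orthogroups_by_species = {
    "species_A": {"OG1", "OG2", "OG3", "OG4"},
    "species_B": {"OG1", "OG2", "OG3"},
    "species_C": {"OG1", "OG2", "OG5"},
    "partial_transcriptome": {"OG1"},  # incomplete, so many groups are missing
}


def core_sizes(presence):
    """Return the core size after adding each species in insertion order.

    The core is the running intersection of the orthogroup sets, so the
    returned sizes can never increase from one step to the next.
    """
    running = accumulate(presence.values(), lambda core, s: core & s)
    return [len(core) for core in running]


sizes = core_sizes(orthogroups_by_species)
print(sizes)  # [4, 3, 2, 1]
```

An incomplete transcriptome behaves like the last entry here: groups that are absent only because they were not sampled still drop out of the core.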
I think you are right that I made a mistake here: there are 1984 core orthogroups when rcc cordata is excluded, and I used that value for each species, but the rest of the values are numbers of genes, not orthogroups. I will fix it and post the new plot!
Also, I do agree with your second point, especially with regard to the transcriptomes not being very complete.
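For reference, the gene-versus-orthogroup mix-up can be sketched like this. It assumes a table of per-orthogroup gene counts for each species (hypothetical data below, not the real values) and tallies genes per species in each category, so every pie slice is in the same unit. The category names and the `categorize` helper are illustrative, and unassigned genes would need to be added separately.

```python
# Hypothetical gene counts: orthogroup -> {species: number of genes}.
gene_counts = {
    "OG1": {"sp_A": 2, "sp_B": 1, "sp_C": 1},  # present in all: core
    "OG2": {"sp_A": 1, "sp_B": 3, "sp_C": 1},  # present in all: core
    "OG3": {"sp_A": 1, "sp_B": 1, "sp_C": 0},  # present in some: shared
    "OG4": {"sp_A": 4, "sp_B": 0, "sp_C": 0},  # present in one: species-specific
}
species = ["sp_A", "sp_B", "sp_C"]


def categorize(counts, species):
    """Return (number of core orthogroups, per-species gene totals by category)."""
    totals = {
        sp: {"core": 0, "shared": 0, "species_specific": 0} for sp in species
    }
    n_core_orthogroups = 0
    for per_species in counts.values():
        present = [sp for sp in species if per_species.get(sp, 0) > 0]
        if not present:
            continue  # orthogroup has no genes in the species considered
        if len(present) == len(species):
            category = "core"
            n_core_orthogroups += 1
        elif len(present) == 1:
            category = "species_specific"
        else:
            category = "shared"
        # Add gene counts, not orthogroup counts, to each species' total.
        for sp in present:
            totals[sp][category] += per_species[sp]
    return n_core_orthogroups, totals


n_core, totals = categorize(gene_counts, species)
print(n_core)                  # 2 core orthogroups
print(totals["sp_A"]["core"])  # 3 core genes in sp_A
print(totals["sp_B"]["core"])  # 4 core genes in sp_B
```

The number of core orthogroups is the same for every species, but the number of core genes differs per species, which is the value that belongs in a per-species pie alongside the other gene counts.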
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0097801