maggimars / Tara-Phaeo


Pie Charts with % core, shared, unique #7

Closed maggimars closed 3 years ago

maggimars commented 3 years ago

I think you are right that I made a mistake here: there are 1984 core orthogroups when rcc cordata is excluded, and I used that value for each species, but the rest of the values are numbers of genes, not orthogroups. I will fix it and post the new plot!

Originally posted by @maggimars in https://github.com/maggimars/Tara-Phaeo/issues/4#issuecomment-755841286
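To make the genes-versus-orthogroups distinction concrete, here is a minimal sketch of how per-species gene totals could be tallied by category. The toy table, the species labels (`sp_A`, `sp_B`, `sp_C`) and the function names are all hypothetical; the real table would come from the orthology output used in this repo, and the core/shared/unique definitions here are an assumption about how the pie charts were built.

```python
# Hypothetical toy table: orthogroup -> {species: number of genes}.
# The real values would come from the orthogroup gene-count table.
gene_counts = {
    "OG1": {"sp_A": 2, "sp_B": 1, "sp_C": 3},  # present in all species: core
    "OG2": {"sp_A": 1, "sp_B": 4, "sp_C": 0},  # present in some species: shared
    "OG3": {"sp_A": 0, "sp_B": 0, "sp_C": 5},  # present in one species: unique
    "OG4": {"sp_A": 1, "sp_B": 1, "sp_C": 1},  # core
}


def classify(counts):
    """Label an orthogroup by how many species have at least one gene in it."""
    present = sum(1 for n in counts.values() if n > 0)
    if present == len(counts):
        return "core"
    if present == 1:
        return "unique"
    return "shared"


def genes_per_category(gene_counts):
    """Sum genes (not orthogroups) per species and category."""
    totals = {}
    for counts in gene_counts.values():
        category = classify(counts)
        for species, n in counts.items():
            per_species = totals.setdefault(
                species, {"core": 0, "shared": 0, "unique": 0}
            )
            per_species[category] += n
    return totals


def percentages(per_species):
    """Convert one species' gene totals into percentages for a pie chart."""
    total = sum(per_species.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in per_species.items()}


gene_totals = genes_per_category(gene_counts)
for species, per_species in sorted(gene_totals.items()):
    print(species, per_species, percentages(per_species))
```

In this toy table every species has the same number of core orthogroups (two), but the number of genes in those orthogroups differs per species, which is the quantity that belongs next to the other gene counts.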

maggimars commented 3 years ago

It doesn't look hugely different, but I fixed the mistake I made before:

[Image: orthopie]

maggimars commented 3 years ago

These are pie charts of genes in different categories. Maybe a breakdown of orthogroups in different categories would be more appropriate, or interesting in addition?
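A breakdown by orthogroups rather than genes could be sketched roughly as follows. This counts each orthogroup once per species in which it occurs, regardless of how many genes that species has in it. The toy table and names are hypothetical, and the category definitions are an assumption.

```python
from collections import Counter

# Hypothetical toy table: orthogroup -> {species: number of genes}.
gene_counts = {
    "OG1": {"sp_A": 2, "sp_B": 1, "sp_C": 3},
    "OG2": {"sp_A": 1, "sp_B": 4, "sp_C": 0},
    "OG3": {"sp_A": 0, "sp_B": 0, "sp_C": 5},
    "OG4": {"sp_A": 1, "sp_B": 1, "sp_C": 1},
}


def orthogroups_per_category(gene_counts):
    """Count orthogroups (not genes) per species and category."""
    tallies = {}
    for counts in gene_counts.values():
        present = [sp for sp, n in counts.items() if n > 0]
        if len(present) == len(counts):
            category = "core"
        elif len(present) == 1:
            category = "unique"
        else:
            category = "shared"
        # Each orthogroup contributes 1 to every species that has it.
        for sp in present:
            tallies.setdefault(sp, Counter())[category] += 1
    return tallies


og_tallies = orthogroups_per_category(gene_counts)
for species, tally in sorted(og_tallies.items()):
    print(species, dict(tally))
```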

halexand commented 3 years ago

Well, for many of them it looks more on par with the other study in #4 :)

Some brief thoughts:

I think it would be interesting to get an idea (as with the other paper) of the % of genes in core that are annotated vs not.

And I think that perhaps looking at what is common to antarctica but absent in the others, what is common to globosa, etc. might be interesting. I also think we might eventually start taking a look at some phylogenies from the core genes? Could be interesting.
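One way to pull out groups that are "common to X but absent in others" is a simple presence/absence filter, sketched below. The strain labels and the toy table are hypothetical, and "common to" is assumed here to mean present in every member of the chosen set.

```python
# Hypothetical toy table: orthogroup -> {transcriptome: number of genes}.
gene_counts = {
    "OG1": {"antarctica_1": 1, "antarctica_2": 2, "globosa_1": 0, "globosa_2": 0},
    "OG2": {"antarctica_1": 1, "antarctica_2": 0, "globosa_1": 0, "globosa_2": 0},
    "OG3": {"antarctica_1": 1, "antarctica_2": 1, "globosa_1": 1, "globosa_2": 1},
    "OG4": {"antarctica_1": 0, "antarctica_2": 0, "globosa_1": 3, "globosa_2": 1},
}


def clade_specific(gene_counts, clade):
    """Return orthogroups present in every member of `clade` and nowhere else."""
    clade = set(clade)
    hits = []
    for og, counts in gene_counts.items():
        in_all_members = all(counts.get(sp, 0) > 0 for sp in clade)
        found_outside = any(n > 0 for sp, n in counts.items() if sp not in clade)
        if in_all_members and not found_outside:
            hits.append(og)
    return sorted(hits)


antarctica_only = clade_specific(gene_counts, ["antarctica_1", "antarctica_2"])
globosa_only = clade_specific(gene_counts, ["globosa_1", "globosa_2"])
print(antarctica_only, globosa_only)
```

Relaxing `all` to a minimum fraction of members would be a reasonable variant if some transcriptomes are incomplete.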

Also: maybe trying to see how this type of abundance plays out across the Atlantic Tara data?

maggimars commented 3 years ago

True, most of them are on par with the transcriptomes in the other study. The P. cordata and P. jahnii transcriptomes are the two with the lowest proportion of genes in core orthogroups (but a similar absolute number of genes in the core groups). I sequenced these two transcriptomes, and the sequencing depth was much higher than for the MMETSP sequencing; I also used 100 bp rather than 50 bp reads. I think this is why there are more "genes" in these two transcriptomes than in the others. I also sequenced the P. globosa 1528 transcriptome, but less aggressively.

I am starting to work on annotation and will upload % annotated plots when I am finished.

I agree about phylogenies, too. There are about 40 orthogroups that are single-copy in all the reference/transcriptomes. I was thinking of doing a multi-gene phylogeny with those 40.
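Selecting the single-copy orthogroups for a multi-gene phylogeny could look roughly like the sketch below: keep only orthogroups with exactly one gene in every reference/transcriptome. The toy table and names are hypothetical.

```python
# Hypothetical toy table: orthogroup -> {species: number of genes}.
gene_counts = {
    "OG1": {"sp_A": 1, "sp_B": 1, "sp_C": 1},  # single-copy everywhere
    "OG2": {"sp_A": 1, "sp_B": 2, "sp_C": 1},  # duplicated in sp_B
    "OG3": {"sp_A": 1, "sp_B": 1, "sp_C": 0},  # missing from sp_C
    "OG4": {"sp_A": 1, "sp_B": 1, "sp_C": 1},  # single-copy everywhere
}


def single_copy_orthogroups(gene_counts):
    """Return orthogroups with exactly one gene in every species."""
    return sorted(
        og
        for og, counts in gene_counts.items()
        if counts and all(n == 1 for n in counts.values())
    )


single_copy = single_copy_orthogroups(gene_counts)
print(single_copy)
```

The sequences of the selected orthogroups would then typically be aligned per orthogroup and concatenated before tree building; that step is outside this sketch.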

Last, I am not 100% sure what you mean by "trying to see how this type of abundance plays out across the Atlantic Tara data?" - can you expand on that thought?

halexand commented 3 years ago

> True, most of them are on par with the transcriptomes in the other study. The P. cordata and P. jahnii transcriptomes are the two with the lowest proportion of genes in core orthogroups (but a similar absolute number of genes in the core groups). I sequenced these two transcriptomes, and the sequencing depth was much higher than for the MMETSP sequencing; I also used 100 bp rather than 50 bp reads. I think this is why there are more "genes" in these two transcriptomes than in the others. I also sequenced the P. globosa 1528 transcriptome, but less aggressively.

This makes a lot of sense! Probably just need to incorporate that into your ultimate interpretation. It might make sense (now that we are talking about it) to include estimated BUSCO completeness for all the transcriptomes. It can be used to inform our interpretation.

The multigene phylogeny sounds like a great idea. Especially if we can compare it to some sort of 18S type tree (not sure what the standard protocol is with Phaeo land).

Re: my half-baked, poorly articulated idea: I was thinking that for the upset plot, instead of plotting the number of genes in each group, we could look at the relative abundance in metaT / metaG space of each of the groupings (or perhaps abundance normalized by # of genes) for a given site. As in, sum the counts from the various genes into the groupings based on the figure in #2.
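The summing idea could be sketched as below: for one site, add up the read counts of all genes assigned to each grouping, then report both the relative abundance of the grouping and the mean count per gene. The gene IDs, grouping labels and counts are hypothetical, and how genes map to the groupings in #2 is an assumption.

```python
from collections import defaultdict

# Hypothetical mapping of gene -> upset-plot grouping.
gene_to_group = {
    "g1": "core",
    "g2": "core",
    "g3": "globosa_only",
    "g4": "antarctica_only",
}

# Hypothetical metaT (or metaG) read counts per gene at one site.
site_counts = {"g1": 30, "g2": 10, "g3": 40, "g4": 20}


def summarize_site(site_counts, gene_to_group):
    """Sum per-gene counts into groupings for a single site."""
    summed = defaultdict(float)
    n_genes = defaultdict(int)
    for gene, group in gene_to_group.items():
        # Genes with no reads at this site contribute zero.
        summed[group] += site_counts.get(gene, 0)
        n_genes[group] += 1
    total = sum(summed.values())
    return {
        group: {
            "relative_abundance": summed[group] / total if total else 0.0,
            "mean_per_gene": summed[group] / n_genes[group],
        }
        for group in summed
    }


summary = summarize_site(site_counts, gene_to_group)
for group, stats in sorted(summary.items()):
    print(group, stats)
```

Reporting both values side by side shows whether a grouping is abundant simply because it contains many genes or because its genes are individually highly represented.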