mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

TE gff3 to gtf? #33

Closed: oronoc1210 closed this issue 5 years ago

oronoc1210 commented 5 years ago

Hello,

I'm working with Sorghum bicolor, and both my gene and TE annotation files were obtained from Phytozome as gff3 files. I had no trouble converting the gene gff3 file to gtf format using gffread from the Cufflinks suite, but I'm having a lot more trouble converting the TE gff3 file to gtf format: gffread just returns an empty file.

My TE gff3 file is attached: Sbicolor_313_v3.1.repeatmasked_assembly_v3.0.gff3.gz. What would you recommend I do?

Thank you for your help!
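A possible reason gffread returns nothing is that it is built around gene/transcript models, so a file of standalone repeat features may be skipped entirely (an assumption, not checked against this particular file). One workaround is a small custom converter. The sketch below assumes TEtranscripts wants each TE locus as an `exon` line carrying `gene_id` (subfamily), `transcript_id` (a unique ID per locus), `family_id` and `class_id`, following the layout of the lab's prebuilt TE GTFs; check the README for your version. The Phytozome attribute keys it reads, `Name` and `class` (with a RepeatMasker-style `Class/Family` value), are hypothetical placeholders because the file's column 9 isn't shown here, so inspect your file and adjust `SUBFAMILY_KEY` and `CLASS_FAMILY_KEY`.

```python
import gzip
import sys
from urllib.parse import unquote

# HYPOTHETICAL attribute keys: inspect column 9 of your GFF3 and adjust these.
SUBFAMILY_KEY = "Name"      # assumed to hold the repeat (subfamily) name
CLASS_FAMILY_KEY = "class"  # assumed to hold a RepeatMasker-style "Class/Family"


def parse_gff3_attributes(column9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())  # GFF3 percent-encoding
    return attrs


def gff3_to_te_gtf(lines, source="phytozome_rm"):
    """Yield TE GTF lines (gene/transcript/family/class IDs) from GFF3 lines."""
    copies = {}  # per-subfamily counter used to build unique transcript_ids
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, headers and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, _, _, start, end, score, strand, phase, col9 = cols
        attrs = parse_gff3_attributes(col9)
        subfamily = attrs.get(SUBFAMILY_KEY, "unknown")
        class_family = attrs.get(CLASS_FAMILY_KEY, "Unknown/Unknown")
        te_class, _, te_family = class_family.partition("/")
        te_family = te_family or te_class  # no "/": reuse the class as family
        copies[subfamily] = copies.get(subfamily, 0) + 1
        transcript_id = f"{subfamily}_dup{copies[subfamily]}"
        gtf_attrs = (f'gene_id "{subfamily}"; transcript_id "{transcript_id}"; '
                     f'family_id "{te_family}"; class_id "{te_class}";')
        yield "\t".join([seqid, source, "exon", start, end,
                         score, strand, phase, gtf_attrs])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py input.gff3.gz > output.gtf
    with gzip.open(sys.argv[1], "rt") as handle:
        for gtf_line in gff3_to_te_gtf(handle):
            print(gtf_line)
```

For example, `python gff3_to_te_gtf.py Sbicolor_313_v3.1.repeatmasked_assembly_v3.0.gff3.gz > Sbicolor_TE.gtf` (the script name is hypothetical). The `_dupN` suffix is just one arbitrary way to give every locus a unique `transcript_id` while keeping the subfamily as `gene_id`.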

666lixiaona commented 5 months ago

Hi,

Thank you very much. I will study it in depth. Another question: how do you define a TE subfamily? How do you distinguish a TE superfamily from a subfamily? What features characterize a TE subfamily?

Thanks.

olivertam commented 5 months ago

Hi,

We define a subfamily as a distinguishable element, as annotated in repeat databases such as Repbase and Dfam. For example, AluYa5 is considered a subfamily because it differs sufficiently from other Alu family members to be considered a separate TE. Again, these definitions come from the repeat databases (which RepeatMasker uses to annotate UCSC genomes), and we follow their conventions.

Thanks.
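To make the nesting concrete: in RepeatMasker-style annotations each subfamily (e.g. AluYa5) sits inside a family (Alu), which sits inside a class (SINE). The sketch below shows how subfamily-level counts could be summed up to the family or class level if you want a coarser view. The `HIERARCHY` table holds only a few well-known human examples and the read counts are made up for illustration; in practice the mapping would come from your TE annotation (e.g. the `family_id` and `class_id` attributes of a TE GTF).

```python
from collections import defaultdict

# Illustrative subfamily -> (family, class) mapping, RepeatMasker/Dfam-style.
HIERARCHY = {
    "AluYa5": ("Alu", "SINE"),
    "AluSx": ("Alu", "SINE"),
    "L1HS": ("L1", "LINE"),
}


def collapse(counts, level):
    """Sum subfamily-level counts up to the 'family' or 'class' level."""
    index = {"family": 0, "class": 1}[level]
    totals = defaultdict(float)
    for subfamily, n in counts.items():
        totals[HIERARCHY[subfamily][index]] += n
    return dict(totals)


counts = {"AluYa5": 12, "AluSx": 30, "L1HS": 5}  # made-up read counts
print(collapse(counts, "family"))  # {'Alu': 42.0, 'L1': 5.0}
print(collapse(counts, "class"))   # {'SINE': 42.0, 'LINE': 5.0}
```

Keeping the subfamily as the unit of analysis and collapsing afterwards loses nothing, whereas starting at the family level cannot be undone.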

666lixiaona commented 5 months ago

Hi,

I understand what you mean. So there is no obvious difference between a TE subfamily and a TE family. If I want to study TEs in depth, there is no need to distinguish between TE superfamilies, families and subfamilies, right?

Thanks.

666lixiaona commented 5 months ago

Hi,

Another question: the publication you pointed me to refers to TE subfamilies. Does that mean that any TE that differs clearly from the rest of its family can be called a subfamily?

Thanks.

olivertam commented 5 months ago

Hi,

This is something that the field is still grappling with (see this article). In all honesty, our TE subfamilies are defined as the distinct elements annotated in TE databases such as Repbase and Dfam; we don't do our own annotations. Therefore, you will have to go to those databases to see how they define those elements as being distinct from each other.

Thanks.

666lixiaona commented 5 months ago

OK, I get it. Thank you very much.

666lixiaona commented 5 months ago

Hi,

I'd like to ask how the EM algorithm is applied in TEtranscripts. What is the hidden variable? Could you walk through the specific process to help me understand?

Thanks.

olivertam commented 5 months ago

Hi,

You should be able to find the full details in the paper.

Thanks.
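As a rough illustration only: in the generic EM approach to multi-mapped reads, the hidden variable is the TE copy or subfamily that each ambiguously aligned read actually came from. The E-step splits each read among its candidate TEs in proportion to their current abundance estimates, and the M-step re-estimates the abundances from those fractional counts. The sketch below is that textbook scheme, not TEtranscripts' actual implementation (which may, for instance, weight by element length or handle unique reads differently); the paper remains the authoritative description. The function name `em_multimap` and its input format are hypothetical.

```python
from collections import defaultdict


def em_multimap(read_alignments, n_iter=100, tol=1e-8):
    """Distribute reads over TEs with a generic EM (hypothetical helper).

    read_alignments: one list per read, giving the TE ids it aligns to.
    Hidden variable: the (unobserved) TE each read originated from.
    Returns the expected read count per TE.
    """
    tes = sorted({te for cands in read_alignments for te in cands})
    theta = {te: 1.0 / len(tes) for te in tes}  # initial relative abundances
    n_reads = len(read_alignments)
    for _ in range(n_iter):
        expected = defaultdict(float)
        # E-step: split each read among its candidates in proportion to theta.
        for cands in read_alignments:
            total = sum(theta[te] for te in cands)
            for te in cands:
                expected[te] += theta[te] / total
        # M-step: re-estimate abundances from the fractional counts.
        new_theta = {te: expected[te] / n_reads for te in tes}
        delta = max(abs(new_theta[te] - theta[te]) for te in tes)
        theta = new_theta
        if delta < tol:
            break
    return {te: theta[te] * n_reads for te in tes}


# 3 reads unique to A, 1 unique to B, 4 ambiguous between A and B.
reads = [["A"]] * 3 + [["B"]] + [["A", "B"]] * 4
counts = em_multimap(reads)
print({te: round(c, 3) for te, c in counts.items()})  # {'A': 6.0, 'B': 2.0}
```

Here the unique reads suggest A is three times as abundant as B, so EM converges to giving A three quarters of the ambiguous reads (3 + 3 = 6) rather than splitting them evenly (which would give 5 and 3).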