mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

TE GTF File Zero Counts #145

Closed: BenjaMcM closed this issue 1 year ago

BenjaMcM commented 1 year ago

Hello,

I am running TEtranscripts with the "dm6_rmsk_TE.gtf" file provided by the Hammell Lab as the TE GTF. With this GTF, the results show zero counts for every TE listed in the file. Out of curiosity, I also ran TEtranscripts with the "dm6_BDGP_rmsk_TE.gtf" provided by the lab and had no issues. Comparing the two GTFs, I found that the dm6 GTF uses a "chr" prefix on chromosome names, while the BDGP GTF lacks this prefix. To test whether this was causing the issue, I removed the "chr" prefix from the dm6 GTF and reran TEtranscripts. This run counted TEs successfully. I'm not sure if others have run into this before, but I wanted to make sure the tool maintainers were aware.

Thanks!
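
The workaround described above (removing the "chr" prefix from the first column of the dm6 GTF) could be sketched roughly as follows. This is a minimal illustration, not part of TEtranscripts: the function name `strip_chr_prefix` and the example records are made up for this sketch, and it assumes the only difference between the two naming schemes is the prefix. Contigs whose names differ in other ways would need an explicit mapping instead.

```python
import io


def strip_chr_prefix(gtf_in, gtf_out, prefix="chr"):
    """Copy GTF lines from gtf_in to gtf_out, removing `prefix` from the
    seqname (first tab-separated column). Returns the number of records changed.

    Hypothetical helper for illustration only.
    """
    changed = 0
    for line in gtf_in:
        # Leave comment lines and blank lines untouched.
        if line.startswith("#") or not line.strip():
            gtf_out.write(line)
            continue
        seqname, sep, rest = line.partition("\t")
        if seqname.startswith(prefix):
            seqname = seqname[len(prefix):]
            changed += 1
        gtf_out.write(seqname + sep + rest)
    return changed


# Made-up records in GTF layout, used only to exercise the function.
example = io.StringIO(
    'chr2L\tdm6_rmsk\texon\t100\t200\t.\t+\t.\tgene_id "TE_A"; transcript_id "TE_A_dup1";\n'
    'chrX\tdm6_rmsk\texon\t300\t400\t.\t-\t.\tgene_id "TE_B"; transcript_id "TE_B_dup1";\n'
)
converted = io.StringIO()
n_changed = strip_chr_prefix(example, converted)
print(n_changed)
print(converted.getvalue(), end="")
```

For real files, the same function can be called with two open file handles instead of the in-memory `io.StringIO` objects.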

olivertam commented 1 year ago

Hi,

Thank you for your interest in the software. Yes, this has tripped up many people and has led many of them to report the same "bug" that you have. It is one of the reasons why we created both the dm6 and BDGP GTF files for Drosophila (and likewise for other organisms that have multiple chromosomal nomenclatures).

Thanks.

BenjaMcM commented 1 year ago

Hi,

Thanks for the response! That makes sense. I was just going through the closed issues and saw that this was reported a while back as well.

So using the provided BDGP GTF file on BAMs that were not aligned to a BDGP index should not lead to any mis-assignment of reads in this case?

Thank you again!
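
One way to sanity-check a BAM/GTF pairing before a run is to compare the reference names in the BAM header with the seqnames in the GTF. The sketch below assumes the header text has already been obtained (for example from `samtools view -H`); the function names and the example data are hypothetical. It only confirms whether the names line up. It does not show whether the coordinates come from the same assembly, so it does not by itself answer the mis-assignment question.

```python
def bam_header_seqnames(sam_header_text):
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(sam_header_text, gtf_lines):
    """Return names shared by both inputs and names unique to each."""
    bam = bam_header_seqnames(sam_header_text)
    gtf = gtf_seqnames(gtf_lines)
    return {"shared": bam & gtf, "bam_only": bam - gtf, "gtf_only": gtf - bam}


# Made-up header and GTF records, used only to exercise the functions.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:2L\tLN:1000\n@SQ\tSN:X\tLN:2000\n"
gtf = [
    'chr2L\trmsk\texon\t1\t10\t.\t+\t.\tgene_id "TE_A";\n',
    'chrX\trmsk\texon\t5\t50\t.\t-\t.\tgene_id "TE_B";\n',
]
report = compare_seqnames(header, gtf)
print(sorted(report["shared"]))    # empty here: no names match
print(sorted(report["bam_only"]))
print(sorted(report["gtf_only"]))
```

An empty `shared` set is the situation reported in this issue, where every TE ends up with zero counts.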

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.