mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Question: Gene count being 0 in TEtranscripts Output #200

Closed vivi-1406 closed 1 month ago

vivi-1406 commented 2 months ago

In the output file TEtranscripts_out.cntTable, the TE counts look valid, but the gene counts are all reported as zero. Here is my command:

$ TEtranscripts -t "${treatment_files[@]}" -c "${control_files[@]}" --GTF "$gtf_genes" --TE "$gtf_te" --mode multi --outdir "$output_dir"

Details:

BAM files: sorted by read name, not by position.
Runtime: the task ran for 4.8 hours and completed successfully.

Could you please advise on possible reasons why the gene counts might all be zero? Are there specific settings or data format issues that could lead to this result? Any insights or suggestions for troubleshooting this would be greatly appreciated.

olivertam commented 2 months ago

Hi,

Thank you for your interest in the software.

The most common cause of this is a mismatch between the genome build of your gene GTF and that of the alignment. For example, in humans, if you align to hg38 from UCSC but your gene annotation is from Ensembl, the chromosome names will differ (e.g. chr1 for UCSC, but 1 for Ensembl). As a result, none of your alignments overlap the gene annotation, and so no gene counts are reported.

If you could provide the header of your BAM file:

$ samtools view -H [BAM]

and the first 10 lines of your gene and TE GTFs:

$ head -n 10 [Gene GTF]
$ head -n 10 [TE GTF]

That might give me an idea of whether that is the cause.
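The comparison described above can also be scripted. The following is a minimal sketch, not part of TEtranscripts: the function names (header_seqnames, gtf_seqnames, shared_seqnames) and the file names in the usage comments are hypothetical. It assumes a standard SAM header, where each @SQ line carries the sequence name in its SN: field, and a tab-separated GTF whose first column is the sequence name.

```shell
#!/usr/bin/env bash
# Sketch: compare sequence names in a BAM header with those in a GTF.
# All function and file names here are illustrative assumptions.

# Read SAM header text on stdin and print the sorted, unique sequence
# names taken from the SN: field of each @SQ line.
header_seqnames() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' | LC_ALL=C sort -u
}

# Print the sorted, unique sequence names (column 1) of the GTF file
# given as $1, skipping comment lines.
gtf_seqnames() {
    awk -F'\t' '!/^#/ && NF > 1 { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print the names that appear in both sorted lists ($1 and $2).
# Empty output means the two files share no sequence names.
shared_seqnames() {
    LC_ALL=C comm -12 "$1" "$2"
}

# Example usage (placeholder file names):
#   samtools view -H sample.bam | header_seqnames > bam_names.txt
#   gtf_seqnames genes.gtf > gtf_names.txt
#   shared_seqnames bam_names.txt gtf_names.txt
```

If shared_seqnames prints nothing for the gene GTF but prints names for the TE GTF, that matches the symptom in this issue: valid TE counts alongside all-zero gene counts.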

Thanks.

vivi-1406 commented 2 months ago

Hi Oliver,

Thank you so much for your quick and helpful response! You were spot on: the issue was indeed due to a mismatch between the chromosome names in the gene GTF and the alignment files. After making sure they were consistent, the gene counts are now being reported correctly. I appreciate your support and the work you're doing with the software.

Best.
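The thread does not record how the names were made consistent. One possible approach, sketched here under the assumption that the alignment uses UCSC-style names (chr1, chrM) and the gene GTF uses Ensembl-style names (1, MT), is to rewrite the first column of the GTF. The function name add_chr_prefix and the file names are hypothetical. Unplaced scaffolds and alternate contigs are named differently between the two sources and would need a proper mapping table, so this only covers the primary chromosomes.

```shell
#!/usr/bin/env bash
# Sketch: convert Ensembl-style sequence names in a GTF to UCSC-style.
# Assumption: only primary chromosomes matter (1..22, X, Y, MT);
# scaffold names are NOT correctly handled by a simple prefix.

# Print the GTF given as $1 with "chr" prepended to column 1,
# mapping MT to chrM. Comment and empty lines pass through unchanged.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF == 0 { print; next }
         { $1 = ($1 == "MT") ? "chrM" : "chr" $1; print }' "$1"
}

# Example usage (placeholder file names):
#   add_chr_prefix genes.ensembl.gtf > genes.ucsc.gtf
```

Using a gene annotation built for the same genome source as the alignment avoids the conversion altogether.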
