mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Error while reading bam files #35

Closed SalimMegat closed 5 years ago

SalimMegat commented 5 years ago

Hi,

I have been trying to run TEtranscripts on my BAM files, and it gives this error, which I do not quite understand.

[Image: screenshot 2019-02-15 at 07 37 21]

SalimMegat commented 5 years ago

Here is the command I use:

TEtranscripts --sortByPos --format BAM --mode uniq -t SRR6924192.bamAligned.sortedByCoord.out.bam -c SRR6924174.bamAligned.sortedByCoord.out.bam --GTF gencode.vM10.chr_patch_hapl_scaff.annotation.gtf --TE mm10_rmsk_TE.gtf

olivertam commented 5 years ago

Hi,

May I ask which version of TEtranscripts you are using? The error that you are reporting is related to the changes made to the samtools sort command line in pysam (v0.9 or above). The current version of TEtranscripts (v2.0.3) should address this issue. Please note that the latest version of TEtranscripts uses DESeq2 by default (instead of DESeq).

If you wish to keep using an earlier version of TEtranscripts (v1.x), then I would recommend pre-sorting the BAM files by query name using samtools sort, and then running TEtranscripts without the --sortByPos parameter. Please refer to this for more information.
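For reference, the pre-sorting step suggested above could look roughly like the sketch below. It assumes samtools 1.3 or later (where `sort` takes `-n` and `-o`); the helper names and the `.nameSorted.bam` suffix are made up for illustration and are not part of TEtranscripts or samtools.

```shell
# Hypothetical helper: derive an output name for the name-sorted copy of a BAM.
namesorted_name() {
    local bam="$1"
    printf '%s.nameSorted.bam\n' "${bam%.bam}"
}

# Sort one BAM by query (read) name.
# Assumes samtools >= 1.3: -n sorts by read name, -o names the output file.
sort_by_name() {
    local bam="$1"
    samtools sort -n -o "$(namesorted_name "$bam")" "$bam"
}

# File names taken from the command earlier in this thread.
# The loop does nothing if samtools or the files are missing.
if command -v samtools >/dev/null 2>&1; then
    for bam in SRR6924192.bamAligned.sortedByCoord.out.bam \
               SRR6924174.bamAligned.sortedByCoord.out.bam; do
        if [ -f "$bam" ]; then
            sort_by_name "$bam"
        fi
    done
fi
```

The name-sorted files would then be passed to TEtranscripts with the same options as in the original command, minus --sortByPos.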

Please let me know if the issue remains unresolved, and I will look further into it. Thanks.

SalimMegat commented 5 years ago

Hi, thanks for your quick reply! I have been using TEcount/TEtranscripts version 2.0.3 and I am still facing the same issue (see above). Even with TEcount, when I load one BAM file at a time, reading the BAM takes forever and it is stuck at this step (27 min and still not done). The BAM files were generated with STAR 2.6 and they look fine. Also, it seems like nothing is responding; I can't even kill the process. Many thanks!

Salim.

[Image: screen shot 2019-02-15 at 1 38 21 pm]

olivertam commented 5 years ago

Hi Salim,

Would you mind sharing short excerpts (from the very top) of the BAM files that are causing the sorting error? Could you also let me know which version of pysam you're using?
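One possible way to produce such an excerpt and to look up the pysam version is sketched below. The function names, the default of 1000 alignments, and the `.headN.bam` suffix are illustrative assumptions; the sketch also assumes a samtools 1.x `view` that reads SAM from standard input, and that the interpreter passed to `pysam_version` is the same one that runs TEtranscripts.

```shell
# Hypothetical helper: write the header plus the first N alignments of a BAM
# to a small BAM file, and print the name of that file.
bam_excerpt() {
    local bam="$1" n="${2:-1000}"
    local out="${bam%.bam}.head${n}.bam"
    {
        samtools view -H "$bam"               # header lines only
        samtools view "$bam" | head -n "$n"   # first N alignment records
    } | samtools view -b -o "$out" -          # re-encode the SAM stream as BAM
    printf '%s\n' "$out"
}

# Print the pysam version seen by a given Python interpreter (default: python).
pysam_version() {
    "${1:-python}" -c 'import pysam; print(pysam.__version__)' 2>/dev/null \
        || echo "pysam not importable"
}
```

For example, `bam_excerpt SRR6924192.bamAligned.sortedByCoord.out.bam 1000` would write a small file that is easy to attach, and `pysam_version python2.7` would query a specific interpreter.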

Please note that loading the BAM files (which appears to be working) can take quite a long time, as TEtranscripts processes and annotates/counts all the alignments at this step. I would also recommend at least 10 GB of RAM when running TEtranscripts/TEcount. If you still do not see a response after 2 hours (and your BAM file is smaller than 10 GB), then you might want to check whether the process has been assigned enough memory.
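As a rough way to compare a BAM file's size against the memory on the machine, something like the sketch below might help. Both function names are hypothetical. It reports total physical memory, not the memory actually assigned to a job by a scheduler, and it only covers Linux (`/proc/meminfo`) and macOS (`sysctl hw.memsize`).

```shell
# Total physical memory in whole GiB (rounded down).
total_mem_gb() {
    if [ -r /proc/meminfo ]; then
        # Linux: MemTotal is reported in kB
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        # macOS: hw.memsize is reported in bytes
        echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
    fi
}

# Size of a file in whole GiB (rounded down).
file_size_gb() {
    echo $(( $(wc -c < "$1") / 1024 / 1024 / 1024 ))
}
```

Running `total_mem_gb` and `file_size_gb SRR6924192.bamAligned.sortedByCoord.out.bam` side by side gives a quick sanity check against the 10 GB figures mentioned above.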

You can either attach the file on GitHub or email it to me (tam at cshl dot edu). Thanks.