mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Problem with samtools #11

Closed rtmag closed 6 years ago

rtmag commented 7 years ago

Hello!

There seems to be a problem with samtools. Any idea what the problem might be?

Cheers!

INFO  @ Thu, 09 Mar 2017 03:03:27: Done building gene index ......

INFO  @ Thu, 09 Mar 2017 03:03:33:
Building TE index .......

INFO  @ Thu, 09 Mar 2017 03:13:14: Done building TE index ......

INFO  @ Thu, 09 Mar 2017 03:13:14:
Reading sample files ...

Error occured when reading first line of sample file /home/rtm/SJlab/deepa/bam/HCT_siK_Aligned.sortedByCoord.out.bam.
Error: 'samtools returned with error 1: stdout=, stderr=[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files\nUsage: samtools sort [options...] [in.bam]\nOptions:\n  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)\n  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]\n  -n         Sort by read name\n  -o FILE    Write final output to FILE rather than standard output\n  -T PREFIX  Write temporary files to PREFIX.nnnn.bam\n  -@, --threads INT\n             Set number of sorting and compression threads [1]\n      --input-fmt-option OPT[=VAL]\n               Specify a single input file format option in the form\n               of OPTION or OPTION=VALUE\n  -O, --output-fmt FORMAT[,OPT[=VAL]]...\n               Specify output format (SAM, BAM, CRAM)\n      --output-fmt-option OPT[=VAL]\n               Specify a single output file format option in the form\n               of OPTION or OPTION=VALUE\n      --reference FILE\n               Reference sequence FASTA FILE [null]\n'
[Exception type: SamtoolsError, raised in utils.py:75]
olivertam commented 7 years ago

Hi. This looks like an issue with the updated pysam (>0.9) and samtools (>1.3) where the old samtools sort command no longer works. We are in the process of adding the ability to handle a newer pysam and samtools, and should hopefully get this into the code soon. Thanks, and apologies for the slow responses.

retrogenomics commented 6 years ago

Hi, I've got the same problem. I was wondering whether you have found a workaround since then? Thank you in advance.

yingjin07 commented 6 years ago

Hi, there is a workaround: use samtools to sort the BAM files by read name before running TEtranscripts. Thanks.

retrogenomics commented 6 years ago

It worked. Thanks

vasilislenis commented 6 years ago

@retrogenomics What do you mean it worked? I'm getting the same error. Could you please tell me how you fixed it? Thank you very much in advance.

retrogenomics commented 6 years ago

@vasilislenis It worked as suggested by @yingjin07, i.e. by using samtools to sort the BAM files by read name rather than by coordinate (`samtools sort -@4 -O BAM -n file.bam -o file.sortedByReadname.bam`). Then you can run TEtranscripts without getting the error.
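If you script this step, a minimal Python sketch of the same name-sort command is below. It only builds the argument list (file names are placeholders); the point is that the output file is named explicitly with `-o`, which is the form the usage message in the original error asks for, whereas the call that failed appears to have used the older `samtools sort in.bam prefix` style. This is an illustration under those assumptions, not part of TEtranscripts.

```python
import shlex


def name_sort_command(in_bam: str, out_bam: str, threads: int = 4) -> list[str]:
    """Build a samtools sort command that orders a BAM file by read name.

    Uses the explicit -o output form accepted by newer samtools releases.
    """
    return [
        "samtools", "sort",
        "-@", str(threads),  # number of sorting/compression threads
        "-n",                # sort by read (query) name, not by coordinate
        "-O", "BAM",         # output format
        "-o", out_bam,       # final output file, named explicitly
        in_bam,
    ]


if __name__ == "__main__":
    cmd = name_sort_command("file.bam", "file.sortedByReadname.bam")
    print(shlex.join(cmd))
    # To run it for real: subprocess.run(cmd, check=True)
```

After sorting, drop `--sortByPos` from the TEtranscripts command line, as discussed below.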

vasilislenis commented 6 years ago

@retrogenomics Thank you very much for your reply but unfortunately, it didn't work. I'm still getting the same error. I am using the SE testing files.

olivertam commented 6 years ago

@vasilislenis, what is your TEtranscripts command line? Once you have sorted with samtools, you should remove the --sortByPos parameter from the TEtranscripts command: that parameter calls samtools sort again, which, assuming you are using a newer version of samtools, might be causing the issue. Let me know if that does not resolve it. Thanks!
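For anyone wrapping TEtranscripts in a script, here is a small Python sketch of that advice: drop `--sortByPos` when the inputs are already name-sorted, and keep exactly one copy when they are coordinate-sorted. It assumes `--sortByPos` is a switch that takes no value, as it is used in this thread; the helper name and the file names in the example are hypothetical.

```python
def prepare_tetranscripts_args(args: list[str], name_sorted: bool) -> list[str]:
    """Return a copy of a TEtranscripts argument list with --sortByPos set correctly.

    Hypothetical helper. Assumes --sortByPos is a value-less switch.
    """
    stripped = [a for a in args if a != "--sortByPos"]
    # Name-sorted input: no internal re-sort. Coordinate-sorted input: keep the switch once.
    return stripped if name_sorted else stripped + ["--sortByPos"]


if __name__ == "__main__":
    # Placeholder file names, laid out like a typical TEtranscripts invocation.
    cmd = ["TEtranscripts", "-t", "treat.bam", "-c", "ctrl.bam",
           "--GTF", "genes.gtf", "--TE", "te.gtf",
           "--sortByPos", "--project", "demo"]
    print(" ".join(prepare_tetranscripts_args(cmd, name_sorted=True)))
```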

vasilislenis commented 6 years ago

@olivertam Yes, you are right! I hadn't removed the --sortByPos parameter. Thank you!

Now I have a new error:

INFO @ Thu, 18 Jan 2018 16:52:15: Finished processing sample files

INFO @ Thu, 18 Jan 2018 16:52:15: Generating counts table

CRITICAL @ Thu, 18 Jan 2018 16:52:25: Error in running differential analysis!

CRITICAL @ Thu, 18 Jan 2018 16:52:25: Error: [Errno 2] No such file or directory

CRITICAL @ Thu, 18 Jan 2018 16:52:25: [Exception type: OSError, raised in subprocess.py:1249]

At least it generated the counts table, which is all that I need. I was expecting two separate counts tables (one for coding genes and one for non-coding), but I found a single table with all the counts. Is that OK?

olivertam commented 6 years ago

@vasilislenis There should be only one count table (coding genes and TEs together). You can easily separate them by searching for a colon (:) in the feature name, which should only appear in TE names. It is not immediately clear what your new error is (other than a missing file). Are you using the test data, or your own? If you don't mind providing the command line (you can replace any file names with placeholders if you prefer), that would be great. Thanks.
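A minimal Python sketch of that split, assuming the counts table is tab-separated, has one header line, and holds the feature ID in the first column. The demo rows and column names are made up for illustration; the only rule taken from the thread is that a colon marks a TE entry.

```python
import csv
import io


def split_counts(table_text: str) -> tuple[list[str], list[list[str]], list[list[str]]]:
    """Split a tab-separated counts table into (header, gene rows, TE rows).

    Assumes a single header line and the feature ID in the first column.
    A row is treated as a TE when its ID contains ':'.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    genes: list[list[str]] = []
    tes: list[list[str]] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        (tes if ":" in row[0] else genes).append(row)
    return header, genes, tes


if __name__ == "__main__":
    # Hypothetical table contents, for illustration only.
    demo = (
        "feature\tctrl\ttreat\n"
        "GENE_A\t10\t12\n"
        "L1HS:L1:LINE\t300\t450\n"
    )
    header, genes, tes = split_counts(demo)
    print(len(genes), "gene rows,", len(tes), "TE rows")
```

To process a real file, read it with `open(path).read()` and write each group back out with `csv.writer(..., delimiter="\t")`.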

olivertam commented 6 years ago

@vasilislenis Also, was a [prefix]_DESeq.R file generated in addition to the counts table? Thanks.

vasilislenis commented 6 years ago

@olivertam. I am really sorry! I forgot to load the R module. Based on the log file everything went well. It was the test data, so now I will try with mine. I'll let you know how it goes. Thank you very much!
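That matches the log: `[Errno 2] No such file or directory` raised from `subprocess.py` is what Python reports when the program it tries to launch is not on `PATH`. A quick pre-flight check in Python is sketched below. The list of tools is an assumption (R via `Rscript` for the generated `[prefix]_DESeq.R` script, plus samtools); adjust it to whatever your environment needs.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the names in `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    # Assumed external dependencies; not an official TEtranscripts list.
    missing = missing_tools(["Rscript", "samtools"])
    if missing:
        print("Not on PATH (load the module or install it first):", ", ".join(missing))
    else:
        print("All required tools found")
```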

wjyzidane commented 6 years ago

I have the same problem: [Exception type: SamtoolsError, raised in utils.py:75]

So I sorted them by name and re-ran TEtranscripts, but got another error:

If the BAM file is sorted by coordinates, please specify --sortByPos and re-run!

olivertam commented 6 years ago

Hi. If you could paste a copy of the log file for the second run, and perhaps the header and the first 10 or so alignments of the sorted BAM file, I can take a closer look. Thanks.

wjyzidane commented 6 years ago

Thanks!

Here is my file (screenshot not preserved).

Here is the log file (screenshot not preserved).

Jingyi

olivertam commented 6 years ago

Hi Jingyi, looking at the BAM file, it is odd that it still looks like it is sorted by coordinates. I see chr1 for the first batch of alignments, and they appear to be arranged in numeric order (see column 2). You can also see that although the first 8 lines appear to be pairs of alignments, line 9 and line 12 appear to be a pair, but are separated by two other alignments. It might be worth double-checking the headers to see if the files are sorted correctly. The header should have a line that looks like `@HD ... SO:queryname`, not `@HD ... SO:coordinate`. Thanks.
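Both checks can be scripted. The Python sketch below reads the `SO:` tag from the `@HD` header line (for example the text printed by `samtools view -H file.bam`) and applies the pairing observation above: in a name-sorted paired-end file, all records with the same read name should form one contiguous run (read names are column 1 of `samtools view file.bam`). The adjacency test is a heuristic for eyeballing a file, not necessarily the check TEtranscripts itself performs.

```python
def header_sort_order(header_text: str) -> str:
    """Return the SO value from the @HD line of a SAM header, or 'unknown'."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"  # no @HD line or no SO tag


def mates_adjacent(read_names: list[str]) -> bool:
    """Heuristic: True if every read name occurs as a single contiguous run.

    Returns False as soon as a name reappears after a different name.
    """
    seen: set[str] = set()
    previous = None
    for name in read_names:
        if name != previous:
            if name in seen:
                return False  # same read name split by other records
            seen.add(name)
            previous = name
    return True


if __name__ == "__main__":
    # Placeholder header; LN is an arbitrary example length.
    header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
    print(header_sort_order(header))
    print(mates_adjacent(["r1", "r1", "r2", "r3", "r4", "r2"]))
```

For the example above, the first call reports `coordinate` and the second returns `False` because `r2` is split by other reads, the same pattern as lines 9 and 12 in the screenshot.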

wjyzidane commented 6 years ago

Thanks so much. It is indeed sorted by coordinate. I'll re-sort them.

Have a good weekend. Jingyi


olivertam commented 6 years ago

The issue relating to newer versions of samtools/pysam should be addressed in TEToolkit v2.0.1. Note that the pysam requirement is now v0.9.0 or higher.
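To confirm an environment meets that requirement, a Python 3 sketch is below. It compares versions as integer tuples rather than strings (a plain string comparison would wrongly rank "0.15.2" below "0.9.0") and ignores suffixes such as `.dev1`. The 0.9.0 minimum comes from the comment above; the helper names are hypothetical.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn a version such as '0.15.2' into (0, 15, 2), ignoring trailing suffixes."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = "0.9.0") -> bool:
    """True if `installed` is at least `minimum`, padding missing parts with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


if __name__ == "__main__":
    try:
        installed = metadata.version("pysam")
    except metadata.PackageNotFoundError:
        print("pysam is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"pysam {installed}: {status} (need >= 0.9.0)")
```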