simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
https://htseq.readthedocs.io/en/release_0.11.1/
GNU General Public License v3.0
122 stars 77 forks source link

Pysam error #29

Closed davidecarnevali closed 7 years ago

davidecarnevali commented 7 years ago

Using HTSeq 0.6.1p2 and pysam 0.9 i got this error but only on some bam file from TCGA (aligned with STAR). The error arise both with python 2.7 and above (2.7.3). The error seems to be cause by reads like this one: UNC11-SN627_70:1:1101:3483:1995 69 0 0 * 0 0 NGGTGCTTTATTCTCCACAGAGTGATACATGCTAAGGTGGGTTGGGCTTG #:99:;<<<;D:7DDDDDDDDD@DDDDDDDDDDDDDDDDD6:DDDDDDDD NH:i:0 HI:i:0 AS:i:49 nM:i:0 uT:A:4 RG:Z::110325_UNC11-SN627_0070_AB02DVABXX.1

import HTSeq bamfile = HTSeq.BAM_Reader("file.bam") for read in bamfile: pass

Traceback (most recent call last): File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 556, in main() File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 529, in main cvg_bam_unstranded(bamfile) File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 195, in cvg_bam_unstranded for read in file: File "/u/home/a/annibal/.local/lib/python2.7/site-packages/HTSeq/init.py", line 947, in iter yield SAM_Alignment.from_pysam_AlignedSegment( pa, sf ) File "python2/src/HTSeq/_HTSeq.pyx", line 1338, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedSegment (src/_HTSeq.c:29512) File "pysam/libcalignmentfile.pyx", line 1635, in pysam.libcalignmentfile.AlignmentFile.getrname (pysam/libcalignmentfile.c:18288) File "pysam/libcalignmentfile.pyx", line 699, in pysam.libcalignmentfile.AlignmentFile.get_reference_name (pysam/libcalignmentfile.c:9258) ValueError: reference_id -1 out of range 0<=tid<2779

I have tried to upgrade both HTSeq at 0.7.2 and pysam to 0.11.2.1 but the problem persists.

iosonofabio commented 7 years ago

Hi Davide, 0.7.2 has been out for a while. Closing Fabio

On May 25, 2017 10:28:32 AM PDT, davidecarnevali notifications@github.com wrote:

Using HTSeq 0.6.1p2 and pysam 0.9 i got this error but only on some bam file from TCGA (aligned with STAR). The error arise both with python 2.7 and above (2.7.3). The error seems to be cause by reads like this one: UNC11-SN627_70:1:1101:3483:1995 69 0 0 * 0 0 NGGTGCTTTATTCTCCACAGAGTGATACATGCTAAGGTGGGTTGGGCTTG #:99:;<<<;D:7DDDDDDDDD@DDDDDDDDDDDDDDDDD6:DDDDDDDD NH:i:0 HI:i:0 AS:i:49 nM:i:0 uT:A:4 RG:Z::110325_UNC11-SN627_0070_AB02DVABXX.1

import HTSeq bamfile = HTSeq.BAM_Reader("file.bam") for read in bamfile: pass

Traceback (most recent call last): File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 556, in

main() File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 529, in main cvg_bam_unstranded(bamfile) File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 195, in cvg_bam_unstranded for read in file: File "/u/home/a/annibal/.local/lib/python2.7/site-packages/HTSeq/__init__.py", line 947, in __iter__ yield SAM_Alignment.from_pysam_AlignedSegment( pa, sf ) File "python2/src/HTSeq/_HTSeq.pyx", line 1338, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedSegment (src/_HTSeq.c:29512) File "pysam/libcalignmentfile.pyx", line 1635, in pysam.libcalignmentfile.AlignmentFile.getrname (pysam/libcalignmentfile.c:18288) File "pysam/libcalignmentfile.pyx", line 699, in pysam.libcalignmentfile.AlignmentFile.get_reference_name (pysam/libcalignmentfile.c:9258) ValueError: reference_id -1 out of range 0<=tid<2779 I have tried to upgrade both HTSeq at 0.7.2 and pysam to 0.11.2.1 but the problem persists. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/simon-anders/htseq/issues/29
iosonofabio commented 7 years ago

Wait, missed the end. You tried 0.7.2 and pysam 0.11.0?

iosonofabio commented 7 years ago

Have you tried Python 3? If not please do

davidecarnevali commented 7 years ago

The error is still there using python 3.1 isn't it a problem in read handling by pysam?

iosonofabio commented 7 years ago

3.1 not supported 3.4+ can you retry please?

On May 25, 2017 10:37:43 AM PDT, davidecarnevali notifications@github.com wrote:

The error is still there using python 3.1 isn't it a problem in read handling by pysam?

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/simon-anders/htseq/issues/29#issuecomment-304073186

davidecarnevali commented 7 years ago

Nope, same error

davidecarnevali commented 7 years ago

I figured out the error is raised by sam flag 69 and 141 Can you please check? Thank you

please see attached a simple bamfile with few reads among which those raising the error try.bam.zip

davidecarnevali commented 7 years ago

I tried to cotact the pysam developer but they answered me it is an HTSeq bug:

"This appears to be a bug/limitation of HTSeq and not pysam. The code here (https://github.com/simon-anders/htseq/blob/master/python2/src/HTSeq/_HTSeq.pyx#L1340) assumes that the read.mate_is_unmapped flag and mate contig (read.mrnm) are consistent and should either check that the read.mrnm isn't -1 before attempting to look up the reference or should handle the except that results when it is -1"

Please see their comment at https://github.com/pysam-developers/pysam/issues/472 Can you please fix it? Thank you

iosonofabio commented 7 years ago

had missed this one, will look into it asap

iosonofabio commented 7 years ago

fixed in dev branch, will see the light in release 0.9.0