jts / nanopolish

Signal-level algorithms for MinION data
MIT License
568 stars 159 forks source link

Possible issue with htslib version #593

Open Shians opened 5 years ago

Shians commented 5 years ago

I helped a colleague debug an issue with nanopolish call-methylation, whereby a BAM file with no spliced alignments would be detected as having a spliced alignments, leading to the program stopping. The error is triggered here and the root cause is here where a CIGAR N op is detected.

However, when inspecting the read in Samtools, there is no N present, instead it looks something like 38000M3I.... Now when sticking printf statements into the nanopolish source, I find that nanopolish misreads the first two operations, both in terms of length and operation, with the second op being recognised as N. I BELIEVE this is due to the packaged htslib being over 2 years old, and that it has issues handling op_len longer than 2^15. Updating the htslib appeared to resolve the issue. Note that this is specifically the length of the CIGAR operation, not the the well-known and fixed issue CIGAR string.

Sorry I cannot provide better detail, this was debugged on someone else's machine. I also cannot provide a reproducible example because of my lack of experience with fast5 files, but I believe if you can create a situation where a CIGAR operation is > 2^15 in op_len then you'll find that nanopolish misreads it and corrupts the next operation. If more information is needed to debug this I'm happy to go hijack my colleague's machine for a few hours.

Latest versions of nanopolish and Samtools were used. Issue only occurs if using htslib @ 3dc96c5 included as subdirectory, pulling in the latest master and recompiling fixes the issue.

jts commented 5 years ago

Thanks for the detailed report and debugging - your explanation makes sense. I will update the htslib version.

Out of curiosity, how did you get an alignment with 38,000 matches and no insertion/deletions in-between?

Jared

Shians commented 5 years ago

I believe it was aligned back to assembled contigs, so the read itself would have informed the contig sequence.

jts commented 5 years ago

Ah, that makes sense. Thanks.

jts commented 5 years ago

Now updated to htslib-1.9

jts commented 5 years ago

I had to revert this change as there were some dependency issues with htslib-1.9. Reopening until I can fix it

NickyPan commented 5 years ago

@jts hello, I have the htslib-1.9 in the system. When I compiled the nanopolish, I got the following errors: error: unknown type name 'hts_itr_multi_t' error: unknown type name 'hts_reglist_t'

I guess these errors related to the version of htslib. How can I fixed it, or should I wait you to update to htslib-1.9?

jts commented 5 years ago

@NickyPan what does your make command look like, and can you give the full output from it?

I suggest letting nanopolish clone and install the built-in htslib.

NickyPan commented 5 years ago

@jts I just use the make without any parameter, and I got the following error:

make[1]:'Entering directory '/opt/seqtools/source/nanopolish/htslib' 
gcc -g -Wall -O2 -I.  -c -o hts.o hts.c
In file included from ./cram/cram_samtools.h:61:0,
                 from cram/cram.h:45,
                 from hts.c:39:
/opt/seqtools/include/htslib/sam.h:359:5: error: unknown type name 'hts_itr_multi_t'
     hts_itr_multi_t *sam_itr_regions(const hts_idx_t *idx, bam_hdr_t *hdr, hts_reglist_t *reglist, unsigned int regcount);
     ^
/opt/seqtools/include/htslib/sam.h:359:76: error: unknown type name 'hts_reglist_t'
     hts_itr_multi_t *sam_itr_regions(const hts_idx_t *idx, bam_hdr_t *hdr, hts_reglist_t *reglist, unsigned int regcount);
                                                                            ^
make[1]: *** [hts.o] Error 1
make[1]: Leaving directory '/opt/seqtools/source/nanopolish/htslib'
make: *** [htslib/libhts.a] Error 255

and then I tried the make HTS=noinstall . It worked. But I haven't run any text yet.