gt1 / biobambam2

Tools for early stage alignment file processing
Other
93 stars 17 forks source link

Support for positions values up to 2^31 (=according to BAM spec) instead of 2^29 #66

Open WimSpee opened 6 years ago

WimSpee commented 6 years ago

Hi,

biobambam2 currently only supports position values up to 2^29 (= 536,870,912).

The BAM specification allows for position values up to 2^31 (=2,147,483,648 ) See page 5, section 1.4 The alignment section: mandatory fields https://samtools.github.io/hts-specs/SAMv1.pdf

I am currently processing alignment data via bcbio, which uses bamsormadup from biobambam2 for marking duplicate reads. The alignment data is from a species that has a chromosome of size 552.225.360

This is 20M above the 2^29 limit in biobambam2. Alignment with BWA mem works fine (as is expected from the BAM spec), but the duplicate marking with biobambam2 crashes with this error:

libmaus2::bambam::SamInfo::parseSamLine: invalid pos field 552223579
/data/prod/Tools/bcbio/1.0.8/anaconda/share/biobambam-2.0.79-0/bin/../lib/libmaus2.so.2(libmaus2::util::StackTrace::StackTrace()+0x55)[0x2b6245b661d5]

Googling this error leads to this line of code were the 2^29 limit seems to be defined.

https://github.com/gt1/libmaus/blob/master/src/libmaus/bambam/SamInfo.hpp#L706

Could you please upgrade this limit to 2^31 as is defined in the BAM specification. Thank you very much!

gt1 commented 6 years ago

Hi,

could you retry with the latest version?

WimSpee commented 6 years ago

Hi,

Thank you for the quick fix for this issue. I am using biobambam2 via bcbio. I need to figure out how to install biobambam2 together with the updated libmaus2 (I saw the Compilation of biobambam2 section, still need to do this)

This will be today or more probably early next week. Then I can check the command using biobambam2 now works.

Thank you much for the update.

WimSpee commented 6 years ago

hi @gt1 . I just tried with a fresh git clone and compile of both biobambam2 and libmaus2.

I now get this error: libmaus2::bambam::SamInfo::parseSamLine: invalid pnext field 552223813

Searching for this error in the code finds this line https://github.com/gt1/libmaus2/blob/9cbeb428e97bfd834b0ee20d9101dced8032482e/src/libmaus2/bambam/SamInfo.hpp#L754

This does seem the only place in the libmaus2 code where this comparison is done against a static cast of 2^29.

https://github.com/gt1/libmaus2/search?utf8=%E2%9C%93&q=static_cast%3Cint32_t%3E%281u%3C%3C29%29&type=

Could you please update that line of code? And maybe other lines if they still have the same cast / limit.

Thank you very much.

gt1 commented 6 years ago

Hi @WimSpee,

I have made some more updates. Could you please check whether this suits you?

WimSpee commented 6 years ago

Hi @gt1 . I made again a fresh git clone and compile of both biobambam2 and libmaus2.

Now position values up to 2^31 seem to be supported by biobambam2. At least running the bcbio BWA+biobambam2 pipe for more than an hour kept running and did not crash. I didn't have the time / resources to the let the pipe finish.

I'll ask at bcbio if they can also pull the updated version of biobambam2.

Thank you very much for the updates and also for biobambam2 in general.