samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

Updated the max isize to be Integer.MAX_VALUE. #1410

Closed tfenne closed 5 years ago

tfenne commented 5 years ago

Description

I'm processing some data with a reference sequence that has long chromosomes, over the BAI limit. I ran into this issue where one (likely chimeric) read-pair has an insert size of over 536,870,912 (aka 1<<29), which causes HTSJDK to throw validation errors.

I looked at the spec and can find nothing supporting a limit on insert size this low. The SAM section defines it as between [−2^31 + 1, 2^31 − 1], and the BAM section dedicates a full int32 t for tlen.

codecov-io commented 5 years ago

Codecov Report

Merging #1410 into master will increase coverage by 0.006%. The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #1410       +/-   ##
===============================================
+ Coverage     68.102%   68.108%   +0.006%     
- Complexity      8373      8374        +1     
===============================================
  Files            573       573               
  Lines          33965     33965               
  Branches        5668      5668               
===============================================
+ Hits           23131     23133        +2     
+ Misses          8644      8643        -1     
+ Partials        2190      2189        -1
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/SAMRecord.java 68.323% <ø> (ø) 297 <0> (ø) :arrow_down:
...htsjdk/samtools/util/nio/DeleteOnExitPathHook.java 90.476% <0%> (+9.524%) 4% <0%> (+1%) :arrow_up:
lbergelson commented 5 years ago

Seems likely to be an artifact of the bai restrictions, but the history of its creation is lost to git history.

lbergelson commented 5 years ago

Lets see if @yfarjoun can remember any objections too this. I can't think of any though.

jmarshall commented 5 years ago

Historical specification note: this field was initially neglected and only lifted from 229 to 231 a couple of years after most of the other fields (see samtools/hts-specs@6f8dfe41814df32de75bf16cf2719148c939ae3f). This may go some way to explaining the still-present HTSJDK limit on this field — but it was still lifted almost six years ago!

lbergelson commented 5 years ago

@jmarshall Thank you. Sometimes you just have to give things time, you don't want to rush anything.

tfenne commented 5 years ago

Thanks @jmarshall - I figured it was something like that and I appreciate the archeological help!

jmarshall commented 5 years ago

Tragically I still remember making that change :smile: