wdecoster / cramino

A *fast* tool for BAM/CRAM quality evaluation, intended for long reads
MIT License

thread 'main' panicked at 'index out of bounds: when running with `--karyotype` #18

Closed Adoni5 closed 1 year ago

Adoni5 commented 1 year ago

Command: `cramino -t 8 --karyotype --checksum /bams/NA24143.sorted.bam`, using the latest v0.13.0 release

Output:

File name       NA24143.sorted.bam
Number of alignments    4363
% from total reads      99.73
Yield [Gb]      0.01
Mean coverage   0.00
Yield [Gb] (>25kb)      0.00
N50     7449
N75     3574
Median length   1570.00
Mean length     3267
Median identity 98.05
Mean identity   97.07

Path    /bams/NA24143.sorted.bam
Creation time   NA
Checksum        6F2F7CAE821EC794587D8E0AAD1AC578
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 18446744073709551615', src/calculations.rs:38:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Test file for reproduction: [test_bam.zip](https://github.com/wdecoster/cramino/files/13039744/test_bam.zip)

I've attached the BAM file and index. It's from mapping two FASTQ files from a human reference cell line to HG38 with no alt chromosomes, i.e. chr1-22, X, Y, and Mt only.

The number 18446744073709551615 suggests some kind of unsigned integer underflow: it's the max u64 value, which is also what -1 as an i64 looks like when reinterpreted as unsigned.
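To make that arithmetic concrete, here is a small standalone sketch (not taken from cramino; the helper names `wrapped_minus_one` and `checked_minus_one` are made up for illustration). It uses `u64` explicitly so the result does not depend on the platform's `usize` width:

```rust
/// Hypothetical helper: subtract 1 with explicit wrap-around,
/// mimicking what an unchecked subtraction does in a release build.
fn wrapped_minus_one(n: u64) -> u64 {
    n.wrapping_sub(1)
}

/// Hypothetical helper: subtract 1, returning None instead of wrapping.
fn checked_minus_one(n: u64) -> Option<u64> {
    n.checked_sub(1)
}

fn main() {
    // The index from the panic message is exactly u64::MAX.
    assert_eq!(u64::MAX, 18446744073709551615);

    // Reinterpreting -1 (i64) as u64 gives the same bit pattern.
    assert_eq!(-1i64 as u64, u64::MAX);

    // 0 - 1 on an unsigned integer wraps around to the maximum value.
    assert_eq!(wrapped_minus_one(0), u64::MAX);

    // The checked variant reports the underflow instead.
    assert_eq!(checked_minus_one(0), None);
    assert_eq!(checked_minus_one(5), Some(4));

    println!("0 - 1 wraps to {}", wrapped_minus_one(0));
}
```

In a debug build a plain `0u64 - 1` would panic with an overflow message instead, so seeing the wrapped value as an index is consistent with a release binary.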

https://github.com/wdecoster/cramino/blob/b054f4dec3ff2d732d100a449b3a6310bf5fc85a/src/calculations.rs#L38C29-L38C29

Looking at the line, I suspect that when you calculate ind_left and ind_right, the length is a usize of 0 (the panic reports "the len is 0"), so dividing it by 2 gives 0, and subtracting 1 wraps around to the max value.
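A rough sketch of the kind of guard I have in mind, assuming the code is a median over a sorted slice. This is not the actual cramino source: the function name `median_length` and its `Option` return type are assumptions, and only the `ind_left` / `ind_right` names come from the line linked above.

```rust
/// Hypothetical guarded median over an already-sorted slice.
/// Returns None for an empty slice instead of indexing out of bounds.
fn median_length(sorted: &[u64]) -> Option<f64> {
    let len = sorted.len();
    if len % 2 == 0 {
        let ind_right = len / 2;
        // For len == 0 this is 0 - 1; checked_sub yields None
        // rather than wrapping to usize::MAX.
        let ind_left = ind_right.checked_sub(1)?;
        Some((sorted[ind_left] + sorted[ind_right]) as f64 / 2.0)
    } else {
        // Odd length: the middle element is the median.
        Some(sorted[len / 2] as f64)
    }
}

fn main() {
    let empty: [u64; 0] = [];
    // An empty input (e.g. no reads for a category) no longer panics.
    println!("{:?}", median_length(&empty));
    println!("{:?}", median_length(&[1, 2, 3, 4]));
    println!("{:?}", median_length(&[1, 2, 3]));
}
```

Returning `Option` pushes the "no data" case to the caller, which could then print `NA` or skip the row; an early `if sorted.is_empty()` check would work just as well.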

I'll have a quick bash at it now with a checked divide, and if that fixes it I'll open a PR! Cheers, Rory