sstadick / perbase

Per-base per-nucleotide depth analysis
MIT License
115 stars 13 forks source link

Panic during base-depth, caused by secondary alignments? #53

Closed da-i closed 2 years ago

da-i commented 2 years ago

Im trying to run perbase on some ONT data containing repeating DNA segments.

$perbase base-depth -z -D 1000000 -b ~/projects/variantcalling/depth/TP53_S0.bed --ref-fasta ~/Data/references/Homo_sapiens/T2T/chm13v2.0.fa tmp_sub.bam
[2022-07-07T13:19:54Z INFO  perbase::commands::base_depth] Running base-depth on: "tmp.bam"
[2022-07-07T13:19:54Z INFO  perbase_lib::par_granges] Using 14 worker threads.
[2022-07-07T13:19:54Z INFO  perbase_lib::par_granges] Creating channel of length 28185710 (* 120 bytes to get mem)
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Reading from "tmp.bam"
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 0:0-248387328
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 1:0-242696752
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 2:0-201105948
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 3:0-193574945
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 4:0-182045439
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 5:0-172126628
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 6:0-160567428
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 7:0-146259331
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 8:0-150617247
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 9:0-134758134
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 10:0-135127769
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 11:0-133324548
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 12:0-113566686
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 13:0-101161492
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 14:0-99753195
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 15:0-96330374
[2022-07-07T13:19:55Z INFO  perbase_lib::par_granges] Processing TID 16:0-84276897
thread '<unnamed>' panicked at 'index out of bounds: the len is 0 but the index is 826', /opt/conda/conda-bld/perbase_1654657865314/work/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-htslib-0.38.2/src/bam/record.rs:1539:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

But if I run :

$samtools view -F 256 -b tmp_sub.bam > tmp_sub_nosec.bam 

$perbase base-depth -z -D 1000000 -b ~/projects/variantcalling/depth/TP53_S0.bed --ref-fasta ~/Data/references/Homo_sapiens/T2T/chm13v2.0.fa tmp_sub_nosec.bam

returns the expected output.

Alternatively:

samtools view -f 256 -b tmp_sub.bam > tmp_sub_sec.bam 

perbase base-depth -z -D 1000000 -b ~/projects/variantcalling/depth/TP53_S0.bed --ref-fasta ~/Data/references/Homo_sapiens/T2T/chm13v2.0.fa tmp_sub_sec.bam
...
thread '<unnamed>' panicked at 'index out of bounds: the len is 0 but the index is 4896', /opt/conda/conda-bld/perbase_1654657865314/work/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-htslib-0.38.2/src/bam/record.rs:1539:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

So i think we can conclude that it fails due to secondary alignments.

sstadick commented 2 years ago

Thanks for the detailed issue!

This is odd. Would you be able to share the samtools view -f 256 -b tmp_sub.bam > tmp_sub_sec.bam bam file? Or even post a few anonymized secondary alignment reads?

If you are willing to exclude secondary alignments, perbase base-depth does have a -F flag that can be set to exclude them.

I'm pretty sure the index out of bounds is happening here where we try to get the base in the sequence, but for some reason htslib has no sequence for the secondary alignment?

sstadick commented 2 years ago

Closing due to inactive, but please re-open if it's an issue still.