Closed. jts closed this issue 1 year ago.
Thank you for the very clear bug report. I can reproduce it. That's disappointing, as I thought we had recently fixed a bug in that function! I wonder if the fix itself broke something else.
Yes, this will most definitely point us in the right direction.
FWIW the bug I fixed before wasn't this, but a related issue elsewhere. The cause and fix are both trivial.
Many thanks for the clear bug report and data to reproduce it.
Thanks for the quick fix (and the shoutout on Twitter!). I tested the fix on the large dataset where I first noticed the problem and can confirm that I now get the expected result.
Hi,
I recently ran into a strange problem where the output of samtools stats differed between two invocations that I expected to be equivalent. I was trying to generate the stats for a single chromosome and initially ran:
The results reported far fewer reads than expected, so I re-ran the command providing the full range for this chromosome and got the expected number of reads:
By definition these two range specifiers should be equivalent, so I spent a bit of time looking into this, and I think I have tracked it down to a difference in what cram_index_last and cram_index_last_query return.

I've attached a minimal example (sim1.cram) to reproduce the problem. It is simply 500 full-length copies of an mtDNA sequence with 1% errors introduced. With this .cram file, samtools stats sim1.cram chrM reports 257 total reads for this chromosome, but samtools stats sim1.cram chrM:1-19154 reports 486.

Here's a snippet of code to dump the cram_index* returned by the two methods:

On sim1.cram this code prints:
These correspond to the first and second (last) entries in the .crai file for this chromosome (tid 5):

I don't know the CRAM format well enough to debug this any further, but I hope this points in the right direction.
Jared

sim1.zip
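The .crai entries for a given reference (such as tid 5 above) can be inspected without going through htslib. The small Python sketch below is one way to do it. It assumes the layout described in the CRAM specification: a gzip-compressed, tab-separated file with six integer columns per line. These are the reference id, alignment start, alignment span, container byte offset, slice byte offset within the container data, and slice size. The script name dump_crai.py and the helpers read_crai and entries_for_ref are hypothetical. They are not part of samtools or htslib, and they do not reproduce the logic of cram_index_last or cram_index_last_query.

```python
import gzip
import sys
from typing import List, NamedTuple


class CraiEntry(NamedTuple):
    # The six columns of a .crai line, as laid out in the CRAM specification
    # (assumed layout; check against the spec version you are using).
    ref_id: int            # reference sequence id (tid); -1 for unmapped slices
    aln_start: int         # alignment start of the slice
    aln_span: int          # alignment span of the slice
    container_offset: int  # byte offset of the container in the .cram file
    slice_offset: int      # byte offset of the slice within the container data
    slice_size: int        # slice size in bytes


def read_crai(path: str) -> List[CraiEntry]:
    """Read a gzip-compressed .crai index into a list of entries (hypothetical helper)."""
    entries = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            fields = line.split()
            if len(fields) != 6:
                continue  # skip blank or malformed lines
            entries.append(CraiEntry(*(int(f) for f in fields)))
    return entries


def entries_for_ref(entries: List[CraiEntry], tid: int) -> List[CraiEntry]:
    """Return the index entries for one reference id, in file order."""
    return [e for e in entries if e.ref_id == tid]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python dump_crai.py sim1.cram.crai 5
    crai_path, ref = sys.argv[1], int(sys.argv[2])
    for entry in entries_for_ref(read_crai(crai_path), ref):
        print("\t".join(str(v) for v in entry))
```

For example, `python dump_crai.py sim1.cram.crai 5` should print the index lines for tid 5, assuming the index sits next to the CRAM file under the usual .cram.crai name. The first and last of those lines are the entries discussed above.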