jackh726 / bigtools

A high-performance BigWig and BigBed library in Rust
MIT License
70 stars 7 forks source link

pybigwig's values() not reading outside chromosome #62

Open casblaauw opened 1 day ago

casblaauw commented 1 day ago

Hi, thanks for this great package - it's making our lives downstream much easier! I've been using pybigtools' BBIRead.values(), which says it can read regions even if (partially) outside chromosome boundaries (and indeed has the argument oob for that specific case).

When trying to extract values from a region exceeding the chromosome boundary, however, a specific assertion always fails:

with pybigtools.open("your_bw.bw") as f:
    f.values(chrom, start, end_outside_chrom, bins = bins)

Error (with Rust stacktrace on):

thread '<unnamed>' panicked at pybigtools/src/lib.rs:308:13:
assertion `left == right` failed
  left: 8
 right: 7
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
   4: pybigtools::BBIRead::values
   5: pybigtools::_::<impl pybigtools::BBIRead>::__pymethod_values__
   6: pyo3::impl_::trampoline::trampoline
   7: pybigtools::_::<impl pyo3::impl_::pyclass::PyMethods<pybigtools::BBIRead> for pyo3::impl_::pyclass::PyClassImplCollector<pybigtools::BBIRead>>::py_methods::ITEMS::trampoline
   8: method_vectorcall_FASTCALL_KEYWORDS
             at /usr/local/src/conda/python-3.11.9/Objects/descrobject.c:426:24
[...]

This seems to be because of an assertion in intervals_to_array (line 308):

assert_eq!(bin_end, array.len() - 1);

Which expects an array 1 smaller than the actual array's output. I can't quite figure out why this subtraction is here, since in my testing, bin_end and array.len() both match the expected size. For example, even with a region of 524288 bp and 32 bp bins, I get assertion left == right failed; left: 16384; right: 16383, with the expected size indeed being 16384.

Reproducible example:

import pybigtools
with pybigtools.open("https://github.com/jackh726/bigtools/raw/refs/heads/master/bigtools/resources/test/valid.bigWig") as f:
    chroms = f.chroms()
    # Region: arbitrary chromosome, 1 bin outside overlap
    chrom = list(chroms.keys())[0]
    bin_size = 8
    end = chroms[chrom]+bin_size
    start = end - 2*bin_size
    # Try to get values
    f.values(chrom, start, end, bins = 3)
    # Note: also happens if explicitly setting oob:
    f.values(chrom, start, end, bins = 3, oob=0)
jackh726 commented 1 day ago

Thanks for the report. I'll take a look!