pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

Segmentation fault with very simple code #1279

Closed by dbolser 4 months ago

dbolser commented 4 months ago

I was looking to fix the incorrect documentation here: https://pysam.readthedocs.io/en/latest/usage.html#working-with-tabix-indexed-files

(Just a couple of minor issues).

I thought I'd better run the latest version of the code from git, and I found that my 'simple test' turned up a segfault...

import pysam

tbx = pysam.TabixFile("tests/tabix_data/example_0v26.bed.gz")

for row in tbx.fetch("chr1", 1000, 2000):
    # OK
    print(str(row))

for row in tbx.fetch("chr1", 1000, 2000):
    # Gives "chromosome is c"
    print("chromosome is", row[0])

for row in tbx.fetch("chr1", 1000, 2000, parser=pysam.asTuple()):
    # Gives AttributeError: 'pysam.libctabixproxies.TupleProxy' object has no attribute 'contig'
    print("chromosome is", row.contig)

    # OK
    print("first field (chrom)=", row[0])

print("OK, I'm going to run this")
for row in tbx.fetch("chr1", 1000, 2000, parser=pysam.asBed()):
    print("here we go...")
    # Segmentation fault (core dumped) 😂
    print("name is", row.name)

print("Done!")

The output is:

$ python tests/test_simple_tabix.py 
chr1    1737    2090
chr1    1737    4275
chr1    1873    1920
chr1    1873    3533
chromosome is c
chromosome is c
chromosome is c
chromosome is c
first field (chrom)= chr1
first field (chrom)= chr1
first field (chrom)= chr1
first field (chrom)= chr1
OK, I'm going to run this
here we go...
Segmentation fault (core dumped)

dbolser commented 4 months ago

This happens on cdc0ed12fbe2d7633b8fa47534ab2c2547f66b84. I did the following to install:

git clone git@github.com:pysam-developers/pysam.git
cd pysam/
python -m venv .venv
source .venv/bin/activate
pip install -e .

I guess you need to know my libc and stuff? Is there a handy file I can attach?
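
For reference, a minimal sketch of how the platform details could be collected with the standard library alone. The helper name `environment_report` is hypothetical, and `platform.libc_ver()` may return empty strings on some systems:

```python
import platform
import sys


def environment_report():
    """Return a short, paste-able summary of the runtime environment."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
        f"machine: {platform.machine()}",
        # libc_ver() returns a (name, version) tuple; either part may be empty.
        "libc: " + " ".join(platform.libc_ver()),
    ]
    try:
        import pysam  # optional: only reported when it can be imported

        lines.append(f"pysam: {pysam.__version__}")
    except ImportError:
        lines.append("pysam: not importable")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running the failing script as `python -X faulthandler tests/test_simple_tabix.py` should also print the Python-level traceback at the point of the crash, which may help narrow down the offending line.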

dbolser commented 4 months ago

$ pip install -e .
Obtaining file:///home/dan/Build/pysam
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: pysam
  Building editable for pysam (pyproject.toml) ... done
  Created wheel for pysam: filename=pysam-0.22.0-0.editable-cp310-cp310-linux_x86_64.whl size=5106 sha256=522a5eaababdd2e259da9d1fbd7257a614a878444f9ae8ea1e6f14fd6e26cdb1
  Stored in directory: /tmp/pip-ephem-wheel-cache-pqsis2yg/wheels/cc/c5/bb/65151f5f844856e0716e962c6e8ead0aac4977a7bd496495c3
Successfully built pysam
Installing collected packages: pysam
Successfully installed pysam-0.22.0

jmarshall commented 4 months ago

The file example_0v26.bed.gz has three columns, which asBed() calls contig, start, and end.

Accessing fields that aren't present in the file is supposed to raise a KeyError. So printing row.score would have raised that exception, but printing row.name — the first absent field — crashes.

This is an off-by-one error in the presence checking that has been present in this code since 2011. Thanks for the easily reproducible bug report.
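
To illustrate the kind of off-by-one described above, here is a pure-Python sketch. It is not pysam's actual code: the class `BedRowSketch`, the `strict` flag, and the `BED_FIELDS` tuple are hypothetical names made up for this example. With the buggy comparison, only the first absent field slips past the presence check; in Python that surfaces as an `IndexError`, whereas the equivalent out-of-bounds read in compiled code can crash the process.

```python
BED_FIELDS = ("contig", "start", "end", "name", "score", "strand")


class BedRowSketch:
    """Hypothetical stand-in for a BED row proxy (not pysam's implementation)."""

    def __init__(self, line, strict=True):
        self.fields = line.rstrip("\n").split("\t")
        self.strict = strict

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            idx = BED_FIELDS.index(name)
        except ValueError:
            raise AttributeError(name)
        nfields = len(self.fields)
        if self.strict:
            absent = idx >= nfields  # correct: valid indices are 0..nfields-1
        else:
            absent = idx > nfields   # off-by-one: idx == nfields slips through
        if absent:
            raise KeyError(f"field {name} not set")
        return self.fields[idx]


def outcome(row, name):
    """Return the field value, or the name of the exception raised."""
    try:
        return getattr(row, name)
    except (KeyError, IndexError) as exc:
        return type(exc).__name__


line = "chr1\t1737\t2090"  # three columns, like the file in this report
for strict in (True, False):
    row = BedRowSketch(line, strict=strict)
    print(strict, [outcome(row, f) for f in ("contig", "name", "score")])
```

With the correct check, both `name` and `score` report `KeyError`. With the off-by-one check, `score` still reports `KeyError` but `name`, the first absent field, reaches the out-of-range read.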

dbolser commented 4 months ago

Awesome! My pleasure! Thanks for the fix and for the great library!