pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

PileupColumn.get_query_positions() behavior differs from PileupRead.query_position #1248

Open jkoubele opened 9 months ago

jkoubele commented 9 months ago

I noticed that PileupRead.query_position differs from PileupColumn.get_query_positions(): PileupRead.query_position returns None if the PileupRead has is_del or is_refskip set. However, for the is_refskip case, PileupColumn.get_query_positions() apparently returns the next matched position.

I find this difference rather unexpected; more precisely, it is the behavior of get_query_positions() that I find highly unintuitive. I believe that get_query_positions() behaves the same way as the underlying htslib library, and that returning None in PileupRead.query_position is a feature added by pysam? Would it be useful to describe the behavior of get_query_positions() more explicitly in the documentation? (I assumed that the positions would be the same as those obtained by calling PileupRead.query_position on every PileupRead in PileupColumn.pileups, which is not the case.)
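
For reference, a minimal sketch of how the two APIs can be compared side by side. The helper names `find_query_position_mismatches` and `scan_bam` are hypothetical and not part of pysam. The sketch assumes that PileupColumn.pileups and PileupColumn.get_query_positions() list the same reads in the same order, which I have not verified for every filtering setting, and it expects a coordinate-sorted, indexed BAM file.

```python
def find_query_position_mismatches(column):
    """Return one tuple per read whose two reported query positions differ.

    `column` is expected to behave like a pysam PileupColumn: it must expose
    `.pileups` and `.get_query_positions()`.
    Assumption: both sequences contain the same reads in the same order.
    """
    mismatches = []
    for read, column_pos in zip(column.pileups, column.get_query_positions()):
        # PileupRead.query_position is None when is_del or is_refskip is set.
        if read.query_position != column_pos:
            mismatches.append(
                (read.is_del, read.is_refskip, read.query_position, column_pos)
            )
    return mismatches


def scan_bam(path):
    """Print every pileup column in an indexed BAM file where the APIs disagree."""
    import pysam  # imported lazily so the helper above works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        for column in bam.pileup():
            for mismatch in find_query_position_mismatches(column):
                print(column.reference_name, column.reference_pos, mismatch)
```

Calling `scan_bam` on a file with spliced alignments (reads containing `N` CIGAR operations) should list the columns where PileupRead.query_position is None while get_query_positions() reports an integer.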