pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

Filter on arbitrary flags in pileup (feature request) #649

Open olavurmortensen opened 6 years ago

olavurmortensen commented 6 years ago

The stepper argument of the pileup method allows iterating through all reads ("nofilter") or using one of two predefined filters ("all" or "samtools"). The "all" option ignores reads that have any of the BAM_FUNMAP, BAM_FSECONDARY, BAM_FQCFAIL, or BAM_FDUP flags set. Would it be possible to let the pileup method filter on an arbitrary combination of flags? For example, I would like to ignore only duplicate reads (BAM_FDUP, 0x400) and secondary alignments (BAM_FSECONDARY, 0x100).
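To make the request concrete, here is a minimal pure-Python sketch of the two masks involved. The bit values come from the SAM specification and the constant names mirror HTSlib's; `is_ignored` is a hypothetical helper for illustration only, not part of pysam.

```python
# Flag bit values as defined in the SAM specification (names follow HTSlib).
BAM_FUNMAP = 0x4
BAM_FSECONDARY = 0x100
BAM_FQCFAIL = 0x200
BAM_FDUP = 0x400

# Mask described above for stepper="all".
ALL_MASK = BAM_FUNMAP | BAM_FSECONDARY | BAM_FQCFAIL | BAM_FDUP

# Mask requested in this issue: only duplicates and secondary alignments.
REQUESTED_MASK = BAM_FDUP | BAM_FSECONDARY


def is_ignored(read_flag: int, mask: int) -> bool:
    """Hypothetical helper: True if the read has any bit of the mask set."""
    return (read_flag & mask) != 0


# A QC-failed primary alignment is dropped by the "all" mask
# but would be kept under the requested mask.
qc_failed_read = BAM_FQCFAIL
print(is_ignored(qc_failed_read, ALL_MASK))
print(is_ignored(qc_failed_read, REQUESTED_MASK))
```

A read is skipped as soon as a single bit overlaps with the mask, so the requested behaviour only differs from "all" for unmapped and QC-failed reads.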

bricoletc commented 9 months ago

I don't know since which version, but you can now use the flag_filter parameter of the pileup method:

flag_filter (int) – ignore reads where any of the bits in the flag are set. The default is BAM_FUNMAP | BAM_FSECONDARY | BAM_FQCFAIL | BAM_FDUP.

So something like

_ordered_flags = [
    "PAIRED",
    "PROPER_PAIR",
    "UNMAP",
    "MUNMAP",
    "REVERSE",
    "MREVERSE",
    "READ1",
    "READ2",
    "SECONDARY",
    "QCFAIL",
    "DUP",
    "SUPPLEMENTARY",
]
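# Each flag occupies one bit, in SAM-specification order: PAIRED = 0x1, ..., SUPPLEMENTARY = 0x800.
FLAGS = {key: 2**x for x, key in enumerate(_ordered_flags)}

[...]
# bam_fstream is assumed to be an already opened pysam.AlignmentFile.
pileup_columns = bam_fstream.pileup(flag_filter=FLAGS["DUP"] | FLAGS["SECONDARY"])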
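As an alternative to the hand-built dictionary, the same table could be written with the standard library's `enum.IntFlag`, which also makes it easy to decode a numeric flag back into names. This is only a sketch: `SamFlag` and `flag_names` are names made up here, and the bit values are those of the SAM specification.

```python
from enum import IntFlag


class SamFlag(IntFlag):
    """SAM flag bits (hypothetical class name; values from the SAM spec)."""
    PAIRED = 0x1
    PROPER_PAIR = 0x2
    UNMAP = 0x4
    MUNMAP = 0x8
    REVERSE = 0x10
    MREVERSE = 0x20
    READ1 = 0x40
    READ2 = 0x80
    SECONDARY = 0x100
    QCFAIL = 0x200
    DUP = 0x400
    SUPPLEMENTARY = 0x800


def flag_names(value: int) -> list:
    """Hypothetical helper: names of all bits set in a numeric flag."""
    return [member.name for member in SamFlag if member & value]


# Combine flags with |, then convert to a plain int before passing it on,
# e.g. as the flag_filter argument shown above.
mask = int(SamFlag.DUP | SamFlag.SECONDARY)
print(mask)  # 1280, i.e. 0x500

# Decode the flag of a paired, properly paired, reverse-strand,
# first-in-pair duplicate read.
print(flag_names(1107))
```

The explicit `int(...)` conversion is a cautious choice; `IntFlag` members are already `int` subclasses, so it may not be strictly needed.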