Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
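Hi,

I think it would be valuable to add two features to improve the use of SAM tags: an enumeration of the standard SAM tags, and a class decorator to enforce tag conventions when declaring locally-defined tags.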
Would these features be welcomed into pysam?
I am happy to implement these but would appreciate feedback on whether this is a contribution that would be accepted into pysam, and if so, on some design considerations before starting.
Thank you!
SAM tag enum
The primary question I have regarding a SAM tag enum is whether the member names should be the actual SAM tags or something more semantically meaningful.
e.g.

    from enum import Enum

    class SamTag(str, Enum):
        """Standard SAM tags."""

        RG = "RG"
        """Read group."""

        RX = "RX"
        """Sequence bases of the (possibly corrected) unique molecular identifier."""

or

    from enum import Enum

    class SamTag(str, Enum):
        """Standard SAM tags."""

        READ_GROUP = "RG"
        """Read group."""

        UMI = "RX"
        """Sequence bases of the (possibly corrected) unique molecular identifier."""

(note that I suggest mixing in str or subclassing StrEnum so the enums can be passed directly to pysam's tagging functions, e.g. read.has_tag(SamTag.UMI))
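To illustrate the note above, here is a minimal sketch of why the str mixin matters. It uses only the standard library: the read_tags dictionary is a hypothetical stand-in for the tags on an aligned read, and has_tag below is a plain helper written for this example, not pysam's method.

```python
from enum import Enum


class SamTag(str, Enum):
    """Standard SAM tags (two members only, for illustration)."""

    READ_GROUP = "RG"
    UMI = "RX"


# Hypothetical stand-in for the tags stored on a read (not a pysam object).
read_tags = {"RG": "sample1", "RX": "ACGTACGT"}


def has_tag(tags: dict, tag: str) -> bool:
    """Plain helper that expects an ordinary string key."""
    return tag in tags


# Because of the str mixin, each member is itself a str ...
is_str = isinstance(SamTag.UMI, str)
# ... compares equal to the raw two-character tag ...
equals_raw = SamTag.UMI == "RX"
# ... and hashes like it, so it works wherever a plain string key is expected.
found = has_tag(read_tags, SamTag.UMI)
umi = read_tags[SamTag.UMI]
```

This sketch does not exercise pysam itself; it only shows that a str-mixin member behaves like the raw tag string in comparisons and lookups.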
SAM tag decorator
To support locally-defined tags, I would propose providing an enumeration class decorator that implements the following validations:
Enforce uniqueness (using enum.unique)
Enforce that tags are two-character strings
Optionally enforce that locally-defined tags adhere to SAM convention, namely that tags start with "X", "Y", or "Z", or are lowercase
e.g.

    from enum import Enum

    # sam_tag is the proposed decorator; it does not exist in pysam yet.
    @sam_tag(strict=True)
    class CustomTag(str, Enum):
        """Custom SAM tags used for $project."""

        FOO = "XF"
        """Foo."""

        BAR = "XB"
        """Bar."""

I have a proof-of-concept for this feature that I'd happily open a PR for here, if it's a contribution that you think would be sensible to add to pysam.
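For illustration only (this is not the proof-of-concept mentioned above), the three validations could be sketched roughly as follows. The name sam_tag and its strict keyword follow the example in this post; everything else is an assumption, including the local-tag check, which treats any tag containing a lowercase letter as locally defined.

```python
from enum import Enum, unique


def sam_tag(*, strict: bool = False):
    """Hypothetical decorator factory validating an enum of SAM tags."""

    def decorator(enum_cls):
        # enum.unique raises ValueError if two members share a value.
        unique(enum_cls)
        for member in enum_cls:
            tag = member.value
            if not isinstance(tag, str) or len(tag) != 2:
                raise ValueError(
                    f"{member.name}: {tag!r} is not a two-character string"
                )
            # Assumed reading of the local-tag convention: the tag starts
            # with X, Y, or Z, or contains a lowercase letter.
            is_local = tag[0] in "XYZ" or any(c.islower() for c in tag)
            if strict and not is_local:
                raise ValueError(
                    f"{member.name}: {tag!r} is not a locally-defined tag"
                )
        return enum_cls

    return decorator


@sam_tag(strict=True)
class CustomTag(str, Enum):
    """Custom SAM tags (hypothetical examples)."""

    FOO = "XF"
    BAR = "XB"
```

With strict=True, declaring a member such as READ_GROUP = "RG" in this sketch raises ValueError at class-definition time, so a collision with a standard tag surfaces on import rather than at run time.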