pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

Support for modified base tags #1052

Open cjw85 opened 3 years ago

cjw85 commented 3 years ago

The Autumn update of htslib will include support for parsing modified base tags from alignment files: https://github.com/samtools/htslib/blob/develop/NEWS.
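For context, the tags in question are the `MM`/`ML` pair from the SAM optional-fields specification. A rough pure-Python sketch of decoding them might look like the following. The function name `parse_mm` is hypothetical, and the sketch assumes a forward-strand read and a single modification code per `MM` entry; it is not how htslib implements it.

```python
def parse_mm(seq, mm, ml):
    """Decode an MM string and ML array into {(base, strand, code): [(pos, qual), ...]}.

    Simplifying assumptions (not from the thread): forward-strand read,
    one modification code per entry, ML values listed in the same order as MM calls.
    """
    calls = {}
    ml_iter = iter(ml)
    for entry in filter(None, mm.split(";")):
        header, *deltas = entry.split(",")
        base, strand = header[0], header[1]
        code = header[2:].rstrip(".?")  # drop the optional '.'/'?' flag
        # Query positions at which the canonical base occurs ('N' matches any base).
        candidates = [i for i, b in enumerate(seq) if base == "N" or b == base]
        hits = []
        idx = -1
        for delta in deltas:
            # Each number says how many occurrences of the base to skip
            # before the next modified one.
            idx += int(delta) + 1
            hits.append((candidates[idx], next(ml_iter)))
        calls[(base, strand, code)] = hits
    return calls


result = parse_mm("ACCGCTC", "C+m,1,0;", [200, 50])
```

Here the Cs sit at query positions 1, 2, 4 and 6, so skipping one C and then zero Cs selects positions 2 and 4.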

Wrapping the API for use from Python would be a useful addition to pysam. The two immediate use cases/interfaces that come to mind are:

The second is useful because a common goal is to count frequencies of modified/un-modified bases by position. I've actually already implemented this entirely in C (https://github.com/epi2me-labs/modbam2bed) and wrapped it in a little Python, but tight integration in pysam would be preferable for many.
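To make the counting use case concrete, here is a minimal sketch of tallying modified versus un-modified calls per reference position. It is not the modbam2bed algorithm; the function name `tally_calls`, the input shape, and the single `threshold` cut-off on the 0-255 probability scale are all assumptions for illustration.

```python
from collections import Counter


def tally_calls(calls, threshold=170):
    """Count modified/un-modified calls per reference position.

    calls: iterable of (ref_pos, qual) pairs, qual on the 0-255 scale (assumed).
    threshold: hypothetical cut-off at or above which a call counts as modified.
    Returns {ref_pos: (n_modified, n_unmodified)} ordered by position.
    """
    modified, unmodified = Counter(), Counter()
    for ref_pos, qual in calls:
        if qual >= threshold:
            modified[ref_pos] += 1
        else:
            unmodified[ref_pos] += 1
    positions = sorted(set(modified) | set(unmodified))
    return {pos: (modified[pos], unmodified[pos]) for pos in positions}


counts = tally_calls([(100, 250), (100, 10), (100, 200), (105, 30)])
```

A real implementation would also need to handle strand, multiple modification types, and low-confidence calls, none of which this sketch attempts.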

cjw85 commented 2 years ago

@AndreasHeger Do you know if anyone has started work on this?

jmarshall commented 2 years ago

There is some work in #1061.

cjw85 commented 2 years ago

Thanks @jmarshall,

I only ask because I'm aware people have started using the little Python library I threw together, referenced above. I don't really want to encourage them too much, and would prefer to get things moving into Pysam. It may be time for me to learn Cython!

jmarshall commented 2 years ago

I have to admit, I have been very impressed by Cython. For people who know both Python and C, it is very easy to code in. (It is a different kettle of fish from Perl XS, which is like bolting assembly language onto the side of your Perl program.)

cjw85 commented 2 years ago

Don't remind me about PerlXS, you're giving me nightmares from when I wrote HDF5 extensions for @rmp.

AndreasHeger commented 2 years ago

Thanks, @cjw85 and @jmarshall. @jmarshall, I have set aside this evening to do a bit of pysam work. I was going to work on wrapping the new htslib release (1.15), but I understand you might already have started work? If so, I would then go through a couple of outstanding PRs such as this one.

jmarshall commented 2 years ago

Yes, I've got an import-1.15 branch just about ready to go.

Great to have you take a look at the modified bases PR, and thanks for your comments re htslib version compatibility over there. I can think of a few workarounds in pysam/*.pyx code that can be gotten rid of then!

AndreasHeger commented 2 years ago

@cjw85, I have merged #1061, which allows access to the tags. Work on pileup functionality requires a bit more time. From the conversation, I gather it might be ok, but I wanted to check if I can freely borrow from modbam2bed?
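As a rough illustration of what tag access might look like from user code, the sketch below flattens per-read modification calls into rows. It assumes each read exposes a `modified_bases` property mapping `(canonical_base, strand, modification)` to a list of `(query_pos, qual)` pairs, as described in the pysam documentation for `AlignedSegment`. The helper name `iter_modified_bases` and the file name `reads.bam` are hypothetical.

```python
def iter_modified_bases(reads):
    """Yield (query_name, base, strand, code, query_pos, qual) for each call.

    Assumes each read has `query_name` and a `modified_bases` mapping of
    (base, strand, code) -> [(query_pos, qual), ...]; reads without calls are skipped.
    """
    for read in reads:
        for (base, strand, code), hits in (read.modified_bases or {}).items():
            for query_pos, qual in hits:
                yield read.query_name, base, strand, code, query_pos, qual


# Possible usage with pysam (assumes "reads.bam" carries MM/ML tags):
#   import pysam
#   with pysam.AlignmentFile("reads.bam") as bam:
#       for row in iter_modified_bases(bam):
#           print(row)
```

Positions here are in query coordinates; mapping them to the reference is a separate step that this sketch leaves out.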

cjw85 commented 2 years ago

Go for it. Most of the code was written for the C program; the Python code was an afterthought.

AndreasHeger commented 2 years ago

Great, thanks. I was thinking of the C code though...

cjw85 commented 2 years ago

Sure, I simply meant that you might have to adapt a bit to fit within pysam.