Open cjw85 opened 3 years ago
@AndreasHeger Do you know if anyone has started work on this?
There is some work in #1061.
Thanks @jmarshall,
I only ask because I'm aware people have started using the little Python library I threw together, referenced above. I don't really want to encourage them too much, and would prefer to get things moving into pysam. It may be time for me to learn Cython!
I have to admit, I have been very impressed by Cython. For people who know both Python and C, it is very easy to code in. (It is a different kettle of fish from Perl XS, which is like bolting assembly language onto the side of your Perl program.)
Don't remind me about PerlXS, you're giving me nightmares from when I wrote HDF5 extensions for @rmp.
Thanks, @cjw85 and @jmarshall. @jmarshall, I have set aside this evening to do a bit of pysam work. I was going to work on wrapping the new htslib release (1.15), but I understand you might already have started work? If so, I would instead go through a couple of outstanding PRs such as this one.
Yes, I've got an import-1.15 branch just about ready to go.
Great to have you take a look at the modified bases PR, and thanks for your comments re htslib version compatibility over there. I can think of a few workarounds in pysam/*.pyx code that can be gotten rid of then!
@cjw85, I have merged #1061, which allows access to the tags. Work on pileup functionality requires a bit more time. From the conversation, I gather it might be OK, but I wanted to check whether I can freely borrow from modbam2bed?
Go for it. Most of the code was written for the C program; the Python code was an afterthought.
Great, thanks. I was thinking of the C code though...
Sure, I simply meant that you might have to adapt a bit to fit within pysam.
The Autumn update of htslib will include support for parsing modified base tags from alignment files: https://github.com/samtools/htslib/blob/develop/NEWS.
Wrapping the API for use from Python would be a useful addition to pysam. The two immediate use cases/interfaces that come to mind are:
The second is useful as a common goal is to count frequencies of modified/unmodified bases by position. I've actually already implemented this entirely in C (https://github.com/epi2me-labs/modbam2bed) and wrapped it in a little Python, but tight integration in pysam would be preferable for many.
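To make the per-position counting concrete, here is a rough pure-Python sketch of the idea. It is not how modbam2bed or pysam does it. The function name `count_modified_by_position`, the input layout (one dict per read, mapping a `(canonical_base, strand, mod_code)` key to a list of `(reference_position, quality)` pairs), and the cutoff of 128 are all assumptions made for illustration. The quality is taken to be on the 0-255 scale used by the `ML` tag, and positions are assumed to have already been mapped to reference coordinates.

```python
from collections import defaultdict


def count_modified_by_position(per_read_calls, mod_key, threshold=128):
    """Tally modified/unmodified calls per reference position.

    per_read_calls: iterable of dicts, one per read (hypothetical layout),
        mapping (canonical_base, strand, mod_code) -> [(ref_pos, qual), ...]
    mod_key: the modification to tally, e.g. ("C", 0, "m")
    threshold: calls with qual >= threshold count as modified (assumed cutoff)
    """
    counts = defaultdict(lambda: [0, 0])  # ref_pos -> [modified, unmodified]
    for calls in per_read_calls:
        for ref_pos, qual in calls.get(mod_key, []):
            if qual >= threshold:
                counts[ref_pos][0] += 1
            else:
                counts[ref_pos][1] += 1
    # Return plain tuples, ordered by position
    return {pos: tuple(c) for pos, c in sorted(counts.items())}


# Three made-up reads carrying 5mC calls at reference positions 10 and 15
reads = [
    {("C", 0, "m"): [(10, 250), (15, 20)]},
    {("C", 0, "m"): [(10, 30), (15, 10)]},
    {("C", 0, "m"): [(10, 200)]},
]
result = count_modified_by_position(reads, ("C", 0, "m"))
for pos, (n_mod, n_unmod) in result.items():
    print(pos, n_mod, n_unmod)
```

A real implementation would also need to handle strand, read filtering, and positions where a read has the canonical base but no call in the tag, which this sketch ignores.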