rvaser / bioparser

C++ library for parsing several formats in bioinformatics
MIT License

Implementing an htslib compliant parser would enable parsing of BAM #6

Open SamStudio8 opened 4 years ago

SamStudio8 commented 4 years ago

Currently, bioparser only supports parsing SAM as a tab-delimited text file. Implementing an htslib-compliant parser would not only allow parsing of BAM as well as SAM (and possibly even CRAM), but would also be a little more future-proof and give free access to the additional optional tags.
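For context, a minimal sketch of what tab-delimited SAM parsing involves, including the optional tags that follow the 11 mandatory columns. The `SamRecord` type and the `SplitTabs` and `ParseSamLine` helpers are hypothetical names made up for this illustration; they are not bioparser's actual classes or API.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for illustration only (not bioparser's class).
// It holds the 11 mandatory SAM columns plus any optional TAG:TYPE:VALUE fields.
struct SamRecord {
  std::string qname;
  std::uint32_t flag = 0;
  std::string rname;
  std::uint32_t pos = 0;
  std::uint32_t mapq = 0;
  std::string cigar;
  std::string rnext;
  std::uint32_t pnext = 0;
  std::int32_t tlen = 0;
  std::string seq;
  std::string qual;
  std::vector<std::string> tags;  // kept as raw strings, e.g. "NM:i:0"
};

// Split a line on tab characters, keeping empty fields.
std::vector<std::string> SplitTabs(const std::string& line) {
  std::vector<std::string> fields;
  std::string::size_type begin = 0;
  while (true) {
    const std::string::size_type end = line.find('\t', begin);
    if (end == std::string::npos) {
      fields.push_back(line.substr(begin));
      break;
    }
    fields.push_back(line.substr(begin, end - begin));
    begin = end + 1;
  }
  return fields;
}

// Parse one SAM alignment line (not a header line starting with '@').
SamRecord ParseSamLine(const std::string& line) {
  const std::vector<std::string> f = SplitTabs(line);
  if (f.size() < 11) {
    throw std::invalid_argument("SAM line has fewer than 11 mandatory fields");
  }
  SamRecord r;
  r.qname = f[0];
  r.flag = static_cast<std::uint32_t>(std::stoul(f[1]));
  r.rname = f[2];
  r.pos = static_cast<std::uint32_t>(std::stoul(f[3]));
  r.mapq = static_cast<std::uint32_t>(std::stoul(f[4]));
  r.cigar = f[5];
  r.rnext = f[6];
  r.pnext = static_cast<std::uint32_t>(std::stoul(f[7]));
  r.tlen = static_cast<std::int32_t>(std::stoi(f[8]));
  r.seq = f[9];
  r.qual = f[10];
  // Everything after column 11 is an optional tag.
  r.tags.assign(f.begin() + 11, f.end());
  return r;
}
```

A text-only parser like this cannot read BAM or CRAM, which are binary formats; that is the gap an htslib-based reader would close.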

rvaser commented 4 years ago

Hi Sam, I'll consider this when I start refactoring Racon again :)

Best regards, Robert

SamStudio8 commented 4 years ago

Hi Robert, that would be great. As it happens, I've actually implemented a first attempt at this (but without your circular storage buffer, so the RAM usage is not that efficient). It supports both SAM and BAM (without having to select which) as it uses hts_open. It also runs in around the same time as bioparser's current SAM parser. Let me know when you're refactoring next and I could make a PR or something.
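As a rough illustration of why the format does not have to be selected up front: BAM is BGZF-compressed and so starts with the gzip magic bytes, CRAM starts with the ASCII string "CRAM", and SAM is plain text. The sketch below shows that kind of magic-byte check using only the standard library. It is not htslib's detection code, and `AlignmentFormat` and `SniffFormat` are hypothetical names invented for this example.

```cpp
#include <string>

// Hypothetical enum for illustration; htslib has its own format types.
enum class AlignmentFormat { kSam, kBgzf, kCram, kUnknown };

// Guess the container format from the first bytes of a file.
// Assumption: `head` holds at least the first four bytes when available.
AlignmentFormat SniffFormat(const std::string& head) {
  // BGZF (used by BAM) is gzip-compatible: magic bytes 0x1f 0x8b.
  if (head.size() >= 2 &&
      static_cast<unsigned char>(head[0]) == 0x1f &&
      static_cast<unsigned char>(head[1]) == 0x8b) {
    return AlignmentFormat::kBgzf;
  }
  // CRAM files begin with the four ASCII characters "CRAM".
  if (head.size() >= 4 && head.compare(0, 4, "CRAM") == 0) {
    return AlignmentFormat::kCram;
  }
  // SAM is text: header lines start with '@', records with a printable QNAME.
  if (!head.empty()) {
    const unsigned char c = static_cast<unsigned char>(head[0]);
    if (c >= 0x21 && c <= 0x7e) {
      return AlignmentFormat::kSam;
    }
  }
  return AlignmentFormat::kUnknown;
}
```

A BGZF result only says the file is block-compressed; confirming that it is BAM would require decompressing the first block and checking its contents, which a real reader would leave to htslib.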

Sam

rvaser commented 4 years ago

Will do!