simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
https://htseq.readthedocs.io/en/release_0.11.1/
GNU General Public License v3.0
122 stars, 77 forks

samout: output bam files #62

Closed adomingues closed 4 years ago

adomingues commented 6 years ago

Hi all,

I am wondering if you could consider adding the option to output the annotated SAM alignments in BAM format (with header). This would remove the intermediate step of converting the file to BAM, which is, afaik, the most common way of storing alignments these days.

Cheers, António

iosonofabio commented 6 years ago

Hi Antonio,

Thanks for your request; I'll have a look. In general, the philosophy here is not to reinvent the wheel, and SAM-to-BAM is certainly a trivial conversion. However, there are corner cases with huge numbers of reads where compression is a must, so that would be useful.

All in all, it should be doable since we already require pysam; however, the SAM writer is very simple at the moment, so turning it into a proper streaming writer might take some time.

Stay tuned,
Fabio
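BAM files are compressed with BGZF: a series of independent gzip members, each at most 64 KiB, carrying a `BC` extra field that records the block size and terminated by a fixed empty block. Because every block is self-contained, a writer can emit blocks as records arrive, which is what makes a streaming compressed writer feasible. The sketch below implements only this BGZF framing with the standard library, following the layout in the SAM/BAM format specification. It does not encode BAM's binary alignment records, so its output is bgzipped text, not a BAM file. The names `bgzf_block`, `BGZFWriter` and `CHUNK` are hypothetical. In practice pysam, which HTSeq already depends on, writes real BAM via `pysam.AlignmentFile(path, "wb", header=...)`.

```python
import gzip
import io
import struct
import zlib
from typing import BinaryIO

# Uncompressed bytes per block (hypothetical constant). A compressed BGZF block
# must not exceed 65536 bytes, so a margin is left for incompressible input;
# 0xFF00 is the conservative value htslib also uses.
CHUNK = 0xFF00

# Standard 28-byte empty BGZF block that marks end-of-file.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def bgzf_block(data: bytes) -> bytes:
    """Compress `data` into one BGZF block (a gzip member with a BC extra field)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib wrapper
    cdata = comp.compress(data) + comp.flush()
    # BSIZE is the total block size minus 1: 18-byte header + data + 8-byte footer.
    bsize = len(cdata) + 25
    header = struct.pack(
        "<4BI2BH2BHH",
        31, 139, 8, 4,  # gzip magic, CM = deflate, FLG = FEXTRA
        0,              # MTIME
        0, 255,         # XFL, OS = unknown
        6,              # XLEN: one 6-byte extra subfield follows
        66, 67, 2,      # subfield id 'B', 'C' and its length SLEN = 2
        bsize,
    )
    footer = struct.pack("<II", zlib.crc32(data), len(data))  # CRC32, ISIZE
    return header + cdata + footer


class BGZFWriter:
    """Stream bytes into BGZF blocks without holding the whole output in memory."""

    def __init__(self, raw: BinaryIO) -> None:
        self.raw = raw
        self.buf = bytearray()

    def write(self, data: bytes) -> None:
        self.buf += data
        # Emit full blocks as soon as enough input has accumulated.
        while len(self.buf) >= CHUNK:
            self.raw.write(bgzf_block(bytes(self.buf[:CHUNK])))
            del self.buf[:CHUNK]

    def close(self) -> None:
        # Flush the partial last block, then append the EOF marker.
        if self.buf:
            self.raw.write(bgzf_block(bytes(self.buf)))
            self.buf.clear()
        self.raw.write(BGZF_EOF)


if __name__ == "__main__":
    out = io.BytesIO()
    writer = BGZFWriter(out)
    writer.write(b"@HD\tVN:1.6\n")
    writer.close()
    # BGZF is valid multi-member gzip, so the standard gzip module can read it.
    print(gzip.decompress(out.getvalue()))
```

Flushing fixed-size chunks keeps memory bounded regardless of how many reads are processed, which is the property that matters for the huge-file corner case.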

iosonofabio commented 4 years ago

I've added this option to the latest master. It is a little tricky to get right when parsing from standard input, because if you miss the header you cannot go back, but I think I've got it. Closing for now; please reopen if you test it and it does not work properly.
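The difficulty with standard input is that a pipe is forward-only. SAM header lines (those starting with `@`) come first, and a BAM writer needs the complete header when the output file is opened (in pysam, through the `header` or `template` argument of `pysam.AlignmentFile`). The sketch below, using only the standard library on SAM text, shows the usual pattern: consume the header lines up front, keep the first alignment line that ended the header instead of dropping it, and only then start writing. The functions `split_header` and `annotate_stream` and the `assign` callback are hypothetical illustrations, not HTSeq's code; the `XF:Z:` tag mirrors the optional field htseq-count uses in its SAM output.

```python
import io
import itertools
from typing import Callable, Iterator, List, TextIO, Tuple


def split_header(stream: TextIO) -> Tuple[List[str], Iterator[str]]:
    """Read leading '@' header lines from a forward-only text stream.

    Returns the header lines and an iterator over the alignment lines.
    The first alignment line has already been consumed while looking for
    the end of the header, so it is chained back in front of the rest.
    """
    header: List[str] = []
    lines = iter(stream)
    for line in lines:
        if line.startswith("@"):
            header.append(line)
        else:
            return header, itertools.chain([line], lines)
    return header, iter(())  # header-only (or empty) input


def annotate_stream(
    stream: TextIO, out: TextIO, assign: Callable[[List[str]], str]
) -> None:
    """Copy SAM text to `out`, appending an XF:Z tag from `assign` to each record."""
    header, records = split_header(stream)
    out.writelines(header)  # the full header is known before any record is written
    for line in records:
        fields = line.rstrip("\n").split("\t")
        fields.append("XF:Z:" + assign(fields))
        out.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    sam = (
        "@HD\tVN:1.6\n"
        "@SQ\tSN:chr1\tLN:1000\n"
        "r1\t0\tchr1\t10\t60\t5M\t*\t0\t0\tACGTA\t*\n"
    )
    out = io.StringIO()
    # In real use the input would be sys.stdin; here every read is assigned "gene1".
    annotate_stream(io.StringIO(sam), out, lambda fields: "gene1")
    print(out.getvalue(), end="")
```

Chaining the already-read line back in front of the iterator avoids any need to seek, so the same code works on a pipe and on a regular file.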