linkedin / migz

Multithreaded, gzip-compatible compression and decompression, available as a platform-independent Java library and command-line utilities.
BSD 2-Clause "Simplified" License

Can I use this to decompress bgzip-compressed files? #2

Open wavefancy opened 4 years ago

wavefancy commented 4 years ago

Hi developers,

Can I use this to decompress bgzip-compressed files? We have lots of files that were compressed with bgzip (http://www.htslib.org/doc/bgzip.html) but no way to decompress them in parallel, and I hope your software can do this. From the introduction, it seems very close.

Best regards, Wallace

jeffpasternack commented 4 years ago

Hi Wallace,

In theory this is of course possible, since BGZip writes compressed data in multiple blocks that can then be decompressed in parallel, but unfortunately MiGz only supports multithreaded decompression of MiGz-compressed data. One possible solution would be to recompress your data with MiGz (both BGZip and MiGz output can be decompressed by ordinary gzip tools).

Regards, Jeff
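
To illustrate the point above about independently compressed blocks, here is a minimal sketch using only the JDK. It is not part of MiGz and does not use MiGz's API. It assumes the BGZF layout as described in the SAM/BAM specification: each block is a complete gzip member whose extra field carries a "BC" subfield holding the total block size minus one. The class and method names (`BgzfParallelSketch`, `findBlocks`, `inflateBlock`, `decompress`, `makeBlock`) are hypothetical, the whole file is held in memory, and CRC checks are skipped for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class BgzfParallelSketch {
  private static final int FIXED_HEADER = 12; // 10-byte gzip header + 2-byte XLEN
  private static final int TRAILER = 8;       // CRC32 + ISIZE

  /** Scans headers only (nothing is inflated); returns {offset, totalSize, xlen} per block. */
  static List<int[]> findBlocks(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    List<int[]> blocks = new ArrayList<>();
    int pos = 0;
    while (pos < data.length) {
      boolean headerOk = pos + FIXED_HEADER <= data.length
          && (data[pos] & 0xff) == 0x1f && (data[pos + 1] & 0xff) == 0x8b
          && (data[pos + 3] & 4) != 0; // FLG.FEXTRA must be set
      if (!headerOk) {
        throw new IllegalArgumentException("No gzip member with an extra field at " + pos);
      }
      int xlen = buf.getShort(pos + 10) & 0xffff;
      int end = pos + FIXED_HEADER + xlen;
      if (end > data.length) throw new IllegalArgumentException("Truncated header at " + pos);
      int size = -1;
      for (int p = pos + FIXED_HEADER; p + 4 <= end; ) {
        int slen = buf.getShort(p + 2) & 0xffff;
        if (data[p] == 'B' && data[p + 1] == 'C' && slen == 2 && p + 6 <= end) {
          size = (buf.getShort(p + 4) & 0xffff) + 1; // BSIZE = total block size - 1
        }
        p += 4 + slen;
      }
      if (size < FIXED_HEADER + xlen + TRAILER || pos + size > data.length) {
        throw new IllegalArgumentException("Missing or invalid BC subfield at " + pos);
      }
      blocks.add(new int[] {pos, size, xlen});
      pos += size;
    }
    return blocks;
  }

  /** Inflates one block; touches no shared mutable state, so calls can run concurrently. */
  static byte[] inflateBlock(byte[] data, int[] block) {
    int offset = block[0], size = block[1], xlen = block[2];
    int isize = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset + size - 4);
    byte[] out = new byte[isize];
    if (isize == 0) return out; // e.g. the empty end-of-file marker block
    Inflater inflater = new Inflater(true); // raw deflate: the gzip header is skipped by hand
    try {
      // The trailer bytes are passed along too; the inflater stops at the end of the stream.
      inflater.setInput(data, offset + FIXED_HEADER + xlen, size - FIXED_HEADER - xlen);
      int n = 0;
      while (n < isize) {
        int r = inflater.inflate(out, n, isize - n);
        if (r == 0) throw new IllegalStateException("Truncated or corrupt block at " + offset);
        n += r;
      }
      return out;
    } catch (DataFormatException e) {
      throw new IllegalStateException("Corrupt block at " + offset, e);
    } finally {
      inflater.end();
    }
  }

  /** Inflates all blocks on the common fork-join pool and joins them in file order. */
  static byte[] decompress(byte[] data) {
    List<byte[]> parts = findBlocks(data).parallelStream()
        .map(block -> inflateBlock(data, block))
        .collect(Collectors.toList()); // encounter order of the block list is preserved
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] part : parts) out.write(part, 0, part.length);
    return out.toByteArray();
  }

  /** Demo helper only: wraps a small payload (well under 64 KiB) in a single BGZF block. */
  static byte[] makeBlock(byte[] payload) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(payload);
    deflater.finish();
    byte[] cdata = new byte[payload.length + 64];
    int clen = deflater.deflate(cdata);
    deflater.end();
    CRC32 crc = new CRC32();
    crc.update(payload);
    int total = 18 + clen + TRAILER; // 12 fixed bytes + 6-byte BC subfield + data + trailer
    ByteBuffer b = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
    b.put(new byte[] {31, (byte) 139, 8, 4, 0, 0, 0, 0, 0, (byte) 255});
    b.putShort((short) 6).put((byte) 'B').put((byte) 'C');
    b.putShort((short) 2).putShort((short) (total - 1));
    b.put(cdata, 0, clen).putInt((int) crc.getValue()).putInt(payload.length);
    return b.array();
  }
}
```

The key design point is that block boundaries are found from the headers alone, so the expensive inflate step can be handed to separate threads. A real tool would stream the input rather than load it whole, and would verify each block's CRC32.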

wavefancy commented 4 years ago

Hi Jeff,

Thank you very much for your quick response. However, bgzip is the de facto standard in the bioinformatics domain. People deliver sequencing files in bgzip format by default, and many are quite large, so decompressing them and recompressing with MiGz is time-consuming. In addition, many other tools work well with bgzip files, and I am not sure whether MiGz-compressed files would be compatible with them. So the ideal drop-in solution would be for MiGz to support decompressing bgzip files in parallel. But I understand that you may not have the time and resources to support this.

Thank you very much!

Best regards, Wallace

wavefancy commented 4 years ago

But if you can support it, please let me know. Much appreciated! - Wallace

jeffpasternack commented 4 years ago

Hi Wallace,

If the tools specifically target BGZip (and aren't doing "normal" gzip decompression) then they indeed may not be able to read MiGz files. Normal gzip decompression tools, of course, will read MiGz files just fine. There are no plans to add BGZip decompression support (that would not be useful to us, since we can just compress everything with MiGz from the start), but you're welcome to submit a patch adding that support :)

Regards, Jeff
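
As a small illustration of what "normal" gzip decompression means here, the sketch below reads a stream made of several concatenated gzip members with the JDK's `GZIPInputStream`. BGZF files have this multi-member shape; that MiGz output does as well is an assumption in this sketch, not something stated in the thread. The class and method names are hypothetical, and the MiGz API itself is not used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class MultiMemberGzipSketch {

  /** Compresses text into one complete, standalone gzip member. */
  static byte[] gzipMember(String text) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
        gz.write(text.getBytes(StandardCharsets.UTF_8));
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads every member in sequence, single-threaded, as a generic gzip reader would. */
  static String gunzipAll(byte[] data) {
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[8192];
      int n;
      while ((n = gz.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A generic reader like this ignores any extra header fields and simply decodes the members one after another, which is why it loses the parallelism but keeps compatibility. Tools that rely on BGZF-specific metadata, such as block offsets for random access, are a different case, as noted above.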