luizirber / niffler

Simple and transparent support for compressed files.
Apache License 2.0
Concatenated bz2 causes silent data loss on read #61

Closed Benjamin-Lee closed 1 year ago

Benjamin-Lee commented 1 year ago

Describe the bug

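I'm dealing with some concatenated bz2 files. Reading one of them with niffler returns only the contents of the first compressed stream and raises no error, so the rest of the data is silently dropped.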

Expected behavior

Let's say you have two FASTA files, hello.fasta and world.fasta. If you compress each one to produce hello.fasta.bz2 and world.fasta.bz2 and then concatenate the two bz2 files, reading the concatenated file with niffler returns none of the contents of world.fasta. If you instead concatenate hello.fasta and world.fasta before compressing, reading works as expected.

luizirber commented 1 year ago

Good catch! I tried it out with bzip2 and bzcat on the command line and got all the data back from a concatenated bz2 file.

The fix might be as simple as using MultiBzDecoder in https://github.com/luizirber/niffler/blob/a29c1ef724396c9ba9c6ab76c8265aa0a588b8fe/src/basic/compression.rs#L82, which would match what we do for gzip in https://github.com/luizirber/niffler/blob/a29c1ef724396c9ba9c6ab76c8265aa0a588b8fe/src/basic/compression.rs#L54