Open paolobarbolini opened 3 years ago
@paolobarbolini Hello, thanks for reply. I want to add it to
bzip2-rs
, but it seems to be a breaking change. In my view, we should removeReadState::Eof
entirely, sinceDecoder
can never know that the end of file happened (it can only know that the end of block happened). Instead,DecoderReader
should check for the end of file in the original stream (which it doesn't do now, and actually it seems to me that it can get stuck in infinite loop of feeding zero bytes toDecoder
on some corrupted files which contain the header but end abruptly).In any case, I'm more interested in pure Rust crate, so I suggest to create an issue there and continue our discussion of this there if possible.
And btw, I found a place where it is handled in the reference implementation: https://gitlab.com/federicomenaquintero/bzip2/-/blob/master/bzip2.c#L455 If you change this line to unconditional
break;
, bzip2 starts to give identical result to these crates.
- https://github.com/alexcrichton/bzip2-rs/issues/75#issuecomment-826299753
Thanks for the detailed answer. I have a refactor that improves how Decoder
interacts with Block
. After landing that this feature should become easier to implement, so by that point we should be able to come-up with a good implementation for this.
OK, I'll probably do a quick patch of DecoderReader
anyway for my little project, and then we'll see how to merge it with that refactoring.
After thinking a little more I'm also not sure whether we should remove Eof
from Decoder
interface entirely or leave it messaging about the end of block to the user. The problem is, I'm not sure what is a supposed use case for Decoder
at all - is it needed in the public interface? Of course, the user may want to tamper with the buffer size of 1024, but this particular problem can be resolved by adding a setting to DecoderReader
, akin to setvbuf in stdio.h
library.
Support decoding multistream bzip2 files. Originally posted by @ygyzys at https://github.com/alexcrichton/bzip2-rs/issues/75