sile / libflate

A Rust implementation of DEFLATE algorithm and related formats (ZLIB, GZIP)
https://docs.rs/libflate
MIT License
180 stars 35 forks source link

Add `Lz77Decoder` #60

Closed sile closed 3 years ago

sile commented 3 years ago

Resolves #59

Skirmisher commented 3 years ago

I'm having trouble understanding how to use this, given a LZ77-encoded reader/buffer.

Also, since you're storing a Vec and an offset, would it not make more sense to use std::io::Cursor to wrap the Vec? You could possibly simplify some of the methods and the Read impl this way.

sile commented 3 years ago

First of all, thank you for checking this PR!

given a LZ77-encoded reader/buffer.

IIUC, LZ77 itself doesn't define serialization (binary) format, so you need to extract a sequence of libflate::lz77::Code from your reader/buffer by using a deserializing algorithm (e.g.., DEFLATE in libflate), and then pass the sequence to Lz77Decoder::decode(). (But I'm not an LZ77 expert, so I might have misunderstood something)

would it not make more sense to use std::io::Cursor to wrap the Vec?

Sounds good. Thanks!

Skirmisher commented 3 years ago

First of all, thank you for checking this PR!

You're welcome, though I didn't check much ^^;

IIUC, LZ77 itself doesn't define serialization (binary) format

Right, I was still slightly confused after referencing the document for the format I'm parsing (here if you're curious; relevant part is at about line 500, or just search "lz77"). I have a better idea of what I'm working with now, but your API is not obvious to me without documentation (e.g. with what pattern is the decoder created/used? when is truncate_old_buffer used and what for? etc.). Even reading the the way you've used it for DEFLATE is not enlightening since that code is also not commented and hard to follow.

At any rate, were you working on an implementation with Cursor or would you like me to...?

Skirmisher commented 3 years ago

Okay, I tried fiddling with the Lz77Decoder implementation and realized that Cursor doesn't actually simplify anything. However, the API still needs sprucing up to find more use outside this crate. For starters, I would recommend some sort of ring buffer for the sliding window instead of the unbounded Vec with manual garbage collection that exists currently. (MAX_DISTANCE would need to be passed in and possibly renamed; we could also make it a power-of-two exponent, so that wraparounds are always an AND instead of a modulus operation.) Would it be palatable to pull in a small crate for this, or is rolling the logic manually preferable?

I could envision an interface that requires the user to specify a function for decoding the LZ77 Code for a particular format, which allows us to provide a completely transparent Read impl over any given encoded reader. That might grow too complex, but I haven't looked into it yet so I'm not sure.

I realize a lot of this is shaking the cruft off of this code that was extracted from the DEFLATE module, but that's where the fun starts :) Apologies for asking so many questions about the library code, though it is inevitable.

sile commented 3 years ago

Thanks for your comments. I'm busy on weekdays, so let me respond that this weekend.

sile commented 3 years ago

the API still needs sprucing up to find more use outside this crate.

Agreed. And your idea sounds interesting, but, to be honest, there is no motivation to radically enhance the current LZ77 implementation because the current one is sufficient to libflate (implementing the idea could be exciting if I have time though). Therefore, I think, if just extracting the LZ77 part of this crate isn't enough for you, I recommend just stop considering using this, and try implementing a new LZ77 crate from scratch.

Skirmisher commented 3 years ago

I understand. The functionality is fine for me, I just want to understand your goals for the API in this PR, and how you intend it to be used. It seems like you just want to make a minimal functioning API from code extracted from the deflate module, at least for now, which is fine. In that case, what else do you want to add to the PR before moving it out of draft state?

sile commented 3 years ago

Sorry for the delayed response.

what else do you want to add to the PR before moving it out of draft state?

What I need to do is adding doc comments and unit tests. I'll do that within several days.

sile commented 3 years ago

Updated. I'll merge this PR once the CI passes.