pwaller / go-clz4

fast cgo implementation of lz4

Unable to decompress using Reader #1

Closed ezbercih closed 10 years ago

ezbercih commented 11 years ago

It looks like the Reader interface does not decompress: the Read function creates an empty slice and passes it to Uncompress, but Uncompress (judging by the comments on the function) needs a slice/array with the capacity of the decompressed data. Your tests do not cover the Writer and Reader interfaces, which might be why this was not caught.

I am not sure how this could work without passing an extra parameter (the size of the data before compression) to NewReader, so that it can be used to make a slice of that size and call Uncompress.

I also checked the UncompressUnknownOutputSize function to see if I could use that, but it also requires a slice/array at least as large as the decompressed data.

pwaller commented 11 years ago

Good catch! Admittedly, I didn't end up using the Reader interface myself, so it's quite possible I never properly tested it. However, the design is to read and decompress the entire underlying reader in one go. If you want to limit how much data is read in and decompressed, you can use an io.LimitedReader or io.SectionReader.

With this extra fact, can you make your use case work?

I think you generally must somehow know the size of the input data to be decompressed. For one, I don't see a way to feed partial input buffers to LZ4_uncompress (so that you could decode a chunk at a time). Looking at the source, I suspect that if you give it an over-long input buffer it won't find the end of the compressed data.

I don't know a way around that. If you can think of one, a contribution would be very much welcome!

Cyan4973 commented 11 years ago

It's possible to provide an overlong input buffer to LZ4_uncompress(). The function will find the correct end of the block, and provide its position as the result of the function.

LZ4_uncompress() however cannot process an unfinished block. It will detect this situation, and return with a negative number, indicating an error.

To be sure of always finding the end of the current block, you have to provide an input length of LZ4_compressBound(outputSize) to LZ4_uncompress().

Note from the source code (lz4.h):

    LZ4_uncompress() :
        outputSize : is the original (uncompressed) size
        return     : the number of bytes read in the source buffer
                     (in other words, the compressed size)
                     If the source stream is malformed, the function will
                     stop decoding and return a negative result, indicating
                     the byte position of the faulty instruction.

pwaller commented 11 years ago

Ah, I didn't realise the implications of this. Would you like to have a go at writing a fixed version of the function with a test?

Cyan4973 commented 11 years ago

Unfortunately, I'm not a Go developer (yet), and therefore can't help speedily.

pwaller commented 11 years ago

Do you program in C? If so, you could write the algorithm in pseudo-C; it would be close. I don't have time to take a decent look right now.

Cyan4973 commented 11 years ago

Hmm, note that LZ4_uncompress() doesn't need the size of the input data; it needs the size of the output data instead.

Looking at your source code, I'm not sure this condition can be fulfilled...

pwaller commented 11 years ago

Well, I import the LZ4 source directly from the C implementation, unmodified.

I think you might be able to feed multi-kilobyte chunks to LZ4 and, if it terminates prematurely, resume from wherever it got to.

They also seem to have added a streaming format recently, so maybe that provides an avenue for implementing this. Unfortunately, though, someone who is not me will have to implement it if you want it in the next few months.

pwaller commented 10 years ago

I'm closing this for now. If anyone has anything to add or any proposals, please open a new issue (or comment here and I'll reopen it).