pierrec / node-lz4

LZ4 fast compression algorithm for NodeJS
MIT License
438 stars 98 forks source link

Possible bug (skipping last bytes) in uncompress function due to sIdx and eIdx #55

Open CanisLupus opened 7 years ago

CanisLupus commented 7 years ago

Doesn't the following code cause the end bytes of the input to be skipped if sIdx is not 0?

exports.uncompress = function (input, output, sIdx, eIdx) {
        sIdx = sIdx || 0
        eIdx = eIdx || (input.length - sIdx)
        // Process each sequence in the incoming data
        for (var i = sIdx, n = eIdx, j = 0; i < n;) {
        // ...

I had trouble with this until I noticed that eIdx is (input.length - sIdx) instead of input.length, causing the last "sIdx" bytes to not be processed.

Is this correct? sIdx and eIdx are not documented in the params above the function, but still. :)

pierrec commented 7 years ago

sIdx is the starting index (default=0) eIdx is the end index (default=length of buffer - start index)

If sIdx is supplied and eIdx is not, then all bytes from sIdx are processed. If sIdx is supplied and eIDx is as well, then all the specified bytes are processed. Of course, in that case the indexes must be valid for the block to be decoded.

I think the signature was originally defined as per the C version, but the indexes are never used. Should probably be removed. Want to take a stab?

CanisLupus commented 7 years ago

The indexes were useful for me, since I was trying to decode the mozlz4 format used on a few Firefox files and it uses a custom 8-byte header plus the data size field, all of which I had to skip, so I used sIdx only. Of course, it could be done by slicing the input externally, but this felt better. :)

But "length minus start index" is the part I'm not understanding. If we pass sIdx but not eIdx, the uncompress function will process input from sIdx to length MINUS sIdx, meaning that if we have an input of 100 bytes and try to skip the first 10, it will process not the remaining 90 bytes, but only 80, since the last 10 are not included. Additionally, if we try to skip the first 50 bytes, sIdx will get 50 and eIdx will get 100 - 50, which is also 50, so no processing is done (starts and ends at index 50).

I would also argue that the variable "n" is unnecessary, since it is a synonym of eIdx.

Am I missing something? I was decoding a semi-custom lz4 format, after all, so I might not be understanding how the real format is supposed to work. Maybe "n" or "eIdx" was supposed to be something else? Or maybe I'm just wrong. :)

pierrec commented 7 years ago

Ah I see, you are right indeed. I think that the intent was to specify none or both start/end indexes, not just one of them, in which case it is broken. Should just remove the start/end index and slice the input accordingly, it is much clearer IMHO.

CanisLupus commented 7 years ago

Glad I wasn't seeing things. :) If you want to remove them it's your call, of course. I'm okay with slicing the input myself in my use-case, though slice returns a shallow copy, while using the indexes doesn't create new arrays. It is also really easy to fix by removing the "- sIdx" (and maybe replace "n" with "eIdx"), but I understand either way.