Closed GoogleCodeExporter closed 9 years ago
I've gone over this again - did I miss something about how the lengths
translate to encoded data? The >>8 actually appears to work just fine, which
makes brings encoding back to basically original speed. Using a >=oend anywhere
eventually causes a failure, though. A new patch and the extended fuzzer, which
checks for correct behavior when the output buffer is both exactly sized and
substantially too small, are attached.
Original comment by andrew.m...@gmail.com
on 24 Aug 2012 at 4:50
Attachments:
Hi Andrew
Thanks for the very detailed analysis.
Yes, this behavior is correct.
A fast encoder such as LZ4 has many trade-off everywhere in its code. It is
tuned for speed, and as a consequence, a lot of "approximations" (lost
opportunities to win a few bytes here and there) are accepted in favor of speed.
This behavior fits into this category :
Should the compressed data normally fit into XXX bytes, and is provided an
output buffer of exactly XXX bytes, then it may fail,
because its "write check" algorithm is only a fast approximation.
Obviously, a critical "write error" is not acceptable,
but missing an opportunity to compress data by a few bytes is : it just makes
the compression ratio worse. That's part of the trade-off.
The proposed approximation in r77 is fairly close to optimal : should the
output buffer be something in the range of XXX+N bytes, with N a very small
number, then the check will always work.
Now, of course, should a better check be possible without compromising speed, i
will gladly welcome it, and it will be integrated into the main source trunk.
Best Regards
Original comment by yann.col...@gmail.com
on 24 Aug 2012 at 8:59
I added tests for nearly enough space to the fuzzer. The more complex math is
needed on encoding lastRun to ensure that it doesn't run out of buffer, but
after some time compressing random data it appears the other two checks are
safe with the approximation. This patch has much better performance as a result
- only lastRun gets the complex check, and so far it never runs over buffer or
fails when the compressed output should fit.
Original comment by andrew.m...@gmail.com
on 24 Aug 2012 at 12:18
Attachments:
Great ! Thanks Andrew.
I'll look into it.
Original comment by yann.col...@gmail.com
on 24 Aug 2012 at 12:26
Please find in attached file
a proposed RC78
which takes into consideration your proposed modification
for exact output buffer check.
I've used this opportunity to also include your proposed fuzzer test tool,
which has been modified in order to become (hopefully) more portable.
Obviously, the proposed rc78 succesfully passes the fuzzer test.
Would you mind having a look, and tell if it works for you ?
Regards
Original comment by yann.col...@gmail.com
on 4 Sep 2012 at 2:08
Attachments:
The final test for 64K also needs to use (lastRun-RUN_MASK+255)/255 to avoid
overrun, or else use the >= condition and occasionally fail for the case of
exactly enough space. The fuzzer won't catch this failure as-is, but if you add
a compression that uses just sligthly under len bytes for output, it will
overrun.
It's a shame there's not an easily portable way to get time or some other data
to use as a seed for the PRNG, it's a big help to portability to put the PRNG
in the fuzzer, but with a static init every fuzzer run is using the same data.
Other than that and the extra test, the improved/portable fuzzer looks fine to
me - but I can only test on linux.
Original comment by andrew.m...@gmail.com
on 6 Sep 2012 at 9:01
Hi Andrew
I finally had a bit of time to work again on the issue.
The following zip package contains a few modifications which tries to solve
your latest comments :
- The fuzzer is upgraded, to test both :
==> Strict output size (which must work)
==> Output size - 1 (which must fails)
Both conditions must be verified for each loop of the test.
- Input seed can be either manually selected, or is automatically determined by
the program (not exactly random, but almost by the standard of this test).
- LZ4 64K end condition test has been modified according to your recommendation
With all these elements in place, i've been testing the updated LZ4 with the
fuzzer many times (with each time meaning ~250K tests), and all tests have been
succesfull so far.
It seems LZ4 now correctly detects output buffer boundaries precisely, instead
of being a bit over-cautious up to now.
Regards
Original comment by yann.col...@gmail.com
on 18 Oct 2012 at 8:08
Attachments:
Corrected into r79.
Thanks for clear report and detailed correction and test instructions.
Original comment by yann.col...@gmail.com
on 24 Oct 2012 at 1:03
Original issue reported on code.google.com by
andrew.m...@gmail.com
on 24 Aug 2012 at 4:00Attachments: