pfalcon / uzlib

Radically unbloated DEFLATE/zlib/gzip compression/decompression library. Can decompress any gzip/zlib data, and offers a simplified compressor which produces gzip-compatible output while requiring far fewer resources (and providing a lower compression ratio, of course).

Weakness in the uzlib_compress() algo #13

Closed TerryE closed 5 years ago

TerryE commented 6 years ago

Background

Hello Paul, I am currently the lead active maintainer / committer on the NodeMCU firmware project. (I've given you a poke about this if you want to track the relevant issue.) With my latest LFS patch, we can now execute Lua libraries directly out of flash, leaving the entire ESP8266 (and soon ESP32) RAM for data memory. The constraint is that the Lua library "overlay" (the Lua Flash Store) has to be built on the host, so I am adding an OTA capability where I compress the LFS image on the host for downloading onto the ESP, and the reflash function expands it on the fly on the ESP.

So my development has to fit the following constraints:

The major constraint that I have is ESP8266 RAM: we typically have ~44Kb of heap available for a starting application, but this will often be fragmented by the malloc system. The only way I could get a contiguous 32+Kb buffer to support the default expansion window would be to reboot the ESP and take over runtime control to perform a dedicated expansion, and I don't want to do this. I've therefore decided to drop to a 16Kb compression window for the dictionary. I'll give you a poke back when I have a fully tested implementation.

Issues with uzlib_compress()

The core issue is that your current repeat-search scheme uses a 4K-entry × 4-byte-pointer = 16Kb hash table with a single probe to find repeats.

  1. This gives O(n) compression but has weaknesses that can lead to poor compression performance. It also requires a 16Kb RAM hash table, which makes it impractical to use on the ESP.
  2. You are mapping 32K window positions onto a 4K single-entry hash table, so the probability of a false hit can be high -- that is, a valid dictionary string might be in the 32Kb window, but if its hash slot has been overwritten by another string with the same hash, then the memcmp(src, subs, MIN_MATCH) will produce a mismatch, so the leading character will be treated as a literal.
  3. The hash table has LIFO characteristics, so *subs will (probably because of point 2) point to the most recent pattern with the same MIN_MATCH sequence and not necessarily the longest. Take the pathological case aaaBaaaCaaaBaaaCaaaBaaaCaaaBaaaCaaaBaaaC.... This will match the aaa repeats but not detect the single repeated aaaBaaaC string. (A sketch of this scheme follows this list.)
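To make the failure mode concrete, here is a minimal sketch of a single-probe scheme of this kind; the names (`hash_table`, `hash3`, `probe`), the hash function and the parameters are illustrative assumptions, not the exact genlz77 code:

```c
#include <string.h>

#define HASH_BITS  12                       /* 4K slots * 4-byte pointer = 16Kb */
#define HASH_SIZE  (1 << HASH_BITS)
#define MIN_MATCH  3

static const unsigned char *hash_table[HASH_SIZE];

static unsigned hash3(const unsigned char *p) {
    /* any 3-byte hash folded into HASH_BITS bits */
    return ((p[0] << 6) ^ (p[1] << 3) ^ p[2]) & (HASH_SIZE - 1);
}

/* Returns the candidate match for position src, or NULL if the byte should
 * be emitted as a literal.  Note the single probe: a colliding string that
 * overwrote the slot (point 2), or a more recent but shorter match (point 3),
 * means a usable repeat elsewhere in the window is simply never considered. */
static const unsigned char *probe(const unsigned char *src) {
    unsigned h = hash3(src);
    const unsigned char *subs = hash_table[h];   /* single probe */
    hash_table[h] = src;                         /* LIFO overwrite */

    if (subs != NULL && memcmp(src, subs, MIN_MATCH) == 0)
        return subs;                             /* emit a (distance, length) pair */
    return NULL;                                 /* emit src[0] as a literal */
}
```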

My instinct is that on a 2+GHz core, using a brute-force O(n×m) algo would give acceptable runtime; here n is the size of the input stream and m the size of the dictionary window. Note that n is smaller than the number of input bytes, as the full scan of the dictionary window is only needed once per literal byte or repeat group. It will certainly give better compression, but I will use some hard test cases to compare the two variants.
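A rough sketch of the brute-force search I have in mind is below; the identifiers and window size are illustrative, not the final NodeMCU code:

```c
/* Brute-force longest-match search over the preceding window: O(n*m) overall.
 * WINDOW and MIN_MATCH are illustrative values, not the final parameters. */
#define WINDOW     16384
#define MIN_MATCH  3

/* Returns the length of the longest match for the bytes at src (0 means
 * "emit src[0] as a literal"); *dist receives the back-reference distance.
 * start/end bound the whole input buffer. */
static int longest_match(const unsigned char *start, const unsigned char *src,
                         const unsigned char *end, int *dist) {
    const unsigned char *win = (src - start > WINDOW) ? src - WINDOW : start;
    int best_len = 0;

    for (const unsigned char *p = win; p < src; p++) {   /* scan the whole window */
        int len = 0;
        while (src + len < end && p[len] == src[len])    /* overlapping copies are fine */
            len++;
        if (len >= MIN_MATCH && len > best_len) {
            best_len = len;
            *dist = (int)(src - p);
        }
    }
    return best_len;
}
```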

I should have a working version and some comparative performance stats in a week or so, and I'll post back a reference then.

Regards Terry

pfalcon commented 6 years ago

What you call "weaknesses" are actually genlz77's strengths - it's the most minimal (by code size) lz77 compressor I could find (well, I couldn't find one, which is why I wrote my own impl): it is parametrizable for working memory usage, can still compress something, and has O(n) complexity (with a small constant factor, because again, there's little code to execute). If you can improve on that without requiring more memory, without hurting runtime performance, and while keeping code-size growth contained (e.g. get 10% better compression for a 3% increase in code size) - please be my guest.

Exhaustive search and O(n*m) algos are of course out of the question, as they are not competitive on the criteria listed above. If you need those, you'll be competing with some other compression implementation rather than uzlib/genlz77.

TerryE commented 6 years ago

@pfalcon Paul, IoT devices like the ESP8266 series run code from flash, albeit with a first-level 32Kb instruction cache (which is also used for constant-data fetches from the mapped flash address space). So you have ~1Mb of instruction / constant-data space to play with and maybe 44Kb of total available heap if you are lucky. This is pretty typical of IoT architectures, including the ESP32 and the various ATmega parts: instruction space is relatively plentiful and RAM is at an absolute premium.

My issue is that your design objective isn't aligned to the IoT architectures that I work with: for example a 32Kb hash table in RAM is a killer. Saving the odd 1K of code space is largely irrelevant.

On the IoT side, typical buffers to be compressed are maybe 2Kb long at most, so O(n*m) algos are quite practical on an 80MHz processor.

On the server side, on my 3-year-old Core i5 laptop, the O(n×m) algo for a typical luac.cross execution generating an LFS image adds less than a second of compile time, and so is an acceptable hit. Sometimes Keep It Simple Stupid does work best.

I am not saying that your implementation doesn't fit your target usecase, nor am I asking you to change it. I am just observing that it doesn't meet my needs and therefore, out of courtesy, giving you a heads-up that I am going to implement an alternative that you might wish to track. I will offer this variant under similar permissive licensing, so if you feel it appropriate to add it as an #ifdef variant to your library, then you will be entirely free to do so.

I am deeply grateful for your toolchain work and for providing this library. The clarity of the implementation makes it an excellent jumping-off point for my variant. Thank you.

pfalcon commented 5 years ago

Let me close this for now. Patches fixing specific issues, substantiated by tests/benchmarks, are always welcome. Caveat: you would need to contribute the tests/benchmarks first.

In either case, please refrain from making claims which may border on license violation. Good code or bad, good intentions or bad, please avoid misrepresenting the origin or authorship of the code. "My library" is not my library, it's a library of code by Joergen Ibsen, Simon Tatham, and myself. The string-matching algorithm is a common algorithm for statistical string matching. Trying to lead the discussion in an ad hominem way isn't going to lead very far.

TerryE commented 5 years ago

Paul, thank you for the reminder to update you on this, and my apologies for not doing so.

I am not sure why you introduced the subject of ad hominems or implied that I wouldn't respect the terms of the licensing, as I gave no suggestion of any intent to do so. My issue with using this implementation with NodeMCU Lua on ESP8266 IoT devices was purely a technical one and not a personal one: it doesn't fit in the RAM available.

Anyway, if you want to look at the NodeMCU implementation, then please feel free to review this code (in our github repo) and its use within our Lua runtime (here). If you have any specific concerns that this breaches the copyright terms, then please let me know and I will take reasonable steps to address them.

pfalcon commented 5 years ago

I don't have any concerns as long as you didn't purposely strip copyright headers from source files, e.g. Simon's from https://github.com/pfalcon/uzlib/blob/master/src/defl_static.c#L6 . I trust you not to have done that.

Then please just don't call my code his and his code mine (and so on for all other combinations of the parties involved).

I'm happy that this library was useful for NodeMCU, as it was for other projects. Again, if you see any deficiency which you think is worth fixing in this library so that it benefits all projects, patches are most welcome. (But to set expectations right, the patches should prove improvement, not just change.) If not, no worries, the project will slowly evolve to generalize and parametrize the features it has. (For me it's just one of many projects, and a background activity.)

TerryE commented 5 years ago

I don't have any concerns as long as you didn't purposely strip copyright headers from source files, e.g. Simon's from defl_static.c#L6

uzlib_deflate.c:L34, etc. was posted to the NodeMCU firmware project well before your comments above implying that I might be breaching copyright. This work was done openly, both within this issue and on the corresponding NodeMCU one where I kept you posted (for example here). I personally feel it is better to check first before implying that others could be breaching copyright -- especially as this is only a few clicks away.

please just don't call my code his and his code mine (and so on for all other combinations of the parties involved).

If you read the library README and file headers, then you will see that I have already clearly audited the attribution of sources. My use of the personal pronoun (on this issue) was my shorthand for this library as a whole (after all, you state in its README "Library integrated and maintained by Paul Sokolovsky").

My library is a fork, for the reasons I discuss in its README and inline comments: I need it to run in an available RAM footprint of less than 40Kb, and I need to be able to expand large files (hundreds of Kb) into flash. Also, as far as the compress algo goes, I decided to adopt the RFC 1951 strategy rather than your variant. If you want to use any of these ideas or code in your library then please feel free to do so. :smile:
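For anyone comparing the approaches, the strategy suggested in RFC 1951 section 4 is essentially a chained-hash search. A minimal sketch follows; all names, table sizes and the chain cap are assumptions for illustration, not the actual NodeMCU code:

```c
/* Sketch of a chained-hash match search in the spirit of RFC 1951 section 4:
 * head[] maps a 3-byte hash to the most recent position with that hash, and
 * prev_pos[] chains back to earlier positions with the same hash.
 * (Capping the match length to the DEFLATE maximum of 258 is omitted.) */
#define HASH_SIZE  (1 << 12)
#define WINDOW     16384
#define MAX_CHAIN  128                  /* bound the probes per position */

static int head[HASH_SIZE];             /* initialise all entries to -1 */
static int prev_pos[WINDOW];

/* buf[pos..pos+avail-1] is the lookahead (avail >= 3); returns the best match
 * length found (0 = emit a literal) and its distance via *dist. */
static int find_match(const unsigned char *buf, int pos, int avail, int *dist) {
    int h = ((buf[pos] << 6) ^ (buf[pos + 1] << 3) ^ buf[pos + 2]) & (HASH_SIZE - 1);
    int best_len = 0;
    int chain = MAX_CHAIN;

    for (int cand = head[h]; cand >= 0 && pos - cand <= WINDOW && chain-- > 0;
         cand = prev_pos[cand % WINDOW]) {
        int len = 0;
        while (len < avail && buf[cand + len] == buf[pos + len])
            len++;
        if (len > best_len) {
            best_len = len;
            *dist = pos - cand;
        }
    }
    prev_pos[pos % WINDOW] = head[h];   /* link the current position into the chain */
    head[h] = pos;
    return best_len;
}
```

Unlike the single-probe scheme, every earlier occurrence of the hash within the window (up to MAX_CHAIN of them) is examined, so the longest available match is far more likely to be found, at the cost of the extra prev_pos[] array and more probes per position.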