pfalcon / uzlib

Radically unbloated DEFLATE/zlib/gzip compression/decompression library. Can decompress any gzip/zlib data, and offers simplified compressor which produces gzip-compatible output, while requiring much less resources (and providing less compression ratio of course).
Other
303 stars 82 forks source link

Move "clcidx" table to code when RUNTIME_BITS_TABLES (esp8266-specific workaround) #27

Closed earlephilhower closed 3 years ago

earlephilhower commented 4 years ago

To support RODATA-less environments (bootloader on ESP8266, for example), set clcidx via code instead of RODATA initialization when RUNTIME_BITS_TABLES enabled.

earlephilhower commented 4 years ago

This and #28 stem from the work we're doing on the ESP8266 Arduino bootloader to enable compressed app and filesystem uploads.

Thanks for the great libraries and work on the whole ESP toolchains!

pfalcon commented 4 years ago

Thanks for the patch, but I'm not sure I understand reasons/purpose of it. RODATA is just a way to implement storage for const arrays. If you don't have .rodata segment, just put such data to some other segment. But not having initialized data at all is unheard of, it's almost impossible to write a C program without it, have a global int a = 1;, and you already have it. Have a constant string like "foo", and you have it. So, I don't see how patching clcidx would change much, you likely need to tune your linker script instead.

earlephilhower commented 4 years ago

It's a little more complicated than that, I maybe wasn't quite clear.

We do have RODATA, but because this is for our bootloader it's actually situated in IRAM, a chunk of RAM which can only be accessed via 32b reads (i.e. instruction-fetch-width). Basically the ROM on the 8266 copies 4K of data from flash to IRAM and runs us. While it would be possible to include the equivalent of C startup routines for copying RODATA to RAM (like the real app loader), the code use winds up being more than the code it takes to store 18 bytes in RAM (nor IRAM).

pfalcon commented 4 years ago

Thanks for explanation, this clarifies it. But you're effectively proposing to add a hardware-specific workaround to generic code, intended to run on any hardware, and at the same time:

the code use winds up being more than the code it takes to store 18 bytes in RAM (nor IRAM).

Yes, but code in this patch also winds up more than without this patch for "normal" systems.

I very much understand the need for hardware-specific workarounds sometimes, but ... unless I need them personally, I don't fancy maintaining them ;-I. My overall suggestion is to leave this open, I'll ponder about it, and play when I have time.

Did you consider other options? E.g. are you aware of -mforce-l32 option for xtensa-lx106-elf- which is intended to exactly deal with cases like that transparently. For programmer. Not for the code. It will likely cost you bytes too. And that's the whole point - if you want the most minimal size for one platform, you would need to use assembly, and keep all pieces of it. This project tries to maintain the minimal size cross-platform, and again, I don't fancy penalizing all platforms by working around issues of one, a pretty niche one.

earlephilhower commented 4 years ago

I'm quite familiar with (your?) mforce-l32 patch, thanks. Had to port it to run on GCC 9.2 as part of my toolchain upgrade, in fact. The boot code unfortunately breaks the 4K barrier with it enabled, so that's not an option. Plus the compiler then makes all memory accesses a multi-step process...expanding a 15M filesystem image would probably take a goodly hit in runtime with force-l32.

I'll leave it open until you decide to close it, no problem. In the meantime we'll continue with my own fork of your repo with this and #28 applied. They're orthogonal of anything you're likely to change, so should be no trouble tracking.

Thanks for looking at this, though! Appreciate the effort.

pfalcon commented 4 years ago

I'm quite familiar with (your?) mforce-l32 patch, thanks.

Not mine, jcmvbkbc's one, so he was kind to implement it based suggestions from a few folks, including me.

expanding a 15M filesystem image would probably take a goodly hit in runtime with force-l32.

In our testing with MicroPython, we found the hit to be ~15%. Of course, on more synthetic workloads, it may be higher.

In the meantime we'll continue with my own fork

Sounds good, yeah. If you find some issue with generic algorithms here, or a way to improve them, please bring it up. Hardware-specific patches are indeed better go into old-style "CONTRIB" folder (well served by the patch tracker on the github).

earlephilhower commented 3 years ago

On the ESP8266 we just added RODATA support to the bootloader, so this patch isn't applicable anymore, even for us, so closing. Thanks again for a great library!