Feature Request: Support building LuCI with static brotli compression

tobiaspc commented 6 months ago

Problem

Page load speed is generally limited by the CPU power of the box and the speed of the network between the box and the client. LuCIs static files are currently served uncompressed. My goal is to improve the speed through brotli compression.

Solution

Pre-compress static files with modern compression algorithms such as brotli. While on-the-fly compression can already be done (with nginx), my testing showed that static compression is better. Dynamic even compression increases load times over fast links. If files could be pre-compressed during build, they could be served in their compressed form without any computational overhead on the OpenWRT box. This should be easily possible to implement even in uHTTPd.

Contribution

I am unsure on how to proceed, as I have no idea on where to start with OpenWRTs build system. I guess a configuration option would be ideal for that purpose. Brotli would be a compile time dependency needed for building.

Testing

I compared the following cases:

baseline (no compression)
dynamic (on-the-fly brotli level 6)
static (pre-compressed brotli level 11)
D6+L11 (combination of both)

All testing was done without any other load on the client, the router and the network. I accessed the reboot page, as it is mostly static. For static level 11, all js and css files in the www directory were compressed manually. All values are in milliseconds, mean of 6 runs. Webserver is nginx with HTTP3 enabled, running on APU2 box.

Results

Compression	# Requests	Page size	Transfer size
baseline	24	279 kB	280 kB
dynamic (L6)	24	279 kB	79 kB
static (L11)	24	279 kB	78 kB
D6 + L11	24	279 kB	75 kB

1000 Mbit/s LAN:

Compression	TTFB	load	DOMContentLoaded	Finish	of baseline
baseline	93.2	241.8	194.6	316.6	100.00%
dynamic (L6)	93.8	300	220.2	387.4	122.36%
static (L11)	87.8	240	188.6	298.6	94.31%
D6 + L11	91.6	233.8	192.2	302.4	95.51%

lan

720Mbit/s WIFI, Firefox "regular 3G throttling":

Compression	TTFB	load	DOMContentLoaded	Finish	of baseline
baseline	200	2562	1400	2918	100.00%
dynamic (L6)	232	897.2	541.6	1340	45.92%
static (L11)	203.2	824.6	529	1254	42.97%
D6 + L11	246.4	892.8	559.4	1358	46.54%

lan-slow

Conclusion

Over LAN, compression makes no big difference, while dynamic compression is even worse than baseline. Over simulated mobile networks, any compression cuts page load times in half, with static brotli compression outperforming all other options. Further testing with more dynamic pages (not just the reboot page) could provide additional insights. How to proceed?

Additional Information:

OpenWrt version information from system /etc/openwrt_release

DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='SNAPSHOT'
DISTRIB_REVISION='r26086-1b190dfd3a'
DISTRIB_TARGET='x86/64'
DISTRIB_ARCH='x86_64'
DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r26086-1b190dfd3a'
DISTRIB_TAINTS='no-all busybox'

systemcrash commented 6 months ago

Your numbers show a difference which is expected, but this approach means that two copies of resources will be around. And in the possible case that brotli support is removed from browsers, there are useless copies. Zstd is achieving broader acceptance and as a cpu factor in transparent transport compression is negligible.

Enabling brotli and zstd in the server engines does have merit. On the fly is preferred given the assumption of scarcity of storage space. Perhaps pre compressed is viable if targets have more than eg 128 mb flash.

Would you like to try and figure out how and make a PR?

tobiaspc commented 6 months ago

To clarify: I was thinking of a compile time option for self-compiled images. For regular, pre-build images, the overhead in space and webserver configuration complexity is probably not worth it. Replacing utthpd with nginx already increases image size a lot, compressing the whole www directory with brotli adds about 400kB.

Am I right to assume that uHTTPd development is more or less done, judging by the git history? Adding zstd/brotli support to uHTTPd does not seem to be realistic, but I might be wrong.

caniuse.com states 46% support for zstd encoding, whereas brotli is supported by 97% of the browser population. Compression ratio and speed seem to be comparable.

For nginx to support (dynamic) zstd compression, the respective module must be compiled and added to the image, which is not possible right now. I will try to make a PR for that, re-test and compare results.

Where would I start with adding a compile time option to pre-compress LuCIs css and js files with brotli or zstd during image building? First step would be adding a configuration option, then it would also be necessary to add the compression tool as a compile time dependency, right? Compression itself should be executed after LuCIs files are added to the image, but before the image itself is build. Can you give me a pointer to where that happens?

systemcrash commented 6 months ago

Take a peek in luci.mk and the respective Makefile for each theme. Some utils in /contrib.

systemcrash commented 6 months ago

uhttpd supports plugins. So it's possible to implement br, gzip and zst as plugins.

Rupurudu commented 6 months ago

Like @systemcrash said, having two copies of the resources will increase the space of the packages, this is not acceptable because there are targets with only 8MB of storage space.

A more interesting approach is to not have the uncompressed resources at all. Basically every browser made in the last 25 years should support gzip/deflate.

However, SquashFS compression might struggle with already compressed data, and this can increase the image sizes slightly. On the other hand, packages installed in the overlay will be considerably smaller.

Maybe it's possible to have uncompressed resources in the SquashFS and have gzipped resources in the overlay?

stokito commented 5 months ago

The SquashFS uses LZMA compression same as in xz. This is the slowest (it checks all permutations) but the most effective compression. Nothing can be more effective than it. So files in the image are small, but they do consume RAM when uncompressed and also slower to transmit. So the pre-compression is something that may help.

If you have many files close together, they'll be better compressed in the image because reusing the same dictionary. But to serve pre-compressed files, each file should be compressed separately, and its dictionary won't be similarly effective. This means that even if we use the same LZMA for each file the resulted SquashFS will be slightly bigger.

But the Brotli has a dictionary trained on web assets, and with this trick it indeed can compress to a smaller file.

Browser support

The LZMA (xz) isn't supported by browsers.
All browsers support the deflate and gzip (e.g. deflate + checksum). To create the raw deflate files use my deflate utility.
The Brotli is supported by all post IE6 browsers.
The zstd is not yet supported by Firefox. So it can't be used.

To compare their compression, I took the biggest file ui.js file and compressed:

brotli -k --best ui.js
xz -k --best ui.js
zstd -19 -k ui.js
pigz --best -nk ui.js
deflate -k9 ui.js

Result (ll ui.js*):

150986 ui.js
 28955 ui.js.br
 34461 ui.js.deflate
 34486 ui.js.gz
 30716 ui.js.xz
 30960 ui.js.zst

Nice result, and the Brotli was able to compress even 6% smaller than xz. Not such a big difference, but it is enough to not make images bigger than now.

But on smaller files the difference in bytes is small. For example, here is a medium-sized file from luci-app-uhttpd:

10958 uhttpd.js
 2414 uhttpd.js.br
 2868 uhttpd.js.deflate
 2882 uhttpd.js.gz
 2876 uhttpd.js.xz
 2799 uhttpd.js.zst

It is interesting that zstd and even deflate was more effective than xz. Not sure how this can be possible, maybe I should try other flags.

So I think we can try the Brotli but actually even gzip may be just fine.

Here I created a branch that you may test: https://github.com/stokito/luci/tree/precompress

Checkout it to folder and switch to the precompress branch.
Configure the feeds.conf and add a path to the Luci sources like src-link openwrt_luci /home/user/workspace/luci
./scripts/feeds update -a and ./scripts/feeds install -a
make menuconfig. Select the LUCI / Modules.
Don't Check the Minify JavaScript sources or the Minify CSS files.
Check ether Pre-compress to Brotli or Pre-compress to GZip.
Build with make -j4 where 4 is numbers of cores that you have.

You can compare the size of the resulted image yourself. I don't know how to compare properly.

If you check the Minify JavaScript sources or the Minify CSS files, then the resulted image will be broken. It looks like minification is performed in parallel with compression. Maybe the make -j1 should help. For a test you'll need to clear the luci build folder in bin/packages/your_target/openwrt_luci.

uhttpd pre-compressed First of all we need to make the uhttpd to serve pre-compressed files.

You may find a 9-years-old patch of @omonar for this.

Here I applied to the uhttpd but it didn't work and needs to be fixed. You can copy the raw patch and place it to package/network/services/uhttpd/patches/100-precompressed.patch and build an image.

Meanwhile, the BusyBox httpd already can serve the pre-compressed files, but Luci will start working on it only on the next release. So you may try to set to compile a busybox HEAD and try just today. Here is how to switch the BB to git https://github.com/stokito/openwrt/commit/d0db6ad6729ceb0e4a495f4871d4ef36db790fca You'll need to change the commit hash.

systemcrash commented 5 months ago

It's not much use to even discuss XZ which likely will never gain browser support. The idea is to have transparent transport compression, and most web content is continuously changing, and dynamic, if we regard the bulk of Internet traffic volume. Because XZ is so CPU intensive to do a single-shot compress, for content that is regularly changing, browser makers see no value-add in including XZ. Many users would perceive an increased lag in TTFP and TTFB. It's the antithesis. Zstd, however, is a much better result.

The patch seems a good starting point, provided the correct headers are served. It could be welcomed if it gets polished into plugins which can handle brotli and zstd. The drawbacks? You need to jam in libs and binaries to support those which increase your space usage. GZ is fairly low complexity and is implemented in a number of languages already within a small footprint. So it's currently the sweet-spot.

Where is the theoretical win? When the cumulative size of all compressed web resources for a default factory install plus winning (e.g. ztsd) library size are less than the size of the uncompressed resources.

If we supply a compressed blob, which resource is preferred if there are two copies? The one in the blob or the uncompressed one on disk? ( Think when you're developing for luci, and upload a newer copy ).

In general, when you try to compress an archive into another, you'll find the resulting (disk image) size often increases slightly. One could offset this by using dictionary support from Zstd - but it's not a given that the browser knows what to do with that.

stokito commented 5 months ago

Let's split two things: compression on the fly and pre-compression.

Compression on the fly is not complicated to add with a small footprint: the OpenWrt already have the zlib library for gzip/deflate. It's about 50 LOC.

The zstd is supported by Chrome which covers most users, it's library needs to be added but why to bother if the gzip will do the job relatively good? The zstd with a default level is about same as gzip but consumes less CPU so it may be useful only on big files but almost all big files anyway use some kind of own compression: docs (zip), media (mp3, mp4) etc.

I'm against to add even the 50 LOC of gzip on the fly simply because the uhttpd already has too big code base. It would be more sane to use the Lighttpd.

The pre-compression allows to use maximum compression level and doesn't need to install a compressor on the device. Given that all the Luci assets are static this is an ideal option for it.

If we supply a compressed blob, which resource is preferred if there are two copies? The one in the blob or the uncompressed one on disk?

Compressed, otherwise we need to make two stat syscalls and check mtime. This is how it works in the BusyBox httpd, for other servers I don't know.

During a development you can just remove the .gz version.

jow- commented 5 months ago

Precompression will most likely increase rootfs size. In what kind of scenario is it desired to increase utilization of extremely limited flash space to decrease the amount of asset data transferred to the browser client (typically via fast ethernet or wifi)?

The use case cited by the op (optimize page load times via mobile network) seems like an extreme niche use-case to me;

first of all we do not encourage accessing the ui via wan
if the ui should be accessed from the internet nonetheless it should happen via a secure vpn or tunnel
once some kind of vpn or tunneling is used (e.g. SSH tunnel), transport compression can be enabled there
if increased flash space usage is acceptable anyway, simply use nginx or lighttpd with on-the-fly compression

Personally I am against adding this facility as it will most likely never be used by default and thus not receive serious testing coverage while increasing the complexity and risk of accidental regressions for the standard case.

stokito commented 5 months ago

👍 Also when opening Luci in a browser not all assets are downloaded and once they downloaded they'll be cached. First loading of Luci on a slow connection will be slow but next should be faster.

I don't know how to properly measure the image size. In my case a build for x86 with a few luci apps enabled. I measured difference of the root.squashfs (ll openwrt/build_dir/target-i386_pentium4_musl/linux-x86_generic/root.squashfs).

The pre-compressed with Brotli is bigger on 38_645 bytes (38K). The pre-compressed with Gzip is bigger on 72_622 bytes (71K).

When started the image in VBox I tried to count size of files in memory with ( find /rom -type f -exec cat {} \; ) | wc -c:

19_785_227 uncompressed
19_085_891 brotlified e.g. smaller on 699_336 bytes (683K).
19_119_889 gzipped e.g. smaller on 665_338 (650K)

So the pre-compression gives some advantage in RAM but given that the resulted image is slightly bigger this may not work for everyone. Think yourself if this is needed for you. The patches are here.

Rupurudu commented 5 months ago

The pre-compressed with Brotli is bigger on 38_645 bytes (38K).

Theoretically the difference should be less than that if you also managed to minify the source files.

Meanwhile, the BusyBox httpd already can serve the pre-compressed files, but Luci will start working on it only on the next release. So you may try to set to compile a busybox HEAD and try just today. Here is how to switch the BB to git stokito/openwrt@d0db6ad You'll need to change the commit hash.

Interesting, I didn't know LuCI is switching to BusyBox httpd. Is there a reason why it's postponed to the next release? Current release is delayed because of the kernel 6.6, maybe it's possible to ship BusyBox httpd in the current release.

Also can we add this for installing packages to the overlay partition where it has both the RAM and space advantage? It would help people with small flash/low ram devices a lot.

stokito commented 5 months ago

I didn't know LuCI is switching to BusyBox httpd

no, the Luci is not going to use the BB http, sorry for confusing you. But soon it will be technically possible. In the next BB release there is a change that will parse URL of CGI differently. This may be useful for 4mb routers because the BB httpd is only 8Kb. I just wanted to say that you may try the precompression there, not only Nginx and others.

minify the source files

I compressed minified sources, so this is a limit. The loss is due to each file compressed separately with own dictionary, while in the image they compressed altogether more effectively with dictionary reusing.

It would help people with small flash/low ram devices a lot.

Agree, but usually routers have enough of RAM and the main limit is a disk space. A typical 4Mb device has 32Mb of RAM. So if users to need more RAM then they have a special case and the patch may indeed help them.

For a bigger devices it would be just easier to use the Lighttpd with mod_deflate and leave the uhttpd alone.

Rupurudu commented 5 months ago

Agree, but usually routers have enough of RAM and the main limit is a disk space. A typical 4Mb device has 32Mb of RAM. So if users to need more RAM then they have a special case and the patch may indeed help them.

For a bigger devices it would be just easier to use the Lighttpd with mod_deflate and leave the uhttpd alone.

While I don't own any 4/32 devices I do have a lantiq 8/64 device. Even with 8MB, the jffs2 overlay is so small that you can't install anything live with opkg, I had to integrate everything to the squashfs image. I thought it would make sense in that context that pre-compressed packages in the overlay would allow people to install 1-2 packages with opkg without needing to use image-builder.

For my other devices with large flash I actually use luci-nginx. It works out of the box with no manual config needed.

It will be technically possible. In the next BB release there is a change that will parse URL of CGI differently. This may be useful for 4mb routers because the BB httpd is only 8Kb.

This is actually a great idea, using BB httpd instead of uhttpd would both save precious flash space on small devices, while also relieve OpenWRT team from maintaining uhttpd.

I compressed minified sources, so this is a limit. The loss is due to each file compressed separately with own dictionary, while in the image they compressed altogether more effectively with dictionary reusing.

Maybe bundling all the default (luci-mod-admin-full) packages would solve that.

openwrt / luci