minetest / contentdb

A content database for Minetest mods, games, and more
https://content.minetest.net
GNU Affero General Public License v3.0

Support .zip.gz (or comparable format) for package downloads #496

Closed Warr1024 closed 3 weeks ago

Warr1024 commented 7 months ago

Problem

Package downloads should be as small as possible. Some people play on very poor internet connections (cellular, satellite) because they have no other option; speeds and reliability may be terrible, and data costs may be exorbitant. Every few kilobytes that conscientious package authors can squeeze out of their downloads helps those users.

Solutions

Allow people to upload their packages as .zip.gz (or .zip.z). Minetest already supports the zip file format, and already supports zlib and its file formats, and could easily be extended to support these formats for contentdb downloads.

A comparison of a highly compressed zip (using advzip that runs zopfli at ~100 iterations) and a moderately compressed .zip.gz (using gzip -9, but no fancy brute-force compressors):

 1279986 nodecore_22090.zip
  989831 nodecore_22090.zip.gz

This works because the zip format compresses each file individually, and cannot take advantage of common patterns between files, while gzip works on an entire stream. It's trivial to get better compression with .zip.gz by just "compressing" the zip with zero compression (turning it into a "poor man's tar") and then compressing with gzip. Further improvements could be possible by strategically sorting the order of files in the archive before compression.
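As a rough illustration of the "poor man's tar" idea, here is a minimal Python sketch using only the standard library. The function names (`pack_zip_gz`, `pack_zip_deflate`) are made up for this example and are not part of ContentDB, cdbrelease, or Minetest; the stock `gzip` module stands in for `gzip -9`, and sorting by file name is just one possible ordering.

```python
import gzip
import io
import zipfile


def pack_zip_gz(files: dict[str, bytes], sort_names: bool = True) -> bytes:
    """Build a stored (zero-compression) zip in memory, then gzip the whole stream."""
    names = sorted(files) if sort_names else list(files)
    raw = io.BytesIO()
    # ZIP_STORED writes each file uncompressed, so the zip acts like a tar.
    with zipfile.ZipFile(raw, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, files[name])
    # One gzip stream over the whole archive can exploit patterns shared between files.
    return gzip.compress(raw.getvalue(), compresslevel=9)


def pack_zip_deflate(files: dict[str, bytes]) -> bytes:
    """Baseline for comparison: an ordinary zip with per-file DEFLATE."""
    raw = io.BytesIO()
    with zipfile.ZipFile(
        raw, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as zf:
        for name in sorted(files):
            zf.writestr(name, files[name])
    return raw.getvalue()
```

Calling both functions on the same set of files and comparing `len()` of the results gives a quick size comparison; the gain depends on how much the files have in common.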

The only major reasons to use zip are portability/universality, and random seek capability. Both of these reasons are irrelevant if we're just going to always unpack the zip into individual files immediately, and we control both ends of the process.
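For the consuming side, unpacking would just be "gunzip, then unzip straight to disk". The Python sketch below only illustrates that flow; the real change would live in the Minetest engine (C++), and `unpack_zip_gz` is a hypothetical name chosen for this example.

```python
import gzip
import io
import zipfile
from pathlib import Path


def unpack_zip_gz(blob: bytes, dest: Path) -> list[str]:
    """Decompress the outer gzip layer, then extract the inner zip into dest."""
    inner = gzip.decompress(blob)
    with zipfile.ZipFile(io.BytesIO(inner)) as zf:
        # No random seeking is needed: everything is extracted immediately.
        zf.extractall(dest)
        return zf.namelist()
```

This holds the whole inner zip in memory, which seems acceptable for small packages but would be worth revisiting for very large ones.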

Alternatives

Obviously, .tar.gz would make a lot of sense too. I'm just unsure whether Minetest supports tar already, and what would be involved in adding dependencies and whatnot if it didn't already. There's also a fair amount of POSIXiness to tar that Windows and other non-unixish platforms may not like as much. Zip looks weird inside a gz, but it might be the most practical thing to actually make work given what we already have.

Additional context

I already run advzip on all of my CDB packages, which are released through CI using cdbrelease. It takes a long time (delaying package uploads by several minutes), and I'm well into the range of diminishing returns. Making a zero-compression zip and then running gzip -9 on it takes a small fraction of that time and improves the output size significantly, and some of the time freed up could be spent on further compression improvements.

rubenwardy commented 7 months ago

This would require support in Minetest first. Releases using .zip.gz would then only work with new Minetest versions.

From the ContentDB side, disk space is at more of a premium than transfer quota, so storing every release in both .zip and .zip.gz wouldn't be an option.

Warr1024 commented 7 months ago

Then it sounds like, unless CDB is willing to do recompression itself for legacy consumers, we may need to get the engine change in very early, and we won't get much use out of this until packages start dropping support for the old versions that lack the new compression. Some packages can adopt it early on, i.e. those that require the latest release, but ones that support older versions may take a while to drop that support.

Recompression on the server side may be a viable option, at least for the new format to the old format, as transparent compression is surprisingly inexpensive at default compression levels. That would mean that authors could start using the new format when the community reaches some "tipping point" and there is at least some net value for new version users.
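A minimal sketch of what new-format-to-old-format recompression could look like, again in standard-library Python. This is not ContentDB's code; `zip_gz_to_legacy_zip` is a hypothetical helper, and it uses the default DEFLATE level to keep the CPU cost low.

```python
import gzip
import io
import zipfile


def zip_gz_to_legacy_zip(blob: bytes) -> bytes:
    """Turn a .zip.gz upload into an ordinary DEFLATE-compressed .zip."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(gzip.decompress(blob))) as inner, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as legacy:
        for info in inner.infolist():
            # Re-add each entry by name; it is compressed at the default level.
            # Note: this simple version does not preserve file timestamps.
            legacy.writestr(info.filename, inner.read(info))
    return out.getvalue()
```

The result could be produced on demand or cached, depending on whether disk space or CPU time is the tighter constraint.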

Early on we may have to limit recompression or hosting both formats, e.g. to only the latest N releases per package and only packages under about 5 MB. That would mean we could only make already reasonably fast downloads faster, rather than making huge package downloads possible for more users.

It's important to think in the long term, though, and supporting zip.gz or tar.gz would also align well with making server join times faster by supporting compressed media like b3d.gz, obj.gz, or tr.gz.

rubenwardy commented 3 weeks ago

Closing as unsupported by the Minetest engine. I've recovered a lot of disk space, so I may be willing to do reencoding, but this needs engine discussion and support first.