jozu-ai / kitops

Tools for easing the handoff between AI/ML and App/SRE teams.
https://KitOps.ml
Apache License 2.0

Improve pack and unpack speed #257

Closed · amisevsk closed 1 month ago

amisevsk commented 2 months ago

**Describe the problem you're trying to solve**

Modelkits with large files can take a long time to pack/unpack because gzip is slow. We can speed this up, but we need to weigh the options carefully, since changing the compression format will change model digests.

**Describe the solution you'd like**

Choose a faster option for storing modelkit layers.

**Describe alternatives you've considered**

We could also make the storage type configurable (gzip, no compression, zstd), e.g. in the Kitfile. This would mean the same modelkit data could potentially pack into modelkits with different digests, though.


bmicklea commented 2 months ago

We'll start with an initial 1-day spike to see what's possible / fruitful and then go from there.

amisevsk commented 1 month ago

I've pushed a branch that can be used for testing: https://github.com/jozu-ai/kitops/tree/compression-opts

In this branch, kit supports a few compression options, specified via the `--compression` flag for `kit pack`:

- `none`
- `gzip`
- `gzip-fastest`
- `zstd`

Each format is reflected in the layer's mediatype as expected and is automatically handled by unpack.
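For anyone following along, here's a minimal sketch of how that kind of mediatype-based dispatch can work on unpack. The mediatype strings and the `decompressor` helper below are illustrative assumptions, not kit's actual values or API:

```go
package unpack

import (
	"compress/gzip"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// Hypothetical mediatypes for illustration; the real values kit records
// in the manifest may differ.
const (
	mtTar     = "application/vnd.kitops.modelkit.layer.v1.tar"
	mtTarGzip = "application/vnd.kitops.modelkit.layer.v1.tar+gzip"
	mtTarZstd = "application/vnd.kitops.modelkit.layer.v1.tar+zstd"
)

// decompressor picks a reader based solely on the layer's mediatype, so
// unpack needs no out-of-band information about how a layer was packed.
func decompressor(mediaType string, r io.Reader) (io.ReadCloser, error) {
	switch mediaType {
	case mtTar:
		return io.NopCloser(r), nil
	case mtTarGzip:
		return gzip.NewReader(r)
	case mtTarZstd:
		zr, err := zstd.NewReader(r)
		if err != nil {
			return nil, err
		}
		return zr.IOReadCloser(), nil
	default:
		return nil, fmt.Errorf("unsupported layer mediatype %q", mediaType)
	}
}
```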

From testing this briefly with ghcr.io/jozu-ai/llama-2, my two main takeaways are:

  1. For many models, the weights are pretty much incompressible; this is format dependent and largely comes down to how much metadata the file includes (e.g. f16 has significant room for compression, q4_0 does not).
  2. Gzip is incredibly slow on incompressible data; switching to the fastest gzip level barely changes layer size but is generally 10x faster.
Full data (pack + unpack of llama-2 7B quantizations with different compression options):

| quantization | compression | time (pack) | time (unpack) | size |
| --- | --- | --- | --- | --- |
| q4_0 | none | 4.44s | 1.94s | 3.5 GiB |
| q4_0 | zstd | 9.88s | 3.23s | 3.5 GiB |
| q4_0 | gzip | 81.35s | 30.46s | 3.3 GiB |
| q4_0 | gzip-fastest | 8.75s | 3.70s | 3.5 GiB |
| q5_0 | none | 4.27s | 3.86s | 4.3 GiB |
| q5_0 | zstd | 11.91s | 3.84s | 4.3 GiB |
| q5_0 | gzip | 96.99s | 32.27s | 4.3 GiB |
| q5_0 | gzip-fastest | 9.60s | 3.85s | 4.3 GiB |
| q8_0 | none | 6.98s | 6.03s | 6.6 GiB |
| q8_0 | zstd | 19.67s | 6.16s | 6.6 GiB |
| q8_0 | gzip | 152.09s | 57.54s | 6.4 GiB |
| q8_0 | gzip-fastest | 14.69s | 6.28s | 6.6 GiB |
| f16 | none | 12.56s | 11.01s | 12.5 GiB |
| f16 | zstd | 33.37s | 21.31s | 9.6 GiB |
| f16 | gzip | 247.29s | 99.93s | 9.6 GiB |
| f16 | gzip-fastest | 111.36s | 98.14s | 9.5 GiB |

The above tests the llama-2 (7B) model with weights stored in GGUF format; it's possible that other formats will compress differently, though I doubt it, since the actual weights in a model generally look like random numbers.
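If we ever want to decide compression per-layer instead of globally, one cheap heuristic is to compress a small sample of the file at gzip's fastest level and check the ratio. A rough sketch under that assumption (`probeRatio` and the 64 MiB sample size are hypothetical, not anything in kit):

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// countingWriter tallies bytes written so we can measure compressed size
// without buffering the whole output.
type countingWriter struct{ n int64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

// probeRatio compresses the first sampleSize bytes of a file with gzip's
// fastest level and returns compressed size / original size. Ratios near
// 1.0 suggest the data is effectively incompressible.
func probeRatio(path string, sampleSize int64) (float64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	cw := &countingWriter{}
	zw, err := gzip.NewWriterLevel(cw, gzip.BestSpeed)
	if err != nil {
		return 0, err
	}
	written, err := io.Copy(zw, io.LimitReader(f, sampleSize))
	if err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	if written == 0 {
		return 1, nil // empty sample: treat as incompressible
	}
	return float64(cw.n) / float64(written), nil
}

func main() {
	ratio, err := probeRatio(os.Args[1], 64<<20) // sample the first 64 MiB
	if err != nil {
		panic(err)
	}
	fmt.Printf("estimated compression ratio: %.2f\n", ratio)
}
```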

bmicklea commented 1 month ago

Nice work. It doesn't look like there's any point in using gzip - the compressed sizes are nearly identical to zstd's, and zstd is far faster. Should we simplify things to just a no-compression option and a zstd option?

amisevsk commented 1 month ago

The main concern we had with zstd was that it's newer and the implementation isn't standardized, so a future update could change behavior/digests. Even in the current state, I've had to use the "Better Compression" option to replicate what you get from the official binary at its default compression level (which is a little strange).
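For reference, assuming a Go zstd implementation along the lines of klauspost/compress (an assumption on my part, not something confirmed in this thread), matching the reference binary's default output means selecting the better-compression level explicitly, roughly:

```go
package main

import (
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

// Compress stdin to stdout. With klauspost/compress, the library's default
// SpeedDefault level does not reproduce the reference zstd binary's default
// output; SpeedBetterCompression was needed to match it.
func main() {
	enc, err := zstd.NewWriter(os.Stdout,
		zstd.WithEncoderLevel(zstd.SpeedBetterCompression))
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(enc, os.Stdin); err != nil {
		panic(err)
	}
	if err := enc.Close(); err != nil {
		panic(err)
	}
}
```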

This would mean that modelkit digests are reproducible (under zstd) only with the same version of kit used to do the original pack -- I'm not sure that's a huge issue, though, since different versions of kit may already produce different digests.

The discussion we were having around allowing different options was similar: if we allow e.g. none and zstd, then packing the exact same data could lead to two different digests. Again, I'm not sure this is a huge issue (we mostly care about retrieving the expected modelkit), but it's worth considering.
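To make that concrete, here's a self-contained snippet showing that identical input bytes yield different layer digests under different compression options (purely illustrative; the data is a stand-in for real weights):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// Stand-in for layer content; real model weights would be far larger.
	data := bytes.Repeat([]byte("model weights "), 1<<16)

	// Digest of the uncompressed layer.
	fmt.Printf("none: sha256:%x\n", sha256.Sum256(data))

	// Digest of the same content compressed with zstd: a different digest
	// for identical input data.
	var buf bytes.Buffer
	enc, err := zstd.NewWriter(&buf)
	if err != nil {
		panic(err)
	}
	if _, err := enc.Write(data); err != nil {
		panic(err)
	}
	if err := enc.Close(); err != nil {
		panic(err)
	}
	fmt.Printf("zstd: sha256:%x\n", sha256.Sum256(buf.Bytes()))
}
```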

gorkem commented 1 month ago

Also, unpack needs to be aware of the compression format of the packed blob so it can use the correct method for decompression.

amisevsk commented 1 month ago

That part is working on the branch: the mediatype identifies the compression format, so unpack chooses the correct decompression automatically.

amisevsk commented 1 month ago

Opened a PR based on a stripped-down version of the branch mentioned above: