We'll start with an initial 1-day spike to see what's possible / fruitful and then go from there.
I've pushed a branch that can be used for testing: https://github.com/jozu-ai/kitops/tree/compression-opts
In this branch, kit supports a few compression options, specified via the `--compression` flag for `kit pack`:
- `--compression=none`
- `--compression=gzip`
- `--compression=zstd`
- `--compression=gzip-fastest`
Each format is reflected in the mediatype for the layer as expected and is automatically handled by `unpack`.
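As a reference for how that dispatch can work, here's a minimal sketch of unpack-side decompression keyed off the layer mediatype suffix. The helper name and mediatype string are illustrative assumptions (following the OCI-style `+gzip`/`+zstd` suffix convention), not kit's actual code:

```go
package unpack

import (
	"compress/gzip"
	"io"
	"strings"

	"github.com/klauspost/compress/zstd"
)

// decompressor (hypothetical helper) wraps a layer blob in the appropriate
// decompression reader based on the mediatype's compression suffix,
// e.g. "application/vnd.example.layer.v1.tar+zstd".
func decompressor(mediaType string, blob io.Reader) (io.Reader, error) {
	switch {
	case strings.HasSuffix(mediaType, "+gzip"):
		// A gzip stream decompresses the same way regardless of the
		// level used to pack it, so one reader covers all gzip levels.
		return gzip.NewReader(blob)
	case strings.HasSuffix(mediaType, "+zstd"):
		zr, err := zstd.NewReader(blob)
		if err != nil {
			return nil, err
		}
		return zr, nil
	default:
		// No compression suffix: the blob is a plain tar stream.
		return blob, nil
	}
}
```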
From testing this briefly with `ghcr.io/jozu-ai/llama-2`, my two main takeaways are:

1. zstd is drastically faster than gzip for both packing and unpacking, while producing nearly identical compressed sizes.
2. Compression barely reduces the size of the quantized weights; only the f16 model compresses meaningfully (12.5 GiB down to 9.6 GiB).
**Full data** (pack + unpack llama-2 7B quantizations with different compression options):

| quantization | compression | time (pack) | time (unpack) | size |
|---|---|---|---|---|
| q4_0 | none | 4.44s | 1.94s | 3.5 GiB |
| | zstd | 9.88s | 3.23s | 3.5 GiB |
| | gzip | 81.35s | 30.46s | 3.3 GiB |
| | gzip-fastest | 8.75s | 3.70s | 3.5 GiB |
| q5_0 | none | 4.27s | 3.86s | 4.3 GiB |
| | zstd | 11.91s | 3.84s | 4.3 GiB |
| | gzip | 96.99s | 32.27s | 4.3 GiB |
| | gzip-fastest | 9.60s | 3.85s | 4.3 GiB |
| q8_0 | none | 6.98s | 6.03s | 6.6 GiB |
| | zstd | 19.67s | 6.16s | 6.6 GiB |
| | gzip | 152.09s | 57.54s | 6.4 GiB |
| | gzip-fastest | 14.69s | 6.28s | 6.6 GiB |
| f16 | none | 12.56s | 11.01s | 12.5 GiB |
| | zstd | 33.37s | 21.31s | 9.6 GiB |
| | gzip | 247.29s | 99.93s | 9.6 GiB |
| | gzip-fastest | 111.36s | 98.14s | 9.5 GiB |
The above is testing the llama-2 (7B) model with weights stored in GGUF format; it's possible that other formats will achieve different compression ratios (though I doubt it, since the actual weights in a model generally look like random data).
Nice work. It doesn't look like there's any point in using gzip: the compressed sizes are nearly identical to zstd, and zstd is far faster. Should we simplify things to just a no-compression and a zstd option?
The main concern we had with zstd was that it's newer and the encoder output isn't standardized, so a future library update could change behavior/digests. Even in the current state, I've had to use the "Better Compression" option in order to replicate what you get from the official binary with default compression (which is a little strange).
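For concreteness, this is roughly what pinning the encoder level looks like with `github.com/klauspost/compress/zstd` (a sketch, not kit's actual code; `WithEncoderLevel` and `SpeedBetterCompression` are the library's real options, the `compressZstd` helper is hypothetical):

```go
package pack

import (
	"io"

	"github.com/klauspost/compress/zstd"
)

// compressZstd copies src into dst through a zstd encoder pinned to the
// "better compression" level -- the setting that, per the above, matched
// the official zstd binary's default output in testing.
func compressZstd(dst io.Writer, src io.Reader) error {
	enc, err := zstd.NewWriter(dst, zstd.WithEncoderLevel(zstd.SpeedBetterCompression))
	if err != nil {
		return err
	}
	if _, err := io.Copy(enc, src); err != nil {
		enc.Close()
		return err
	}
	// Close flushes remaining data and writes the stream footer; its
	// error must be checked or the output may be truncated.
	return enc.Close()
}
```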
This would mean that modelkit digests are reproducible (under zstd) only with the same version of `kit` used to do the original pack -- I'm not sure that's a huge issue, though. Different versions of kit may already produce different digests.
The discussion we were having around allowing different options was similar: if we allow e.g. `none` and `zstd`, then packing the exact same data could lead to two different digests. Again, I'm not sure this is a huge issue (we mostly care about retrieving the expected modelkit), but it's worth considering.
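To make the digest concern concrete: layer digests are computed over the compressed bytes, so identical input data hashes differently under each compression choice. A toy illustration using only the standard library (not kit code):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
)

func main() {
	data := bytes.Repeat([]byte("identical model data "), 1000)

	// Digest of the uncompressed blob (--compression=none).
	fmt.Printf("none: sha256:%x\n", sha256.Sum256(data))

	// Digest of the gzip-compressed blob; a zstd encoding (or even a
	// different gzip level or library version) hashes differently again.
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	gw.Write(data)
	gw.Close()
	fmt.Printf("gzip: sha256:%x\n", sha256.Sum256(buf.Bytes()))
}
```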
Also, `unpack` needs to be aware of the compression format of the blob so that it can use the correct method for decompression.
That part is working on the branch (the mediatype identifies the compression format).
Opened a PR based on a stripped-down version of the branch mentioned above:
**Describe the problem you're trying to solve**
Modelkits with large files can take a long time to pack/unpack due to gzip being slow. We can speed this up, but we need to consider the options carefully, as changing the compression format will change modelkit digests.

**Describe the solution you'd like**
Choose another storage option that is quicker. For zstd, there are two Go options:
- `github.com/klauspost/compress/zstd` is a Go implementation of the algorithm, but abstracts over compression levels (in my testing, levels here do not correspond to the C implementation; see the sketch below)
- `github.com/DataDog/zstd` is a wrapper around the C implementation, but requires CGO to build.

**Describe alternatives you've considered**
We could also make the storage type configurable (gzip, no compression, zstd), e.g. in the Kitfile. This would lead to a situation where the same modelkit data potentially packs into modelkits with different digests, though.
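To illustrate the level-abstraction point from the first bullet: `github.com/klauspost/compress/zstd` exposes only a few named encoder levels and maps C-implementation levels (1-22) onto them via its `EncoderLevelFromZstd` function. A small demonstration (not kit code):

```go
package main

import (
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// The C implementation's numeric levels collapse onto a handful of
	// named levels in the Go library, so e.g. "level 3" here is not
	// byte-for-byte equivalent to the official binary's default level 3.
	for _, lvl := range []int{1, 3, 7, 11, 22} {
		fmt.Printf("zstd level %2d -> %v\n", lvl, zstd.EncoderLevelFromZstd(lvl))
	}
}
```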