nix-community / nix-index

Quickly locate nix packages with specific files [maintainers=@bennofs @figsoda @raitobezarius]

Consider a different compression level for Zstandard #164

Open colinxs opened 3 years ago

colinxs commented 3 years ago

The default is currently 22, but it looks like recent versions of zstd reduce that to 19. Also, level 22 doesn't yield a compression ratio much better than the zstd default (-3), but takes substantially longer. Here are the results for compressing an uncompressed tarball of nixpkgs:

nixpkgs.tar          : 19.68%   (111411200 => 21922085 bytes, 3.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  0.45s user 0.05s system 109% cpu 0.458 total
nixpkgs.tar          : 19.62%   (111411200 => 21862082 bytes, 4.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  0.51s user 0.05s system 108% cpu 0.514 total
nixpkgs.tar          : 19.01%   (111411200 => 21174846 bytes, 5.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  0.81s user 0.04s system 104% cpu 0.817 total
nixpkgs.tar          : 18.53%   (111411200 => 20642174 bytes, 6.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  1.00s user 0.04s system 103% cpu 0.999 total
nixpkgs.tar          : 17.56%   (111411200 => 19560791 bytes, 9.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  1.92s user 0.04s system 102% cpu 1.916 total
nixpkgs.tar          : 16.99%   (111411200 => 18927901 bytes, 12.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  3.77s user 0.06s system 101% cpu 3.780 total
nixpkgs.tar          : 16.37%   (111411200 => 18241631 bytes, 15.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  10.58s user 0.07s system 100% cpu 10.608 total
nixpkgs.tar          : 15.55%   (111411200 => 17325552 bytes, 18.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  24.79s user 0.10s system 100% cpu 24.834 total
Warning : compression level higher than max, reduced to 19
nixpkgs.tar          : 15.27%   (111411200 => 17012327 bytes, 22.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst  35.15s user 0.10s system 100% cpu 35.193 total

where the output file denotes the compression level (i.e. 6.zst was compressed with level 6). Looking at the data, it seems like the default of 3 is probably the best? Or at least anything less than 12?
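To make the tradeoff in the numbers above easier to read, here is a minimal Python sketch that tabulates each level against level 3. The figures are copied from the output above; the names `RUNS` and `summarize` are just for illustration. Note that the wall-clock times include `nix run` startup, and that the run written to 22.zst was actually clamped to level 19, as the warning line shows.

```python
# Figures copied from the zstd/time output above:
# level -> (compressed size in bytes, wall-clock seconds).
# The run that produced 22.zst was clamped to level 19 by zstd,
# so it is recorded here under 19.
ORIGINAL_BYTES = 111_411_200
RUNS = {
    3: (21_922_085, 0.458),
    4: (21_862_082, 0.514),
    5: (21_174_846, 0.817),
    6: (20_642_174, 0.999),
    9: (19_560_791, 1.916),
    12: (18_927_901, 3.780),
    15: (18_241_631, 10.608),
    18: (17_325_552, 24.834),
    19: (17_012_327, 35.193),
}


def summarize(runs, baseline_level=3):
    """Return one row per level, compared against the baseline level."""
    base_size, base_time = runs[baseline_level]
    rows = []
    for level, (size, seconds) in sorted(runs.items()):
        rows.append({
            "level": level,
            # Compressed size as a percentage of the uncompressed tarball.
            "ratio_pct": 100 * size / ORIGINAL_BYTES,
            # How much smaller the output is than the baseline output.
            "saved_vs_base_pct": 100 * (base_size - size) / base_size,
            # How many times longer the run took than the baseline run.
            "slowdown": seconds / base_time,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(
            f"level {row['level']:>2}: "
            f"ratio {row['ratio_pct']:.2f}%  "
            f"saved vs -3 {row['saved_vs_base_pct']:.1f}%  "
            f"slowdown {row['slowdown']:.1f}x"
        )
```

On this data the highest level produces a file roughly 22% smaller than level 3 while taking roughly 77 times as long, which is the gap being questioned here.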

bennofs commented 3 years ago

The reason I initially set it to the highest level is that I expect the index to be created only once, but used many times. Therefore, I assumed the indexing time doesn't really matter that much, whereas disk space is a cost you pay forever. Do you have a use case where a few minutes of extra indexing time has a big impact? Note also that compressing is done at the same time as fetching, and I believe indexing is mostly network-IO bound unless you have a very fast network.

Note that we don't compress the tarball of nixpkgs, but a custom data structure. Using the actual nix-index database, we can run the experiment like this:

$ cd ~/.cache/nix-index
$ tail -c+13 files | zstdcat > files.raw
$ for i in (seq 19); time zstd -$i -o $i.zst files.raw; end

Which gives the following data:

files.raw            : 28.82%   (88865494 => 25613130 bytes, 1.zst)

________________________________________________________
Executed in  403.78 millis    fish           external
   usr time  414.35 millis  893.00 micros  413.46 millis
   sys time   36.67 millis   61.00 micros   36.61 millis

files.raw            : 27.08%   (88865494 => 24063919 bytes, 2.zst)

________________________________________________________
Executed in  526.18 millis    fish           external
   usr time  546.19 millis    0.00 millis  546.19 millis
   sys time   34.99 millis    1.06 millis   33.93 millis

files.raw            : 26.07%   (88865494 => 23167408 bytes, 3.zst)

________________________________________________________
Executed in  710.79 millis    fish           external
   usr time  707.52 millis    0.00 millis  707.52 millis
   sys time   65.11 millis    1.18 millis   63.93 millis

files.raw            : 25.80%   (88865494 => 22924652 bytes, 4.zst)

________________________________________________________
Executed in    1.01 secs    fish           external
   usr time    1.01 secs    0.00 micros    1.01 secs
   sys time    0.06 secs  853.00 micros    0.06 secs

files.raw            : 24.96%   (88865494 => 22183517 bytes, 5.zst)

________________________________________________________
Executed in    1.73 secs    fish           external
   usr time    1.73 secs    0.00 micros    1.73 secs
   sys time    0.05 secs  865.00 micros    0.05 secs

files.raw            : 24.66%   (88865494 => 21916181 bytes, 6.zst)

________________________________________________________
Executed in    2.81 secs    fish           external
   usr time    2.78 secs    0.00 micros    2.78 secs
   sys time    0.07 secs  862.00 micros    0.07 secs

files.raw            : 23.99%   (88865494 => 21323139 bytes, 7.zst)

________________________________________________________
Executed in    3.74 secs    fish           external
   usr time    3.75 secs    0.00 millis    3.75 secs
   sys time    0.05 secs    1.03 millis    0.05 secs

files.raw            : 23.64%   (88865494 => 21011208 bytes, 8.zst)

________________________________________________________
Executed in    4.74 secs    fish           external
   usr time    4.73 secs    0.00 millis    4.73 secs
   sys time    0.06 secs    1.06 millis    0.06 secs

files.raw            : 23.49%   (88865494 => 20875617 bytes, 9.zst)

________________________________________________________
Executed in    6.70 secs    fish           external
   usr time    6.68 secs    0.00 millis    6.68 secs
   sys time    0.07 secs    1.04 millis    0.07 secs

files.raw            : 23.14%   (88865494 => 20563492 bytes, 10.zst)

________________________________________________________
Executed in    9.68 secs    fish           external
   usr time    9.61 secs    0.00 millis    9.61 secs
   sys time    0.12 secs    1.08 millis    0.12 secs

files.raw            : 23.05%   (88865494 => 20482471 bytes, 11.zst)

________________________________________________________
Executed in   11.77 secs    fish           external
   usr time   11.71 secs    0.00 millis   11.71 secs
   sys time    0.11 secs    1.15 millis    0.11 secs

files.raw            : 22.93%   (88865494 => 20375330 bytes, 12.zst)

________________________________________________________
Executed in   18.25 secs    fish           external
   usr time   18.17 secs    0.00 millis   18.17 secs
   sys time    0.12 secs    1.62 millis    0.12 secs

files.raw            : 22.79%   (88865494 => 20251690 bytes, 13.zst)

________________________________________________________
Executed in   14.76 secs    fish           external
   usr time   14.71 secs    0.00 millis   14.71 secs
   sys time    0.09 secs    1.15 millis    0.09 secs

files.raw            : 22.65%   (88865494 => 20125406 bytes, 14.zst)

________________________________________________________
Executed in   18.62 secs    fish           external
   usr time   18.56 secs    0.00 micros   18.56 secs
   sys time    0.09 secs  852.00 micros    0.09 secs

files.raw            : 22.54%   (88865494 => 20030096 bytes, 15.zst)

________________________________________________________
Executed in   24.12 secs    fish           external
   usr time   24.04 secs    0.00 micros   24.04 secs
   sys time    0.09 secs  952.00 micros    0.09 secs

files.raw            : 21.36%   (88865494 => 18982930 bytes, 16.zst)

________________________________________________________
Executed in   28.88 secs    fish           external
   usr time   28.81 secs    0.00 micros   28.81 secs
   sys time    0.08 secs  838.00 micros    0.08 secs

files.raw            : 20.67%   (88865494 => 18371249 bytes, 17.zst)

________________________________________________________
Executed in   38.83 secs    fish           external
   usr time   38.71 secs    0.00 millis   38.71 secs
   sys time    0.10 secs    1.24 millis    0.10 secs

files.raw            : 20.51%   (88865494 => 18230390 bytes, 18.zst)

________________________________________________________
Executed in   46.56 secs    fish           external
   usr time   46.46 secs    0.00 micros   46.46 secs
   sys time    0.07 secs  894.00 micros    0.07 secs

files.raw            : 20.30%   (88865494 => 18035524 bytes, 19.zst)

________________________________________________________
Executed in   60.52 secs    fish           external
   usr time   60.38 secs    0.00 micros   60.38 secs
   sys time    0.10 secs  856.00 micros    0.10 secs

colinxs commented 3 years ago

I see. In that case, does decompression speed matter? I'd imagine the index would have to be decompressed on every call to command-not-found, but even with -22 it seems fast enough so I'm guessing it doesn't matter much.

I think the compression speed starts to matter more for someone like me with local packages provided via overlay, in which case I may update the index often, but I haven't gotten overlays to work with nix-index so it's a moot point. As a side question, is it possible for overlays to work with nix-index?

In any case it seems like -22 is fine, although something like -16 seems like a decent tradeoff (2x the speed for 1% lower compression ratio), but the call is yours!
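As a quick check of that estimate, here is a small Python sketch using the level 3, 16 and 19 rows from the files.raw measurements above. The numbers are copied from that output; `RESULTS` and `compare` are illustrative names only, and since -22 is clamped to 19 without extra flags, level 19 stands in for the current setting.

```python
# Figures copied from the files.raw benchmark above:
# level -> (compressed size in bytes, "Executed in" seconds).
RAW_BYTES = 88_865_494
RESULTS = {
    3: (23_167_408, 0.71079),
    16: (18_982_930, 28.88),
    19: (18_035_524, 60.52),
}


def compare(fast, slow, results=RESULTS):
    """Compare a faster (lower) level against a slower (higher) one."""
    fast_size, fast_time = results[fast]
    slow_size, slow_time = results[slow]
    return {
        # How many times faster the lower level compressed.
        "speedup": slow_time / fast_time,
        # Difference in compression ratio, in percentage points.
        "ratio_points_lost": 100 * (fast_size - slow_size) / RAW_BYTES,
        # Extra disk space the lower level costs, in bytes.
        "extra_bytes": fast_size - slow_size,
    }


if __name__ == "__main__":
    for fast in (16, 3):
        c = compare(fast, 19)
        print(
            f"-{fast} vs -19: {c['speedup']:.1f}x faster, "
            f"{c['ratio_points_lost']:.2f} points worse, "
            f"{c['extra_bytes'] / 1e6:.1f} MB larger"
        )
```

For -16 against -19 this works out to roughly 2.1x the speed for about 1.07 percentage points of ratio, or a bit under 1 MB of extra disk space on this index.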

Thanks for the quick response.

bennofs commented 3 years ago

nix-index depends on the file listings provided by Hydra. So if you have custom overlays, there's no way for nix-index to know which files would be in the output of a derivation that wasn't built by Hydra. Perhaps we could add packages that are present in the local nix store to the index.