rust-lang / rustup

The Rust toolchain installer
https://rust-lang.github.io/rustup/

Prefer faster compression methods under good network #1858

Open ishitatsuyuki opened 5 years ago

ishitatsuyuki commented 5 years ago

Describe the problem you are trying to solve

Decompression can be a bottleneck if the network is fast. XZ (LZMA) is surprisingly slow and can quickly dominate the time spent on installation.

For network connections faster than broadband, we should consider not using LZMA compression. This includes home fiber and datacenters, notably CI environments.

Describe the solution you'd like

Notes

Ubuntu proposal case study.

kinnison commented 5 years ago

For the most part, the compression scheme selected is up to the tool which generates the compiler releases. As far as I know, however, the manifest format only allows for a single download file per component, so we wouldn't be able to choose compression methods for the user. I imagine that we use xz for compression ratio reasons.

rbtcollins commented 5 years ago

@ishitatsuyuki do you have some profiling data showing that decompression is a bottleneck here? One thing we could very easily do if it is a bottleneck is move decompression to a dedicated thread.

ishitatsuyuki commented 5 years ago

Reference file: 183M rust-std-1.34.2-x86_64-pc-windows-gnu.tar
CPU: Intel(R) Core(TM) i7-6500U

| Compression | Compressed size | Download time | Decompression time |
| ----------- | --------------- | ------------- | ------------------ |
| gz          | 71M             | 4.4s          | 1.25s              |
| xz          | 56M             | 3.4s          | 3.50s              |
| zstd (-19)  | 59M             | N/A           | 0.34s              |

You can clearly see that when the network is fast, decompression time is a significant part of installation time (measured on Linux, where I/O isn't slowed down by antimalware). Moving it to a dedicated thread doesn't make anything faster, since the CPU is the bottleneck.

I use an NVMe drive, so files can be written almost instantly; rotating disks may take much longer.

The Wi-Fi connection used for the download test averaged 16.2 MB/s. A wired connection is even faster, yielding around 80 MB/s, which resembles typical datacenter networking speeds.

As a bonus, zstd decompression is faster than every other method listed here, while retaining a compression ratio similar to xz. Time to adopt.
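
For anyone who wants to reproduce a single-threaded timing comparison like this in Rust rather than on the command line, here is a minimal sketch (assuming the xz2 and zstd crates; the file names are placeholders):

```rust
use std::fs::File;
use std::io::{self, Read};
use std::time::Instant;

// Times a single-threaded decompression pass and prints the result.
fn time_decompress(label: &str, decompress: impl FnOnce() -> io::Result<Vec<u8>>) {
    let start = Instant::now();
    let out = decompress().expect("decompression failed");
    println!("{label}: {} bytes in {:.2?}", out.len(), start.elapsed());
}

fn main() {
    // "rust-std.tar.xz" / "rust-std.tar.zst" are placeholder file names.
    time_decompress("xz", || {
        let mut buf = Vec::new();
        xz2::read::XzDecoder::new(File::open("rust-std.tar.xz")?).read_to_end(&mut buf)?;
        Ok(buf)
    });
    time_decompress("zstd", || zstd::decode_all(File::open("rust-std.tar.zst")?));
}
```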

rbtcollins commented 5 years ago

Thank you for the profiling data.

So, let's break this into several aspects.

I have absolutely no attachment to the current compressor; but rustup is merely a consumer of the archives produced elsewhere in the ecosystem. So the adoption process is going to be:

  1. get the rest of the ecosystem producing multiple formats
  2. get rustup to switch (whether dynamically or just permanently)

Do you know if xz or zstd(-19) format files are already available? If not, I'm not sure where the relevant place to file a bug is. (@kinnison )

Re: would moving decompression to a thread help: it may, because it allows concurrency. Obviously the decompressor itself, if single-threaded, will not get faster, so its runtime is a lower bound on the install time; but rather than T = (decompression + other handling), the install time might be as low as T = (decompression). Even on a single-core machine with two hardware threads, there is usually room for improvement here.

We should obviously do the format-switching work, but I'll see about a cheap message-passing test with 4 MB buffers in a thread when I get a chance.
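
For reference, a minimal sketch of such a message-passing setup (an illustration, not rustup's actual code; it assumes the xz2 crate for decompression):

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc::sync_channel;
use std::thread;

const CHUNK: usize = 4 * 1024 * 1024; // the 4 MB buffers mentioned above

// Download (or read) on the current thread while a dedicated thread
// decompresses, so the two phases overlap instead of running serially.
fn unpack_threaded<R: Read, W: Write + Send + 'static>(mut src: R, dst: W) -> io::Result<()> {
    // A bounded channel provides backpressure if decompression falls behind.
    let (tx, rx) = sync_channel::<Vec<u8>>(4);

    let worker = thread::spawn(move || -> io::Result<()> {
        // xz2's write-side decoder accepts compressed bytes incrementally.
        let mut decoder = xz2::write::XzDecoder::new(dst);
        for chunk in rx {
            decoder.write_all(&chunk)?;
        }
        decoder.finish()?;
        Ok(())
    });

    let mut buf = vec![0u8; CHUNK];
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break;
        }
        if tx.send(buf[..n].to_vec()).is_err() {
            break; // worker hit an error and hung up; join below reports it
        }
    }
    drop(tx); // close the channel so the worker drains and exits
    worker.join().expect("decompressor thread panicked")
}
```

The bounded channel matters: if decompression falls behind, the download side blocks instead of buffering the whole archive in memory.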

rbtcollins commented 5 years ago

In terms of assessing formats, we have binaries, libraries, and docs; I don't know that we'll get consistent results across all types, so it may be worth testing them all.

ishitatsuyuki commented 5 years ago

> Do you know if xz or zstd(-19) format files are already available?

The distribution manifest has two URLs per artifact: one with no suffix (gzip) and one for xz.
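
For illustration, here is a hypothetical sketch of how rustup could pick a format per network condition if a zstd URL were ever added (the field names are made up, not the actual manifest schema):

```rust
/// Illustrative only: download entries a component manifest might expose
/// if a zstd variant were added alongside the existing gz and xz ones.
struct ComponentUrls {
    gz_url: String,           // the suffix-free (gzip) URL that exists today
    xz_url: Option<String>,   // the xz URL that exists today
    zstd_url: Option<String>, // hypothetical future field
}

/// On fast links favor cheap decompression; on slow links favor small downloads.
fn pick_url(c: &ComponentUrls, fast_network: bool) -> &str {
    if fast_network {
        c.zstd_url.as_deref().unwrap_or(&c.gz_url)
    } else {
        c.xz_url.as_deref().unwrap_or(&c.gz_url)
    }
}

fn main() {
    let c = ComponentUrls {
        gz_url: "rust-std.tar.gz".into(),
        xz_url: Some("rust-std.tar.xz".into()),
        zstd_url: None,
    };
    // No zstd available yet, so a fast network falls back to gz here.
    assert_eq!(pick_url(&c, true), "rust-std.tar.gz");
    assert_eq!(pick_url(&c, false), "rust-std.tar.xz");
}
```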

kinnison commented 5 years ago

In order to proceed on this, the infra team want some comparisons.

Could you please gather:

  1. For all artifacts of a nightly, what is the total size of the .xz files vs. the .zstd equivalents?
  2. How long does it take to decompress all the .xz and all the .zstd equivalents in total? (Include your approximate CPU details here.)
  3. How fast does a network therefore have to be for the increase in download size to be offset by the improvement in decompression speed?
  4. Is our current use of xz as optimal as it could be? Can we turn on features in the crate to mitigate the decompression cost?

The feeling from the infra team is that adding another compression tarball is non-trivial, because it would add around half a gigabyte of artifacts per day to our S3 bucket. So if we were to add zstd, we'd have to drop at least one of gz or xz, and we're not entirely sure how much relies on referring directly to those artifacts (i.e. not installing via rustup).

If you can gather that data, I can take the results to the infra team for further discussion.

ishitatsuyuki commented 5 years ago

Tested release: nightly-x86_64-unknown-linux-gnu 2019-05-24
CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz * 4 (virtualized)
Note: only single-threaded implementations were tested. pixz did not seem to yield a speed increase; pzstd performed well, but it has its own logic; zstdmt does not support parallel decompression.

Size and decompression times:

| Component         | xz size | xz time (s) | zstd (-19) size | zstd time (s) |
| ----------------- | ------- | ----------- | --------------- | ------------- |
| cargo             | 4.6M    | 0.48        | 5.1M            | 0.073         |
| llvm-tools        | 556K    | 0.068       | 612K            | 0.018         |
| miri              | 896K    | 0.103       | 1000K           | 0.023         |
| rust-analysis     | 552K    | 0.073       | 580K            | 0.025         |
| rust-docs         | 12M     | 1.54        | 12M             | 0.294         |
| rust-std          | 62M     | 4.25        | 66M             | 0.495         |
| rustc             | 91M     | 8.30        | 99M             | 1.13          |
| rustfmt           | 2.7M    | 0.301       | 3.0M            | 0.058         |
| sum               | 172M    | 15.12       | 185M            | 2.13          |
| rust (all-in-one) | 154M    | 13.7        | 166M            | 2.07          |

The difference between xz and zstd is roughly 13 MB in size and 13 s in decompression time, which means that on networks faster than about 10 Mb/s, zstd will perform better.
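
As a quick sanity check: the break-even throughput is just the extra download size divided by the decompression time saved. A throwaway calculation with the totals from the table above:

```rust
fn main() {
    let extra_bytes = (185.0 - 172.0) * 1e6; // zstd is ~13 MB larger in total
    let saved_secs = 15.12 - 2.13; // ~13 s of decompression time saved
    let break_even = extra_bytes / saved_secs; // bytes per second
    println!(
        "break-even throughput: {:.2} MB/s (~{:.0} Mb/s)",
        break_even / 1e6,
        break_even * 8.0 / 1e6
    );
    // Prints roughly 1.00 MB/s (~8 Mb/s), in line with the ~10 Mb/s figure.
}
```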

xz, as mentioned above, doesn't seem worth parallelizing. We're already at the "best compression without ridiculously slow speed" level (-6), which is the default, and I don't think there's anything to change here.

kinnison commented 5 years ago

Thank you for these numbers. They imply that we can expect a roughly 7 to 8 percent increase in the size of a release when moving from xz to zstd. Based on numbers I was given last night, a release (nightly, beta, stable) is about 25 gigabytes, so purely adding zstd would thus likely add around 10 GB to that.

I shall now take this back to the infra team for further consideration. Thank you for your efforts so far.

kinnison commented 3 years ago

I believe further discussion was had in #2488