rust-lang / rustup

The Rust toolchain installer
https://rust-lang.github.io/rustup/
Apache License 2.0

delta compression for packages #653

Open llogiq opened 8 years ago

llogiq commented 8 years ago

Nightlies especially could benefit, on both the server and the client side.

spease commented 7 years ago

I'm stuck on a poor internet connection for the holiday, and I agree. xdelta3 is the best binary diff option that I know of.

lolbinarycat commented 2 months ago

perhaps RFC 3229 (delta encoding in HTTP) could be used for this.

lolbinarycat commented 2 months ago

did some testing with xdelta.

nightly-2024-06-10-x86_64-unknown-linux-gnu/bin/rustc is 2.6M
the 2024-09-19 nightly is also 2.6M

the delta between them is 525K (down to 487K when running with -9). that's about a 5x size reduction, pretty good!

keep in mind this is under non-ideal conditions: it assumes no update for 3 months.
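The "about 5x" figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the "M" and "K" suffixes above mean MiB and KiB; the constant and function names are made up for illustration.

```python
# Sizes as reported in the comment above.
# Assumption: "M" means MiB and "K" means KiB.
FULL_KIB = 2.6 * 1024    # rustc binary, about 2.6M
DELTA_KIB = 525          # xdelta output with default settings
DELTA_BEST_KIB = 487     # xdelta output when run with -9


def reduction_factor(full_size: float, delta_size: float) -> float:
    """Return how many times smaller the delta is than the full file."""
    if delta_size <= 0:
        raise ValueError("delta size must be positive")
    return full_size / delta_size


if __name__ == "__main__":
    print(f"default: {reduction_factor(FULL_KIB, DELTA_KIB):.1f}x")
    print(f"with -9: {reduction_factor(FULL_KIB, DELTA_BEST_KIB):.1f}x")
```

Both settings land slightly above 5x under that unit assumption, which matches the estimate in the comment.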

lolbinarycat commented 2 months ago

it's worth noting that xdelta3 is just a piece of software; vcdiff is the underlying file format.

unfortunately RFC 3229 doesn't have much in terms of software support, but it's simple enough that it could probably be implemented mostly with middleware shims.
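To give a rough idea of what such a client-side shim would have to handle, here is a minimal sketch of the RFC 3229 header exchange. The `A-IM`, `IM`, and `Delta-Base` headers and the `226 IM Used` status come from the RFC; the function names are hypothetical, and nothing here reflects how rustup or the Rust release servers actually behave.

```python
from typing import Mapping

STATUS_OK = 200
STATUS_IM_USED = 226       # RFC 3229: the body is a delta, not the full instance
STATUS_NOT_MODIFIED = 304


def delta_request_headers(cached_etag: str) -> dict[str, str]:
    """Headers asking for a vcdiff delta against the cached copy (hypothetical helper)."""
    return {
        "If-None-Match": cached_etag,  # identifies the base we already have
        "A-IM": "vcdiff",              # delta formats the client accepts
    }


def classify_response(status: int, headers: Mapping[str, str], cached_etag: str) -> str:
    """Decide how to treat the response: 'unchanged', 'delta', or 'full'."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == STATUS_NOT_MODIFIED:
        return "unchanged"
    if status == STATUS_IM_USED:
        applied = [token.strip() for token in lowered.get("im", "").split(",")]
        # Only accept the delta if it was computed against the base we hold.
        if "vcdiff" in applied and lowered.get("delta-base") == cached_etag:
            return "delta"
        raise ValueError("226 response does not match the requested delta")
    if status == STATUS_OK:
        return "full"  # server ignored A-IM and sent the whole file
    raise ValueError(f"unexpected status {status}")
```

A server that does not understand `A-IM` simply answers with a normal `200`, so a client written this way falls back to a full download without extra negotiation.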

djc commented 2 months ago

I think this is an interesting idea whose time may have come. The implementation in rustup could be fairly straightforward, but I expect the majority of the work to be on the backend. As such, I recommend starting with an issue against https://github.com/rust-lang/infra-team (and if you do, please mention the new issue here so that we can coordinate).

Kobzol commented 1 month ago

FWIW, I tried a little experiment. I computed a binary delta using xdelta for the librustc_driver.so file, between Rust 1.80.0 and 1.81.0. The original file is ~125 MiB, or 32 MiB with XZ compression. The delta file is ~91 MiB, or 33 MiB with XZ compression. So in this case it does not seem to be worth it.

Between two consecutive nightlies (2024-10-11 and 2024-10-10), the delta was 66 MiB uncompressed and 26 MiB after XZ compression. That's better, but it still does not seem worth the complexity, tbh, at least for this specific compiler artifact.
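Put as download savings, the numbers above work out roughly as follows. This is a sketch only: the comment does not give the XZ-compressed size of the full nightly file, so the nightly case assumes it is also about 32 MiB, like the stable one.

```python
def download_saving(full_xz_mib: float, delta_xz_mib: float) -> float:
    """Fraction of the download saved by fetching the compressed delta instead.

    A negative result means the delta is larger than the full compressed file.
    """
    if full_xz_mib <= 0:
        raise ValueError("full size must be positive")
    return 1.0 - delta_xz_mib / full_xz_mib


# Stable 1.80.0 -> 1.81.0: both sizes are from the comment above.
STABLE_SAVING = download_saving(32, 33)

# Consecutive nightlies: the 26 MiB delta is from the comment above.
# Assumption: the full XZ-compressed file is also ~32 MiB (not stated).
NIGHTLY_SAVING = download_saving(32, 26)

if __name__ == "__main__":
    print(f"stable:  {STABLE_SAVING:+.1%}")
    print(f"nightly: {NIGHTLY_SAVING:+.1%}")
```

Under that assumption the stable delta is slightly larger than a full download, and the nightly delta saves a bit under a fifth of it.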

lolbinarycat commented 1 month ago

what if you diffed two uncompressed tarballs of the entire component, since we currently download whole components at a time?

Kobzol commented 1 month ago

I only had a small script for a single file, and I don't have much time to experiment with this right now, especially since the results so far have been underwhelming. But if you can do that experiment, I would be interested in the result.
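For anyone who wants to try the whole-component experiment, it could be scripted roughly like this. This is a sketch, not the script used above: it assumes the `xdelta3` binary is on `PATH` and that both components have already been downloaded as `.tar.xz` files; the output file names and helper functions are made up.

```python
import lzma
import shutil
import subprocess
import sys
from pathlib import Path


def decompress_xz(src: Path, dst: Path) -> None:
    """Stream-decompress an .xz file so the tarballs are diffed uncompressed."""
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def xdelta3_encode_cmd(old: Path, new: Path, out: Path, level: int = 9) -> list[str]:
    """Build the xdelta3 command that encodes `new` as a delta against `old`."""
    return ["xdelta3", "-e", f"-{level}", "-f", "-s", str(old), str(new), str(out)]


def main(old_txz: Path, new_txz: Path, workdir: Path) -> None:
    workdir.mkdir(parents=True, exist_ok=True)
    old_tar = workdir / "old.tar"
    new_tar = workdir / "new.tar"
    delta = workdir / "new.vcdiff"

    decompress_xz(old_txz, old_tar)
    decompress_xz(new_txz, new_tar)
    subprocess.run(xdelta3_encode_cmd(old_tar, new_tar, delta), check=True)

    # Compare what a client would download: full .tar.xz vs XZ-compressed delta.
    delta_xz_size = len(lzma.compress(delta.read_bytes(), preset=9))
    print(f"full component (.tar.xz): {new_txz.stat().st_size} bytes")
    print(f"delta (raw):              {delta.stat().st_size} bytes")
    print(f"delta (xz):               {delta_xz_size} bytes")


if __name__ == "__main__" and len(sys.argv) == 4:
    main(Path(sys.argv[1]), Path(sys.argv[2]), Path(sys.argv[3]))
```

Run it with the old tarball, the new tarball, and a scratch directory as arguments. The last two printed lines against the first are the comparison that matters for deciding whether deltas are worth the complexity.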