rust-lang / rust

Avoid shipping duplicate artifacts in the host and target sysroot #42645

Open alexcrichton opened 7 years ago

alexcrichton commented 7 years ago

All released compilers have identical dynamic libraries in two locations. The locations on Linux are:

All of these artifacts are byte-for-byte equivalent (they're just copies of one another). These duplicate artifacts inflate our installed size, inflate downloads, and cause weird bugs like https://github.com/rust-lang/rust/issues/39870. Although that issue is itself fixed, the fix is just a hack for now; ideally it would be solved properly by fixing this issue!

Some possible thoughts I personally have on this are:

Unsure if I'm on the right track there, but hopefully can get discussion around this moving!
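
The byte-for-byte claim above can be checked on an installed toolchain. The following is a minimal sketch, assuming the two directories are `$sysroot/lib` and `$sysroot/lib/rustlib/$target/lib` as named later in this thread; the function name `find_duplicates` is hypothetical and not part of rustc, rustbuild, or rustup.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the files in `target_libdir` that have a byte-identical file of
/// the same name in `host_libdir`. (Hypothetical helper for illustration.)
fn find_duplicates(host_libdir: &Path, target_libdir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut duplicates = Vec::new();
    for entry in fs::read_dir(target_libdir)? {
        let target_file = entry?.path();
        if !target_file.is_file() {
            continue;
        }
        let Some(name) = target_file.file_name() else { continue };
        let host_file = host_libdir.join(name);
        if !host_file.is_file() {
            continue;
        }
        // Compare sizes first so large files that differ are not read in full.
        if fs::metadata(&host_file)?.len() != fs::metadata(&target_file)?.len() {
            continue;
        }
        if fs::read(&host_file)? == fs::read(&target_file)? {
            duplicates.push(target_file);
        }
    }
    duplicates.sort();
    Ok(duplicates)
}
```

Calling it with the two library directories of a toolchain returns the paths in the target directory that are exact copies of a file in the host directory.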

durka commented 7 years ago

Can you clarify:

> $sysroot/lib/rustlib/$target/lib is where the compiler looks for target libraries. The compiler can't look in $sysroot/lib for libs as that's typically got a ton of libs on Unix systems.

What kinds of Bad Stuff (tm) would happen if there were "a ton of libs" in the place the compiler looks for target libraries?

alexcrichton commented 7 years ago

Oh sure yeah, I'm basically thinking of https://github.com/rust-lang/rust/issues/20342, which is the direct consequence of looking in all of $sysroot/lib for libs.

est31 commented 7 years ago

> inflate downloads

I'm not sure about that. Afaik we order files by name inside download folders to avoid precisely that.

retep998 commented 7 years ago

@est31 The $sysroot/lib/*.dylib libraries are in a different component than the $sysroot/lib/rustlib/$target/lib/*.dylib libraries. Because they're in different components, compression can't eliminate the redundancy.

brson commented 7 years ago

> The most plausible solution in my mind is to create our own pseudo-symlink file format. When assembling a sysroot this is what rustbuild itself would emit (instead of copying files) but it'd basically be a file with the literal contents rustc-look-in-your-libdir

This seems unlikely to work with the dynamic linker in cases where rpath is disabled or unavailable. I believe the main reason for the historical redundancy here is so that the dylibs rustc needs are literally located in /usr/local/lib.

brson commented 7 years ago

I favor a hardlink solution, but teaching rust-installer/rustup how to do that across components is pretty hairy.

A (relatively) simple solution would be to leave the components as they are, but have rustup deduplicate them with hardlinks at install time. If that were combined with a proposed optimization to have rustup use the combined package when possible, the effect would be that downloads were deduplicated (via compression), and disk space was deduplicated (via hardlinks).
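
As a rough illustration of deduplicating with hardlinks at install time, here is a minimal sketch. It is not how rustup or rust-installer work; the function name `hardlink_if_identical` is hypothetical, and the sketch assumes both files are on the same filesystem, which hard links require.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Replaces `duplicate` with a hard link to `original` when both files have
/// identical contents. Returns `Ok(true)` if the link was created.
/// (Hypothetical helper; a real installer would need atomic replacement and
/// rollback, which this sketch leaves out.)
fn hardlink_if_identical(original: &Path, duplicate: &Path) -> io::Result<bool> {
    if fs::read(original)? != fs::read(duplicate)? {
        // Contents differ, so the file is left untouched.
        return Ok(false);
    }
    // Not atomic: if linking fails after removal, the duplicate is gone.
    fs::remove_file(duplicate)?;
    fs::hard_link(original, duplicate)?;
    Ok(true)
}
```

After a successful call both paths refer to the same file on disk, so the contents are stored once while every component still sees its library at the expected path.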

brson commented 7 years ago

> The most plausible solution in my mind is to create our own pseudo-symlink file format. When assembling a sysroot this is what rustbuild itself would emit (instead of copying files) but it'd basically be a file with the literal contents rustc-look-in-your-libdir

Doing this in the opposite direction seems like it would work (pseudo-symlinks in libdir), but then you lose the consistency of having all the real libs in libdir, and seemingly the installer would have to be responsible for setting that up.

alexcrichton commented 7 years ago

Oh sorry yeah I was thinking that $sysroot/lib/rustlib/$target/lib/*.dylib would be a "pseudo symlink" to the versions in $sysroot/lib, that way we wouldn't mess with the libraries that rustc itself needs to execute.

Upon further reflection though, I do agree that this seems like sort of a rustup problem. We still want to produce a rust-std package with all of the libraries in it, not a bunch of "pseudo symlink" pointers which point to nonexistent libraries. We basically want rustup toolchains and make install installed-toolchains to have this "symlink behavior" but everything else should stay as-is today.
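
To make the "pseudo symlink" direction described here concrete, the lookup could be sketched as follows. This is only an illustration under stated assumptions: the marker string is the one quoted in the thread, the exact-match rule (no trailing newline) is an assumption, and `resolve_target_lib` is a hypothetical name, not an existing rustc function.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Marker contents quoted in the thread. Matching it exactly, with no
/// trailing newline, is an assumption of this sketch.
const MARKER: &[u8] = b"rustc-look-in-your-libdir";

/// Resolves a library in `$sysroot/lib/rustlib/$target/lib`. If the file
/// there is a pseudo-symlink marker, the real library is taken from
/// `$sysroot/lib` instead. (Hypothetical helper for illustration.)
fn resolve_target_lib(sysroot: &Path, target: &str, file_name: &str) -> io::Result<PathBuf> {
    let in_target = sysroot
        .join("lib")
        .join("rustlib")
        .join(target)
        .join("lib")
        .join(file_name);
    // Check the size first so real libraries are never read in full.
    let is_marker = fs::metadata(&in_target)?.len() == MARKER.len() as u64
        && fs::read(&in_target)?.as_slice() == MARKER;
    if is_marker {
        Ok(sysroot.join("lib").join(file_name))
    } else {
        Ok(in_target)
    }
}
```

With this direction the libraries rustc itself needs to run stay as real files in `$sysroot/lib`, and only the target-directory copies become markers.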

cuviper commented 7 years ago

FWIW, in Fedora packaging I do replace the rustlib libraries with actual symlinks to the libdir. I suppose it wouldn't hurt if those were "pseudo" symlinks, but I want to be careful about that redirection. Namely, I've got /usr/lib/rustlib/$target/lib/ so all targets share a common /usr/lib/rustlib/, and then 64-bit rustc will get its libraries from /usr/lib64 because that's how Fedora arranges things.

(I've kind of hacked that in place after ./x.py install since rustbuild et al. don't allow separating the libdir and rustlibdir paths, but maybe they should.)
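
The post-install step described here, replacing the rustlib copies with real symlinks into a separate libdir, might look roughly like the sketch below. It is Unix-only and is not the actual Fedora packaging script; `symlink_shared_libs` is a hypothetical name, and the sketch assumes that a same-named file in libdir is an identical copy.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Replaces each regular file in `rustlib_dir` that also exists in `libdir`
/// with a symlink to the `libdir` copy, and returns how many were replaced.
/// (Hypothetical helper; assumes same-named files are identical copies.)
fn symlink_shared_libs(libdir: &Path, rustlib_dir: &Path) -> io::Result<usize> {
    // Collect the entries up front so the directory is not modified while
    // it is being iterated.
    let entries = fs::read_dir(rustlib_dir)?.collect::<io::Result<Vec<_>>>()?;
    let mut replaced = 0;
    for entry in entries {
        let path = entry.path();
        let Some(name) = path.file_name() else { continue };
        let in_libdir = libdir.join(name);
        // Skip anything that is already a symlink or has no libdir copy.
        if !fs::symlink_metadata(&path)?.file_type().is_file() || !in_libdir.is_file() {
            continue;
        }
        fs::remove_file(&path)?;
        symlink(&in_libdir, &path)?;
        replaced += 1;
    }
    Ok(replaced)
}
```

In the layout described above, `libdir` would be something like `/usr/lib64` and `rustlib_dir` would be `/usr/lib/rustlib/$target/lib`, which is why the two paths are separate parameters.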

MikeMcQuaid commented 7 years ago

> We basically want rustup toolchains and make install installed-toolchains to have this "symlink behavior" but everything else should stay as-is today.

A suggestion: use symlinks in make install on Unix systems that support them and punt on a Windows solution for now. It seems the complaints about double-packaging and related issues are currently exclusively from Unix packagers so I think you could get away with just addressing it there for now.

retep998 commented 7 years ago

Windows users would still like to avoid having to download those libraries twice. It's not really critical or anything, just something that would be helpful in the future when someone gets around to it maybe.

MikeMcQuaid commented 7 years ago

> Windows users would still like to avoid having to download those libraries twice.

Yep, but they have to do that today. I'm not sure it's worth avoiding a straightforward solution to the problem for Unix systems just because there's no obvious solution for Windows.

metux commented 7 years ago

> Oh sorry yeah I was thinking that $sysroot/lib/rustlib/$target/lib/*.dylib would be a "pseudo symlink" to the versions in $sysroot/lib, that way we wouldn't mess with the libraries that rustc itself needs to execute.

Wait a second - (host) rustc itself links against libraries in the (target) sysroot? Seriously?!

jonwolski commented 7 years ago

Are symlinks or hard links not an option just due to Windows support? If so, perhaps it is worth pointing out that Windows 10 supports symbolic and hard links without the privilege escalation that was necessary in Vista, 7, and 8 (and XP if you include "junctions").

Forgive me if I'm stating the obvious. I do not really understand this issue. I just would not want to see an easy solution overlooked due to unfamiliarity with Windows' current capabilities.

https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10/#RRmytWmTlOwHQ8YZ.97

steveklabnik commented 5 years ago

Triage: I don't think that anything has changed here, but I'm not sure.

eddyb commented 4 years ago

@Mark-Simulacrum did either of us open an issue about the libLLVM-*.so duplication?

Mark-Simulacrum commented 4 years ago

Not to my knowledge, no. It would probably be good to do so.

eddyb commented 4 years ago

Filed #70838.

pwnorbitals commented 3 years ago

This may be fixed now that #70838 is closed.

jyn514 commented 2 years ago

Triage: the only duplicate artifacts I see now are libstd.so and libtest.so, which sounds like it's a lot less than before (12 MB between them). But those two are still duplicated.

(posting for future reference: it turns out libtest.so is shipped in the host sysroot so that rustdoc can compile doctests)