j-baker opened 1 year ago
AFAICT all the symlinking stuff is not worth pursuing, and tarring the target dir (`use-zstd`) is best, including being faster, especially on non-trivial projects. I haven't redone the comparison after the symlink changes, but I'm sceptical that they can overcome these problems.
My intuition tells me that if it was possible to implement target deduplication via extracting layered archives, things would really work well.
I've been responsible for CI for a large Rust project that builds > 10GB of `./target`, and tarring and untarring `./target` is not even noticeable between all the actual heavy lifting (mostly cargo compiling).
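For readers unfamiliar with the approach, its rough shape is just this (a sketch with illustrative file names, not crane's actual implementation):

```bash
# pack ./target at the end of the producing derivation (GNU tar with zstd compression)
tar -C . --zstd -cf target.tar.zst target

# ...and unpack it at the start of the consuming one
tar --zstd -xf target.tar.zst
```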
Hi! It might be that we're compiling different kinds of Rust projects. In the ones that I work with, there is perhaps a relatively large dependency tree (think 400 crates) but a relatively small amount of project code (say, 10-20k lines). Think a small CRUD app which has a gRPC API, accesses a database, does some serde, and uses some async.
What this means is that compiling the dependencies takes quite a bit of time if you're doing it from scratch, while the incremental cost can be pretty low. Put simply, without Crane the typical dev recompile flow is measured in single-digit seconds.
My users are comparing Crane usage to raw Cargo, so the expectation on ‘I have made a small change’ is a relatively cheap build. Obviously, the normal dev cycle is for people to run ‘cargo test’ or similar, but my overall aim is that people should want to run ‘nix flake check’ as their last step before pushing, and that it should typically run quickly.
The cost of zstd-encoding and copying the output is considerable even on small projects, especially compared to the incremental cost of compilation in the typical cargo workflow.
It also leads to considerable storage bloat - consider the crane example which runs nextest, doc, build, and clippy simultaneously. With the symlinking strategy, if there are 2GB of compiled artifacts, the build can be made to use about 2GB of storage. With the untarring strategy, it will use 8GB.
This is all to say - as a comparison to recompiling, zstd and tarring is obviously better - but as a comparison to iterative cargo compilation it is perhaps weaker.
Your comment does however give me an idea - I wonder if a tar of symlinks might represent the best of both worlds in terms of performance? It would likely be much faster to extract than traversing the filesystem, due to sequential disk access and the reduced cost of forking.
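Very roughly, the "tar of symlinks" idea would look something like this (hypothetical paths; not an existing crane feature):

```bash
# hypothetical store path holding the prebuilt dependency artifacts
deps=/nix/store/HASH-deps-0.0.1/target

# build a tree of symlinks mirroring the store copy of ./target
mkdir -p links
(cd "$deps" && find . -type f -print) | while read -r f; do
  mkdir -p "links/$(dirname "$f")"
  ln -s "$deps/${f#./}" "links/$f"
done

# tar stores the symlinks themselves, so extraction is one sequential read
# plus cheap symlink creation, instead of a full filesystem traversal
tar -C links -cf target-links.tar .
```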
My users are comparing Crane usage to raw Cargo, so the expectation on ‘I have made a small change’ is a relatively cheap build.
Your users should keep using raw cargo in a dev shell during dev work anyway. They don't have to compare with raw cargo, because they will be using raw cargo.
Having devs run `nix` commands during dev work is one of the biggest mistakes one can make when using Nix flakes.
my overall aim is that people should want to run ‘nix flake check’ as their last step before pushing, and that it should typically run quickly.
That's just a waste of time and disk space. My users do `just lint` or `just final-check`, neither of which invokes `nix`, but only uses the inputs provided by the dev shell to do what needs to be done. Since barely anyone ever touches Nix code, if things work in a dev shell, 99.9% of the time they will work in Nix.
With the untarring strategy, it will use 8GB.
With tarring (and most importantly zstd) 12GB of artifacts compresses to something like a 2GB `target.zstd.tar`. The build directory itself is discarded after the Nix derivation is built, so it largely doesn't matter how much space it uses while Rust is building. Compression & extraction are fast (I can barely notice them, and our project is large, and we do all sorts of things via Nix CI). What matters most is how much space the end artifacts take in the /nix/store, especially since downloading/uploading stuff to cachix takes time.
In addition, if the incremental layers were implemented, tarring would become even better. I'm confident that layers of incremental compressed archives are the all-around best strategy.
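For what it's worth, GNU tar's incremental snapshots already express this kind of layering; a sketch of the idea only (nothing crane implements today):

```bash
# layer 0: the dependency-only target dir, recorded against a snapshot file
tar --listed-incremental=target.snar --zstd -cf layer0.tar.zst target

# layer 1: only what the workspace build added or changed since layer 0
tar --listed-incremental=target.snar --zstd -cf layer1.tar.zst target

# restore: extract the layers in order
tar --listed-incremental=/dev/null --zstd -xf layer0.tar.zst
tar --listed-incremental=/dev/null --zstd -xf layer1.tar.zst
```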
In theory symlinking could be more performant, but in practice it's an uphill battle, where cargo will break it in weird ways, because we'll be messing with its internals that are not guaranteed to be stable, and no one tests or even cares about this use case.
I think there are some assumptions you made there which I don't think hold for many people.

Regarding the extraction time - I've observed that CI nodes are typically rather pressed for disk bandwidth. For example, a typical Azure host will have 100-200MB/sec of disk write throughput, meaning that a 12GB extraction will take perhaps 1 minute - not insignificant, and especially problematic if it happens 4 times - 4 minutes of waiting is something worth avoiding. This is obviously system dependent, but that order of magnitude remains common. This is to say that just because it is fast for you does not mean it is fast for all.

To the tempdir thing - if you are running 4 tasks in parallel, that's how you get to the 12GB live disk space usage result. This also negatively affects the OS' ability to cache well; it will tend to thrash the page cache.

For one other thing - nix store optimisation will tend to ensure that rebuilds, though time consuming, don't take up more space in the store, whereas tarring up results will. This means that after running a build 5 times with no GC, you will be using more disk space. For those of us who run S3-backed caches, the garbage collection problem is more acute.
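A quick way to sanity-check what a given CI node actually sustains for sequential writes (a rough illustration, not a rigorous benchmark):

```bash
# write 2GB and force it to disk, then clean up; the reported rate approximates
# the node's sequential write throughput
dd if=/dev/zero of=ddtest.bin bs=1M count=2048 conv=fsync status=progress
rm ddtest.bin
```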
I have another proposal for a solution which would likely be practical, btw
For example, a typical Azure host will have 100-200MB/sec of disk write throughput, meaning that a 12GB extraction will take perhaps 1 minute - not insignificant, especially problematic if
I wonder if Nix is using some kind of a ramdisk for the build (maybe it's configurable?), so the disk IO is not the 12GB of uncompressed data, but 2GB of compressed data. Things are just flying by much faster than that for me, even though your ballpark calculations are reasonable. This has its own problems (running out of memory).
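If it isn't, one could approximate it; a hedged sketch (assuming a systemd-managed nix-daemon, since build directories are created under the daemon's TMPDIR):

```bash
# mount a tmpfs and point the nix-daemon's TMPDIR at it, so build dirs live in RAM
sudo mkdir -p /mnt/nix-build-ram
sudo mount -t tmpfs -o size=16G tmpfs /mnt/nix-build-ram

# e.g. via a systemd drop-in for nix-daemon.service (adjust to your setup):
#   [Service]
#   Environment=TMPDIR=/mnt/nix-build-ram
sudo systemctl restart nix-daemon
```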
Just did a quick check:
One screenshot is just the dependencies (smaller), the other one is the whole workspace extracted for the test derivation. Running on Jetbuild for GitHub Actions.
For one other thing - nix store optimisation will tend to ensure that rebuilds, though time consuming, don’t take up more space in the store, whereas tarring up results will.
Tarring the difference will ensure near-zero redundancy, and I'm 100% certain it can be done in a very performant way. This is currently unimplemented, and even with the redundancy, it's been working great.
I don't deny that the symlink approach would be faster, especially in theory. But as someone who has spent days debugging all sorts of issues in our Nix CI pipeline for various reasons (mostly my mistakes, but some were crane bugs, some just general complexity), and at least 8 hours recently on mysterious rebuilds in newer versions of the Rust toolchain, I value the brutal simplicity and robustness of tarring over marginal performance improvements. Not to mention that even right now the symlinking is only partial, IIRC, for various reasons.
I would be happy to have a perfectly working "tar the symlinks" model as it seems perfect. I would use it, but I guarantee it will break sometimes in mysterious ways, because cargo does not expect to have its files messed with, and crate build scripts are often doing crazy things. So even then I'd rather have the "incremental tars + copy" model that I advocate to fall back on, because I'm reasonably sure it is good enough and will always work, because it is invisible to cargo.
I believe what you're observing is GNU tar's behaviour where it doesn't fsync when extracting. Effectively, this means that you have a ramdisk so long as you have free memory - but when you run out, it'll slow down to the speed of your disk. Linking back to the point in my previous comment, the workflow you are describing will work great so long as resources are good (lots of ram, fast disk) but drop either or both of those (due to e.g. being on some colo box) and the effects are outsized.
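This is easy to see directly; assuming GNU tar and zstd, something like:

```bash
# extraction returns as soon as the data is in the page cache...
time tar --zstd -xf target.tar.zst

# ...and the sync afterwards shows how much writing was actually deferred
time sync
```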
I've had another go at implementing symlinking based on my comments from last night, and it appears to work well - much faster than the present strategy and robust to crate rebuilds.
PR is at #386! Would appreciate it if you could give it a shot to see what performance is like for you! The sense I get is that it uses rustc in a way that's robust to changes - it tells rustc to emit data into another directory and overwrites the symlinks, rather than relying on rustc doing the right thing.
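Loosely, the "overwrite the symlink instead of writing through it" step amounts to something like this (paths and layout are hypothetical; see the PR for the real mechanism):

```bash
# hypothetical scratch dir the compiler was pointed at for the rebuilt artifacts
scratch=target/.fresh-artifacts

(cd "$scratch" && find . -type f -print) | while read -r f; do
  dest="target/${f#./}"
  mkdir -p "$(dirname "$dest")"
  # replace a read-only store symlink with the freshly built file,
  # instead of trying to write through it into /nix/store
  [ -L "$dest" ] && rm -f "$dest"
  mv "$scratch/${f#./}" "$dest"
done
```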
@j-baker I commented on your PR. It's probably an improvement, even if not in my favorite approach. :D
BTW, it occurred to me that one reason why zstd compresses so well is that there is tons of redundancy between files in `./target`. These are storage savings that symlinking will not get. It might pay for itself just in this way, at least as long as cargo itself does not compress better.
Which reminded me that in your approach you can only symlink cargo artifacts, while incremental deduplication could dedup anything. In our project we (very unfortunately) have quite a few C dependencies, and I wonder how much output `cargo check` and `cargo doc` produce (vs rustc).
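One quick way to answer that empirically (CARGO_TARGET_DIR is standard cargo; the directory names are just for illustration):

```bash
# build check and doc artifacts into separate target dirs, then compare sizes
CARGO_TARGET_DIR=target-check cargo check --workspace
CARGO_TARGET_DIR=target-doc   cargo doc --workspace --no-deps
du -sh target-check target-doc
```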
I believe what you're observing is GNU tar's behaviour where it doesn't fsync when extracting. Effectively, this means that you have a ramdisk so long as you have free memory - but when you run out, it'll slow down to the speed of your disk. Linking back to the point in my previous comment, the workflow you are describing will work great so long as resources are good (lots of ram, fast disk) but drop either or both of those (due to e.g. being on some colo box) and the effects are outsized.
I think you're right here, but I also don't think this is going to be an issue. The system you're building on needs resources corresponding to the project size, and most of these files will be read over and over during the build, so the fact that they are warm in memory is a plus. If you want a fast CI, you want your `./target` to fit in RAM one way or another.
the fact that they are warm in memory is a plus. If you want a fast CI, you want your `./target` to fit in RAM one way or another.
In a way, extracting zstd might have saved time over symlinking again, by just reading everything into the in-memory disk cache with one big sequential read, while doing only 20% of the IO.
@j-baker @ipetkov
Since I tried the improved symlinking in the current revision I'd like to post some metrics I can already capture:
Sun, 17 Sep 2023 23:28:07 GMT fedimint-deps> installing target to /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> /build/tmp.7gTbCmcFrk /build/source
So copying just the dependencies takes 6s (vs 4s from the zstd version).
But most importantly, you see the wall of text in the post-installation fixup phase:
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> post-installation fixup
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking RPATHs of ELF executables and libraries in /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/fedimintd
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/devimint
...
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/zstd-sys-946f5b742f816cdf/out/zstd/lib/legacy/zstd_v03.o
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> patchelf: wrong ELF type
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/zstd-sys-946f5b742f816cdf/out/zstd/lib/legacy/zstd_v05.o
...
Sun, 17 Sep 2023 23:28:14 GMT fedimint-deps> patchelf: wrong ELF type
Sun, 17 Sep 2023 23:28:14 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/prost-build-56ba7b5010ed83ff/build_script_build-56ba7b5010ed83ff
Sun, 17 Sep 2023 23:28:14 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/prost-build-56ba7b5010ed83ff/build-script-build
Sun, 17 Sep 2023 23:28:14 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/tiny-keccak-df2eda386b4069b4/build_script_build-df2eda386b4069b4
...
Sun, 17 Sep 2023 23:28:22 GMT fedimint-deps> checking for references to /build/ in /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1...
Sun, 17 Sep 2023 23:28:22 GMT fedimint-deps> patchelf: wrong ELF type
Sun, 17 Sep 2023 23:28:22 GMT fedimint-deps> patchelf: wrong ELF type
...
Sun, 17 Sep 2023 23:28:23 GMT fedimint-deps> patchelf: wrong ELF type
Sun, 17 Sep 2023 23:28:31 GMT fedimint-deps> patching script interpreter paths in /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1
That's an extra 10s (or 18s, I'm not sure if the stuff before the last "patching script interpreter" should be counted) that did not exist at all in the zstd approach. I'm reasonably sure it can be avoided, and it should be. I reported it a while ago, but didn't dwell on it, because I'm not using `use-symlink` myself.
Furthermore, the disk space use is much, much higher:
> du -cksh /nix/store/8ddgq21p04f73hjibnz43ynf1q2ljnkv-fedimint-deps-0.0.1/target
3.8G /nix/store/8ddgq21p04f73hjibnz43ynf1q2ljnkv-fedimint-deps-0.0.1/target
3.8G total
3.8G vs 818M with zstd. And that's only the dependencies!
Now, the workspace build inheriting the deps-only artifacts:
Sun, 17 Sep 2023 23:28:36 GMT fedimint-workspace-build> copying cargo artifacts from /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target to target
Sun, 17 Sep 2023 23:28:39 GMT fedimint-workspace-build> configuring
3s vs 4s with zstd.
So far there's no real speedup on inheriting, saving the artifacts is slower, and disk usage is larger (and we're ignoring the overhead of the rustc wrapper).
Then the current version fails, but I don't think these metrics will change, so there really need to be some sweet gains in the second stage to make up for these losses in the first one.
Hi! Recently crane changed to symlinking rlib and rmeta files when reading dependencies. For the random reader: Crane generates the dependencies in one nix build, and then uses them in a later one. Effectively, you generate your cargo `/target` dir in one build step, and then you 'put it into place' in the next build step. This has many practical advantages.
Obviously, this is awesome - way less time spent copying voluminous object files. Unfortunately, it leads to a common failure mode - if a dependency is out of date somehow and cargo decides it needs to rebuild it, this rebuild will fail - it tries to write to the target of the symlink and this fails, as it's a nix store path.
I've seen this caused for 2 reasons:

The build depending on `CMAKE_PREFIX_PATH` not having changed. Somehow, it is changing between the root build and the 'clippy' build. The only difference I can tell is that clippy is added to the `nativeBuildInputs`. I'm wondering if clippy itself causes the cmake prefix path to change. Either way, the build explodes.

This can be worked around by setting `doNotLinkInheritedArtifacts = true` in most cases, but not all. In particular, I struggled to get a build which used `cxx` to build, because cxx created absolute symlinks which were dangling post-copy into the Nix store, meaning that `cp` itself then fails. This is now fixed on master of that project.

The takeaway for me is that this change makes Crane significantly more brittle when building Rust crates, and the error which is emitted can be rather hard to debug.
A sample error would be:
which, if you know how crane works, makes sense - but it's not very introspectable for a random dev joining the project, because the symlinking behaviour isn't that obvious. I'm trying to make it so that new engineers don't need to know how crane works in order to build their code - rather, they just need to know that it does.
I've got a few proposals for ways out:
The opencv issue
Here, crane is leading to a diff in `CMAKE_PREFIX_PATH` between building deps and the other tests. This genuinely just seems like a bug in crane - arguably Crane should ensure that, insofar as the env can lead to cargo changes, it is compatible between all builds. In my head, this is likely being caused by Crane amending the `nativeBuildInputs` being passed in. I'm wondering if it might be more robust to add all the relevant inputs to `nativeBuildInputs` every time, rather than amending to the minimal set. However, it's possible I've misdiagnosed this specific issue - the person I'm working with has not yet provided me with the `CMAKE_PREFIX_PATH` of the 'build deps' and 'run clippy' tests.
The more general issues
Trying `cp --reflink=always file1 file2` between the two directories, and upon success, doing `cp -r --reflink=always` as opposed to the symlinking. A further blocker to this might be that Nix builds in temp directories on a different filesystem to the Nix store - I am not sure about the specifics. I believe this would likely be ok on btrfs but may prove more of an issue on xfs. In any case, this is an optimisation rather than a total fix. However, if you could make this work on Linux, it might prove better on Mac.

Just spitballing here - I don't have any particular agenda - I've just personally found this confusing, and as the person doing most of the 'new nix stuff' at my company, this leads to others finding it way more confusing.
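For concreteness, a rough sketch of the reflink probe-then-copy idea above (hypothetical paths; not something crane does today):

```bash
# hypothetical store path holding the prebuilt dependency artifacts
deps=/nix/store/HASH-deps-0.0.1/target

# probe: try a single reflinked copy; if the filesystem supports it (e.g. btrfs),
# reflink the whole tree, otherwise fall back to a plain copy (or symlinking)
probe_src=$(find "$deps" -type f | head -n1)
if cp --reflink=always "$probe_src" ./.reflink-probe 2>/dev/null; then
  rm -f ./.reflink-probe
  cp -r --reflink=always "$deps" ./target
else
  cp -r "$deps" ./target
fi
```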