Open kornelski opened 4 months ago
In terms of incompatibilities, only a few come to my mind:
- Hacks that recursively search ./target looking for a file built by some build script in its OUT_DIR. Ideally this should be solved by adding proper support for building arbitrary assets in Cargo, but for the sake of keeping compatibility with CARGO_TARGET_DIR, this could be made to work by keeping OUT_DIRs as subdirectories or symlinks in ./target.
- Binaries trying to link to .so/.dylib of their Rust dependencies, where the deps are built by Cargo, and are not installed on the system. This is currently basically unsupported in Cargo, because there's no way to set rpath to anything that works (absolute path is too build-specific, and relative doesn't work for tests and examples that are in a different dir than bins). But if users are hacking around it with custom post-build steps, they may expect to find the 3rd party shared libs in ./target. This again would be better solved by having Rust/Cargo aware of dylib dependencies, and copy/hardlink dylibs as needed and set rpath accordingly. Not many deps have crate-type = "cdylib", so these also could be kept/symlinked in ./target for back-compat.
- Build containers could have very strict read-only filesystems with only CARGO_TARGET_DIR and CARGO_HOME subdirs being writeable. This could be made backwards-compatible by reverting back to the current everything-in-one-place behavior when CARGO_TARGET_DIR is set, and picking other env vars for configuring locations of the trimmed ./target and the intermediate products cache dir.
@poliorcetics from https://github.com/rust-lang/rfcs/issues/3664#issuecomment-2183534143
Yes, the issue should be moved to cargo I think.
I'm not convinced at all this won't break backwards compatibility in some way.
It makes ./target contain only workspace-unique files, which makes it justified for every workspace to have one.
And I don't want one in any cargo project while still keeping isolation, which is entirely different from what you are proposing.
It enables moving registry deps to a shared build directory, without side effect of local projects overwriting each others' files. Sharing of dependencies matches users' expectation that the same dependencies shouldn't be redundantly rebuilt for each local project.
Once again, the RFC I wrote and the original issue I took inspiration from do not ask for that; they ask for the opposite: I and many others want separate target dirs for every project.
There are probably as many reasons as there are users for it, but common ones are different sets of features among projects, sharing of build caches for specific projects, CI builds wanting to separate projects for security, or one project pinning 1.2.3 in a dep B of a dep A and the other project pinning 1.2.4: A can have the same version for both but its dependencies won't, and cargo is not made to handle that case at the moment.
You are fundamentally solving a different issue, one that the RFC I posted is not trying to solve.
Overall, I see this as a solution alternative to #6790 and had recommended we have that conversation there (or on internals).
What isn't an intermediate build product? (and should stay in ./target)
This is likely going to be the most difficult topic to work through and we'll need to make sure we get wide input on this from #6790 users and others.
Registry dependencies would get unique paths derived from rustc version + package IDs + enabled features (so that different crates using different features don't invalidate each others' caches all the time). This would enable sharing built crates.io dependencies across all projects for the same local user, without also causing local workspaces to clobber each others' CARGO_TARGET_DIR/profile/product paths. Temp directories for local projects would need some hashed paths in the shared build/temp dir too.
imo this is out of scope for this proposal (see #5931) and we should keep this focused so as not to get distracted.
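For readers following along, the quoted idea of deriving unique paths from rustc version + package ID + enabled features could be sketched roughly as below. This is only an illustration under assumptions: BuildKey and shared_cache_path are made-up names, this is not how Cargo computes its fingerprints, and DefaultHasher is used purely to keep the sketch std-only (its output is not guaranteed to be stable across Rust releases, so a real cache would need a stable hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical description of one registry dependency build.
/// None of these names come from Cargo itself.
pub struct BuildKey<'a> {
    pub rustc_version: &'a str,
    pub package_id: &'a str,
    pub features: &'a [&'a str],
}

/// Derive a per-build directory inside a shared cache root.
/// The same (rustc, package, feature set) always maps to the same path,
/// while a different feature set gets a different path, so builds with
/// different features do not overwrite each other.
pub fn shared_cache_path(cache_root: &Path, key: &BuildKey) -> PathBuf {
    // Sort and dedupe so the order in which features were requested
    // does not change the resulting path.
    let mut features: Vec<&str> = key.features.to_vec();
    features.sort_unstable();
    features.dedup();

    // Assumption: DefaultHasher stands in for a stable hash function.
    let mut hasher = DefaultHasher::new();
    key.rustc_version.hash(&mut hasher);
    key.package_id.hash(&mut hasher);
    features.hash(&mut hasher);

    cache_root.join(format!("{:016x}", hasher.finish()))
}
```

Two workspaces enabling the same features on the same dependency would land on the same directory and could share the build; enabling one extra feature yields a separate directory instead of invalidating the first one.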
I see this as a re-framing of the problem, addressing #6790 and rust-lang/rfcs#3371
Instead of us defining a new artifact-dir, we say target-dir is the artifact directory and move everything else out into a "working directory".
- --artifact-dir because controlling of the location of the "working directory" is too low level for the CLI and should be reserved for config
- --artifact-dir is that target-dir is not the final directory files are put under but the parent directory. This is especially annoying for cargo build vs cargo build --target, dealing with profile names, etc

Potential names
- working-dir (arbitrarily chose this one to refer to the concept moving forward)
- build-dir

Cargo script would default its target-dir as its working-dir
This would need input from
This would need an audit of ways we publicly treat the target dir as a working dir, like exposing CARGO_TARGET_TMPDIR
Sharing sources over NFS adds another incompatibility when the ability to move ./target out of the sources is eliminated.
Every modern build system allows keeping sources and build results separate (and users and tools have no problems with it). I do not think that cargo should go backwards and enforce a fixed ./target directory.
I'm not suggesting to force it to always be ./target. The CARGO_TARGET_DIR can continue to move this directory elsewhere.
The main point is to reduce severity of problems that the current high-churn high-volume content of this dir causes.
We talked about this in today's Cargo team meeting.
Our care abouts include
- target-dir for dealing with final artifacts, so we likely want to preserve that
- target-dir as that will push people down one path moving intermediate artifacts and then we completely change it on the user what path they should go, rather than having a stable story for how to handle this

While we acknowledged the potential for user confusion with CARGO_TARGET_TMPDIR, we were fine with it not being associated with target-dir
The general shape of what we proposed in the meeting is...
target-work-dir: Home of intermediate artifacts
- target-work-dir
- target-dir)
- target-work-dir config
- target-work-dir default ("{cargo-cache}/target/{workspace-manifest-path-hash}")

target-artifact-dir: Home of final artifacts
- target prefix) --artifact-dir CLI flag is added
- {platform} or {legacy-platform} is not present with multi-target builds
- target-artifact-dir config / cli
- target-dir config / cli
- target-artifact-dir default ("{workspace-root}/target/{legacy-platform}/{profile}")

Legacy target-dir
- --target-dir is hidden on CLI. Maybe shows up in man pages

Other
- CARGO_TARGET_TMPDIR points within target-work-dir
- target-work-dir
In theory, we could trivially do this by
- target-work-dir and target-dir (if different)
- target-work-dir and only when we do the "hardlink or cp" do we reference target-dir
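As a rough illustration of the "hardlink or cp" step mentioned above, here is a minimal std-only sketch. The function name uplift and the example paths are hypothetical, and it leaves out the locking of both directories entirely; it only shows trying a hard link first and falling back to a copy when linking fails (for example when the work dir and the artifact dir live on different filesystems).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Expose a file built inside the work dir at its final location.
/// Hypothetical helper, not Cargo's implementation.
pub fn uplift(built: &Path, dest: &Path) -> io::Result<()> {
    // Make sure the destination directory exists.
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }

    // Remove a stale artifact from a previous build, if there is one.
    match fs::remove_file(dest) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }

    // Prefer a hard link (cheap, no extra disk usage); fall back to a
    // full copy if the link cannot be created.
    if fs::hard_link(built, dest).is_err() {
        fs::copy(built, dest)?;
    }
    Ok(())
}
```

With this shape, everything else can keep reading and writing only the work dir, and the artifact dir is touched once per final product.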
Initial default is "{workspace-root}/target"
Template supports
- {workspace-root}
- {cargo-cache} (pointing to CARGO_HOME for now)
- {workspace-manifest-path-hash}
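To make the template variables above a bit more concrete, here is a small sketch of how such a template might be expanded. Everything about it is an assumption for illustration: the function name expand_work_dir is made up, the hash is taken over the path of the workspace Cargo.toml using std's DefaultHasher (not necessarily what Cargo would use), and plain string replacement stands in for real template parsing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Expand a work-dir template such as
/// "{cargo-cache}/target/{workspace-manifest-path-hash}".
/// Hypothetical helper; the hashing scheme is an assumption.
pub fn expand_work_dir(template: &str, workspace_root: &Path, cargo_home: &Path) -> PathBuf {
    // Hash the workspace manifest path so each workspace gets its own dir.
    let manifest = workspace_root.join("Cargo.toml");
    let mut hasher = DefaultHasher::new();
    manifest.hash(&mut hasher);
    let hash = format!("{:016x}", hasher.finish());

    // Naive substitution of the three variables listed above.
    let expanded = template
        .replace("{workspace-root}", &workspace_root.to_string_lossy())
        .replace("{cargo-cache}", &cargo_home.to_string_lossy())
        .replace("{workspace-manifest-path-hash}", &hash);
    PathBuf::from(expanded)
}
```

With the initial default "{workspace-root}/target" this resolves to the familiar per-workspace directory, while the "{cargo-cache}/target/{workspace-manifest-path-hash}" form gives every workspace a distinct directory under one central location.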
Steps
Notes:
- .cargo/config.toml target-work-dir field

target-artifact-dir
Assumption: target-work-dir takes some pressure off of target-artifact-dir
Defaulted to final location ("{workspace-root}/target/{legacy-platform}/{profile}")
Template supports
- {workspace-root}
- {cargo-cache} (pointing to CARGO_HOME for now)
- {workspace-manifest-path-hash}
- {platform}
- {legacy-platform} (elides host-target)

Needs all of the details in the tracking issue to be finalized.
If target-work-dir takes more than N time (1 year?) to stabilize, then we re-evaluate approving rust-lang/rfcs#3371. This is to try to balance the needs of the people who want something like rust-lang/rfcs#3371 now vs (1) the long-term inapplicability of that RFC and (2) the lack of stable "blessed" workflow for users (telling users to use solution X for several months and then telling them that is no longer "right" and they need to use solution Y).
- target-dir as the "artifact base dir" and provide a query command (like buck2 analyze) to ask what the "artifact dir" is
- target-artifact-dir in CLI, leaving it for more advanced uses
- target-dir and target-artifact-dir
- target-dir for final artifacts. While we eventually de-emphasize target-dir, those workflows will work just fine
- target-dir and target-work-dir
- {platform}/{profile}) if no templates are available
- {dl} but that is a more limited / off in the weeds use case rather than front and center for users
- target-dir if target-work-dir or target-artifact-dir is present
Something we overlooked in the above analysis is other "artifacts". In particular, I'm thinking of cargo package which places files in $CARGO_TARGET_DIR/packages. I'm assuming at least the location of the .crate files is part of our stable API. We'd need to decide about the files laid out on disk next to it.
Ways of solving this
- examples/, we could make the profile and target template dirs blank and always put this under packages/
- {default} variable that is artifact-specific.

Just commenting here as I'm dealing with my own issues regarding the target dir, but personally, while it's nice for target to contain final build products only by default, I still will want all of these build products out of the target directory for the sake of excluding them from backups and snapshotting.
For some context, I use ZFS snapshots as a form of fast local backups; not long-term backups in case of hardware or extreme software failure, but decent short-term backups in case I accidentally delete a file or mess up an update. However, I explicitly go out of my way to exclude as many things as possible from auto-snapshotting that qualify as "cache" because they can very quickly clog up my disk if I'm not careful.
(Also: since snapshotting is a filesystem-level feature, I can't just say "don't save files of this type in snapshots" since snapshotting works by instantly freezing the state of the FS into a snapshot, and doesn't copy files over like a long-term backup would.)
For example, today I just deleted 200 GiB of snapshots of target directories. Not the current target directories, but past versions of them from previous snapshots. Snapshots are good for incremental stuff like code because they're copy-on-write, but the contents of a binary are effectively random to any snapshotting tool and they'll end up being fully duplicated every time they're snapshotted, and that means you can end up with several times that amount of data in snapshots until everything eventually gets old enough to be deleted. The "effectively random" part also applies especially to the final products, since while crates that don't change won't change in their compiled artifacts, the final linked products definitely will.
So, as far as I'm concerned, moving the final build products back into the workspace without also having the option to keep them out effectively un-solves the problem that moving the target directory was meant to solve. After all, the final build products, modulo LTO (which isn't really going to happen for debug builds) will effectively be the same size as all the intermediate products, so, that means that about half the disk usage will not be saved. (I'm extremely approximating here; the point is that it's a considerable amount of the disk usage, even if it's not half. Even 10% of the size is still a lot when you consider that these are being multiplied across several snapshots.)
And note that yes, other languages like Node and Python also have this exact same problem, but I don't think that other languages' inability to solve this problem forgives Rust not solving it. Also, even though node_modules can be massive, hundreds of GiB of binary artifacts is pretty hard to beat.
I love the idea of keeping intermediate products deduplicated and in one place. I just don't want that to obscure the goal of having the final products also somewhere else too.
After all, the final build products, modulo LTO (which isn't really going to happen for debug builds) will effectively be the same size as all the intermediate products
No, the intermediate products are usually many many times larger. Not just double, they can be 1000× larger! On the project I'm currently working on, a clean debug build of a 20MB executable creates 2300MB of junk in target/. After working on it for a while, it grows to 13GB of temp data for a 20MB result.
There are often many duplicate copies of libstd and other dependencies in each .rlib file. There is a lot of duplication across code units. There's plenty of completely unused objects included in the dependencies, and stripped even without LTO (rust relies on --as-needed flag). There are also often separate copies for build dependencies, builds with cfg(test), and incremental build cache.
@clarfonthey the plan calls for both target-work-dir and target-artifact-dir to be templated so you can move their content out. It does not call out templating of target-dir as it calls for phasing that out. If we wanted to templatize it as a convenient way of setting both of the above, we'd likely want to wait for the above so we set the precedent for what people are generally expected to work with, rather than shifting expectations around on the user.
I still will want all of these build products out of the target directory for the sake of excluding them from backups and snapshotting.
FWIW, I've had decent results with target being a symlink to somewhere that is not subject to backups/snapshotting. cargo clean will remove the symlink, but everything else I've used is largely fine. Note that this puts intermediate and final artifacts into the same bucket.
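For anyone wanting to script the symlink workaround, a minimal Unix-only sketch could look like the following. The helper name link_target_dir and its parameters are made up for illustration; it creates the real directory under some cache root that is excluded from snapshots and points ./target at it, and it can simply be re-run after cargo clean has removed the link.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Make `<workspace>/target` a symlink to `<cache_root>/<name>`.
/// Hypothetical helper; Unix only because of `std::os::unix::fs::symlink`.
pub fn link_target_dir(workspace: &Path, cache_root: &Path, name: &str) -> io::Result<()> {
    // The real directory lives outside the snapshotted tree.
    let real = cache_root.join(name);
    fs::create_dir_all(&real)?;

    let link = workspace.join("target");
    match fs::symlink_metadata(&link) {
        // Already a symlink: nothing to do.
        Ok(meta) if meta.file_type().is_symlink() => Ok(()),
        // A real directory or file is in the way: refuse to touch it.
        Ok(_) => Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            "target exists and is not a symlink",
        )),
        // Missing (for example after `cargo clean`): create the link.
        Err(e) if e.kind() == io::ErrorKind::NotFound => symlink(&real, &link),
        Err(e) => Err(e),
    }
}
```

As noted above, this still keeps intermediate and final artifacts in the same bucket; it only moves where that bucket lives.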
I'm in the same boat as @clarfonthey, but with BTRFS snapshots, which I create every 15 minutes and then stream to longer-term storage. My debug builds are easily 2.5G and I clean up target that is 300-700G basically every week.
I created ~/.cache/cargo/{git,registry,target} for this reason, where ~/.cache is in a separate subvolume that is not subject to snapshotting/backups. ~/.cargo/{git,registry} are symlinks now (still hoping cargo starts respecting XDG one day) because they also grow to substantial sizes (currently 1.08M files and 24.1G together).
The proposed separation (especially templating for both new options) should work nicely for such a use case; CARGO_TARGET_DIR in .profile has major consequences for build times when jumping between projects.
Excited!
Problem
There are a couple of issues with the CARGO_TARGET_DIR that are seemingly in conflict with each other:
1. Multiple locations of target dirs complicate excluding them from backups and full-disk search, cleanup of the temp files, moving temp files to dedicated partitions, out of slow network drives or container mounts, etc. Users don't like that the target dir is huge, and multiple instances of it add up to a lot of disk space. Users would prefer a central location to ease management of the temp files, and also to dedupe/reuse dependencies across many projects.
2. People (and tools) are relying on a relative ./target directory being present to copy or run built files out of there. Additionally, users may not want to configure a shared CARGO_TARGET_DIR due to risk of file name conflicts between projects.

However, the dilemma between 1 and 2 exists only because Cargo uses CARGO_TARGET_DIR for two different roles:
Proposed Solution
So to satisfy both uses, I suggest changing the thinking about what the role of CARGO_TARGET_DIR should be. Instead of thinking where to put the same huge all-purpose mixed CARGO_TARGET_DIR, think how to deduplicate and slim CARGO_TARGET_DIR, and move everything non-user-facing out of it.
Instead of merging or sharding the CARGO_TARGET_DIR as-is with all of its current content, and adding --artifact-dir as a separate place where final products are being copied to, make CARGO_TARGET_DIR the artifact dir (without copying).
As long as the CARGO_TARGET_DIR dir is the place for all of the build files, of all crates including all the crates.io and local builds, with all the caches, all the temp junk, then this is going to be a problematic large directory that needs to be managed. But if the purpose of the ./target dir was changed to be only for user-facing files (files that users can name, and would access via ./target path themselves), then this directory would be relatively small, with a good reason to stay workspace-relative.
What isn't an intermediate build product? (and should stay in ./target)
- .a/.so, where lib.crate-type calls for them. Possibly .rlib/.rmeta in the future if there's a stable ABI.
- .d files for all of the above (so that IDEs and other build systems know when to rebuild the artifacts).
- OUT_DIR for build.rs, see #13663), then for build scripts belonging to the current workspace it would be inside ./target as well.

So generally files that users build intentionally, and may want to access directly (run themselves, or package up for distribution) and files that users may need to configure their IDE and debugger to find inside the project.
Crates in [patch.crates-io] with a path are a gray area, and might also have their artifacts included in the ./target dir (but in some way that avoids clobbering workspaces' files).
What isn't a final build product, and doesn't belong to ./target:
- source = "registry+…").
- fingerprint and incremental dir content of all crates. These are implementation details of the compiler, and nobody should be accessing these directly via ./target/….
- .o files. Users are not supposed to use them directly either (Rust has static libs for this).

All of these should be built in some other shared build cache dir (one that is not inside CARGO_TARGET_DIR), configurable by a new option/env var.
Registry dependencies would get unique paths derived from rustc version + package IDs + enabled features (so that different crates using different features don't invalidate each others' caches all the time). This would enable sharing built crates.io dependencies across all projects for the same local user, without also causing local workspaces to clobber each others' CARGO_TARGET_DIR/profile/product paths. Temp directories for local projects would need some hashed paths in the shared build/temp dir too.
Advantages
- ./target dirs (for cargo itself, it makes ./target/debug with binaries and tests take 415MB, instead of 4.2GB). This makes cleanup of all the scattered target dirs less of a pressing problem.
- ./target keeps relatively few files, and removes high-frequency-churning files out of it, which makes it less of a problem for real-time disk indexing (like search and backups on macOS).
- ./target stops being critical for build speeds, unlike I/O of the incremental cache and rewrites of thousands of .o files. It becomes feasible to have project directory on a network drive without overriding CARGO_TARGET_DIR (network filesystems are used by non-Linux systems where tools like Vagrant and Docker have to run full-fat VMs, and can't cheaply share the file system).
- It makes ./target contain only workspace-unique files, which makes it justified for every workspace to have one.
- target/release/exe etc.
- --artifact-dir or .cargo/config.

Notes
No response