holmgr / cargo-sweep

A cargo subcommand for cleaning up unused build files generated by Cargo
MIT License
694 stars 31 forks source link

Suggestion: Add a flag to clean all currently unused build artifacts #2

Open holmgr opened 5 years ago

holmgr commented 5 years ago

Add a lastest flag to the CLI to enable cleaning of all build artifacts current not in use by making use of cargo check --message-format=json

jonas-schievink commented 5 years ago

Unfortunately that requires either

In either case, it means that a full build is necessary. This could be avoided by instead using Cargo's build plan output, which includes all compiler invocations and doesn't require actually building the crate. Unfortunately that Cargo feature is still unstable.

To add to that, I don't think either approach will easily handle dev-dependencies. EDIT: Actually the build plan output contains them if you do cargo build -Zunstable-options --build-plan --tests --benches --examples --bins and you probably also want --all in there for workspaces.

holmgr commented 5 years ago

I think the best approach is the one suggested in #5 instead, i.e do two separate steps in which one outputs a timestamp file before the build step, and then passing that timestamp as cleaning parameter after building.

jonas-schievink commented 5 years ago

Yeah, but that'd still be a somewhat imprecise solution relying on atime. Really, the simplest way to implement this reliably would probably be using the build plan (however, there might be some difficulty if you want to eg. run cargo test with different feature combinations).

holmgr commented 5 years ago

Yes, that is true. I am not familiar with the build-plan so I am not sure if it may fail due to nightly compiler not being compatible with some dependency and if its output is different from the stable. Furthermore this would require having the nightly compiler installed which may or may not be a problem. Finally there is the issue of feature flags as you mention, as well as --release vs debug mode etc. Perhaps it would require some type of pass-through of flags to the compiler so that the user can specify it.

jonas-schievink commented 5 years ago

Hmm, AFAIK enabling features can only add dependencies to the graph, so specifying --all-features might Just Work™.

--release vs. debug will result in different artifacts, that's true (they're even in different target subdirectories), but it should be relatively easy to just invoke Cargo twice and merge the resulting sets of paths (or require the user to run cargo-sweep twice, but I imagine most users would want to clean both at once).

Another issue that might be harder to solve is the use of --target for cross-compilation, which can also change the dependencies. Maybe it's best not to touch anything inside target if it's not located within target/debug or target/release, unless --target is passed, which you could then forward to Cargo and clean just that target's artifacts. (maybe it's also possible to automate even this by checking all directories inside target - they're named after the target you'd pass to --target after all)

The build plan being unstable is the biggest problem here. While it's possible to just invoke cargo +nightly build -Z..., using a different version of Cargo than the user uses for building the crate will result in collecting different artifacts - if I'm not mistaken different Cargo versions always use a different hash in the file names (which ironically is one of the biggest causes of target bloat - the target dir for my xmlrpc crate sits at a whopping 5.7 GiB and contains up to 15 duplicate artifacts for every dependency where only the hash differs).

holmgr commented 5 years ago

Ah ok. As long as build-plan is only useable for unstable I will be very unlikely to implement it as the primary/only solution although I agree it is probably the most reliable. As for now I think I will move on with the suggestion in #5

epage commented 5 years ago

Hmm, AFAIK enabling features can only add dependencies to the graph, so specifying --all-features might Just Work™.

Some features are mutually exclusive though. For example some crates let you choose the type of hashmap used.

epage commented 5 years ago

--release vs. debug will result in different artifacts,

Another issue that might be harder to solve is the use of --target for cross-compilation, which can also change the dependencies

If we make assumptions on the target layout (see https://github.com/holmgr/cargo-sweep/issues/6) then it could be reasonable to only clean up the current target which I think is what you are suggesting.

We could also accept target-dir like cargo-build and clean that up. This would be helpful in the CI case where the user consistently passes the same flags to all cargo commands.

The issue is if no target is specified, do we clean up all or only the default target? I'm assuming we'd want a flag to control this kind of behavior. One is better for CIs (precise cleanup), I imagine users just want it all gone.

As for now I think I will move on with the suggestion in #5

I hope we'll at least keep this around, tracking to the build plan becoming stable?

jonas-schievink commented 5 years ago

Some features are mutually exclusive though. For example some crates let you choose the type of hashmap used.

That's true, but AFAIK this is not done on a Cargo.toml level but rather by intentionally failing the build of your lib.rs, so as far as Cargo is concerned this is fine.

holmgr commented 5 years ago

Yes, I will keep this issue open until the build-plan is stabilized

Eh2406 commented 5 years ago

I have an idea, and really want to know if you think it will work. If every Cargo command added a filename.timestamp file containing the time when the command was invoked for each file menchen in the build-plan, would that be enough for cargo-sweep to work with? If so, I can look into making a PR to cargo.

jonas-schievink commented 5 years ago

Then you can just as well integrate the cargo sweep functionality into Cargo proper.

Eh2406 commented 5 years ago

Indeed, one day I would love to have "cargo clean --outdated" in cargo proper. I think creating an RFC to add it at this point will just lead to a large amount of bikeshedding about exactly which definition of outdated should be used.

I think, but can't guarantee, that the cargo team would be willing to add a simple file to the folders maintained by cargo. The contents of those folders are in implementation detail, and don't require an RFC to be changed.

holmgr commented 5 years ago

Adding a file.timestamp file would work very well, since currently sweep can only look at access times as an approximate measure. That said, I do not see any use for this timestamp files for any other projects than sweep really, and making such a change just to improve sweep seems like too little incentive for the cargo team (it has to be implemented and maintained). Then I think it is better just to integrate sweeps functionality into cargo as @jonas-schievink said.

If you want to make a PR/issue @Eh2406 to cargo then go ahead, I will back you up since I think it would be nice for my project. But if they do not find it to be useful for a wider context then atleast we can have a discussion about an --outdated flag or similar by possibly integrating parts of sweep into cargo directly.

Eh2406 commented 5 years ago

I just submitted a PR at cargo https://github.com/rust-lang/cargo/pull/6477. All feedback is welcome!

jyn514 commented 1 year ago

I have an idea, and really want to know if you think it will work. If every Cargo command added a filename.timestamp file containing the time when the command was invoked for each file menchen in the build-plan, would that be enough for cargo-sweep to work with? If so, I can look into making a PR to cargo.

I am confused how this is related to the original issue. I see that you've implemented that in cargo and the fingerprint directories have an invoked.timestamp file, but I'm not sure how that helps cargo-sweep? The question @holmgr wanted to answer was "will cargo build need to recreate this file" and I don't think we can determine that from the timestamp (imagine that someone runs cargo build && cargo +nightly build, so that artifacts from the non-default toolchain are newer).

jyn514 commented 1 year ago

--build-plan on the other hand I think should help, we can use the invocations[*].outputs key.

 ; cargo build --build-plan -Zunstable-options 2>&1 1>/dev/null | jq '.invocations[0].outputs'                   
[
  "/Users/jyn/src/cargo-sweep/target/debug/build/anyhow-1cf9b89f7ea6e648/build_script_build-1cf9b89f7ea6e648",
  "/Users/jyn/src/cargo-sweep/target/debug/build/anyhow-1cf9b89f7ea6e648/build_script_build-1cf9b89f7ea6e648.dSYM"
]