ggreif opened this issue 9 years ago
Have you seen http://neilmitchell.blogspot.co.uk/2015/04/cleaning-stale-files-with-shake.html - is that roughly what you were thinking of? What else were you looking for?
Tup does have such a feature, but I consider it somewhat of a misfeature by default - often outputs that are no longer strictly needed are still useful to keep around. Examples include files the build system used to generate, or files it only generates when certain options are passed - e.g. you always build x86 and you sometimes build 64-bit versions; when you don't build the 64-bit versions you don't want them deleted just because they are stale. That said, it's certainly useful to have, if carefully controlled.
@ndmitchell Awesome, this is almost what I need. I had completely forgotten about that article! (Btw. you refer to a `shakePrune` function, did you mean `shakeArgsPrune`?)
So here is what I am after: a pruning function of type `pruneBeforeAfter :: Maybe [FilePath] -> [FilePath] -> IO ()`. The first argument would be non-`Nothing` when `shake` encounters a loadable old build-state database; `Just liveBefore` would represent the union of build products leading to all of the previous run's `want` targets. The second argument's file list would indicate the same for the current run.
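A minimal sketch of what such a function could do (delete the set difference of the two live lists; how those lists are obtained is left open):

```haskell
import Control.Monad (forM_, when)
import Data.List ((\\))
import System.Directory (doesFileExist, removeFile)

-- Sketch only: delete everything that was live in the previous run
-- but is no longer live in the current one.
pruneBeforeAfter :: Maybe [FilePath] -> [FilePath] -> IO ()
pruneBeforeAfter Nothing _ = return ()          -- no old database: nothing to prune
pruneBeforeAfter (Just liveBefore) liveNow =
    forM_ (liveBefore \\ liveNow) $ \file -> do
        exists <- doesFileExist file
        when exists $ removeFile file
```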
The docs for Shake need reorganising, and fundamental articles like that promoting to the website...
I did indeed mean `shakeArgsPrune` - I've updated the page.

So currently `--live` gives you all live entries. Maybe what you want is `--complete`, which would give you all entries including those that weren't built this time, and after building the first is `--complete` and the second is `--live`?
Yeah, I think that is pretty much what I want. If `--complete` gives me all build products (ever, even from former runs) produced by `shake`, then the difference `complete \\ live` is the set of files that can be routinely cleaned up, without going on a hunt through all the folders where stale build products might be lurking. (In our case there might be thousands of build dirs scattered around.)

Of course, if `--live` only gives me the files which are needed for building the current target (in contrast to all potentially valid build targets), then the above logic would be flawed. Time to play around with `--live`, I guess...
`--live` exists. `--complete` is easy to add. However, `--live` only gives you things needed for building the current target, since if you were to enter any of the old targets, they may well still build. Typically in most Shake systems a build with no arguments will do all the right `want`s so that everything rebuilds, and then you can treat `--live` as all possible live targets.
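For reference, a minimal sketch of turning the live output on from a build script rather than the command line (the file names are made up; `shakeLiveFiles` is the option that `--live=FILE` sets):

```haskell
import Development.Shake

-- Write the list of live files to live.txt after every run,
-- equivalent to passing --live=live.txt on the command line.
main :: IO ()
main = shakeArgs shakeOptions{shakeLiveFiles = ["live.txt"]} $ do
    want ["out/result.txt"]
    "out/result.txt" %> \out ->
        copyFile' "src/input.txt" out
```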
As @ggreif mentioned `tup` here, I'll add my 2 cents:

Viewing tup's tracking system as a misfeature most likely results from the expectation that you can do different builds with the same build graph (DAG) by exchanging external variables (CC). I would assume the better approach to such a feature would be to reflect all possible build results within the DAG.

To keep up with the 32/64-bit example: that would require providing different intermediate and final dependency files per compiler version. The easiest solution would be to place the build products into compiler-related subdirectories. If one wants to build only the 32-bit version, a phony target can be used to collect only the wanted executables of the 32-bit build. Without the phony target, both versions will be built. With this setup, there is no need to reconfigure the DAG if I want to switch between build targets.
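A minimal sketch of that layout in Shake terms (the directory names, phony targets and compiler flags are made up for illustration):

```haskell
import Development.Shake
import Development.Shake.FilePath

main :: IO ()
main = shakeArgs shakeOptions $ do
    -- Both architectures are always part of the DAG; a phony target
    -- selects just one of them, otherwise both are built.
    phony "build32" $ need ["out/x86/main.exe"]
    phony "build64" $ need ["out/x64/main.exe"]
    want ["out/x86/main.exe", "out/x64/main.exe"]

    -- One pattern per architecture, each writing into its own subdirectory.
    ["out/x86/main.exe", "out/x64/main.exe"] |%> \out -> do
        let arch = takeFileName (takeDirectory out)   -- "x86" or "x64"
        srcs <- getDirectoryFiles "" ["src//*.c"]
        need srcs
        cmd_ "gcc" [if arch == "x64" then "-m64" else "-m32"] "-o" out srcs
```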
That said, it's definitely hard to get every possibly relevant parameter reflected within the DAG, and that's the real problem with `tup`.
However, if you require a feature similar to `tup`, I don't think `complete \\ live` will do. What you would need instead is a `--buildable` parameter which provides a list of all files that can still be built with the current DAG (regardless of whether they have been built in a previous run). Once you have such a feature, you can use `complete \\ buildable` to clean up everything that can't be built any longer.
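The cleanup step itself would be the same set-difference idea as above (a sketch; the lists that `--complete` and the hypothetical `--buildable` would produce are assumed here to be plain text files, one path per line):

```haskell
import Control.Monad (forM_, when)
import Data.List ((\\))
import System.Directory (doesFileExist, removeFile)

-- Sketch: complete.txt lists everything Shake ever built,
-- buildable.txt lists everything the current DAG can still build.
cleanupUnbuildable :: IO ()
cleanupUnbuildable = do
    complete  <- lines <$> readFile "complete.txt"
    buildable <- lines <$> readFile "buildable.txt"
    forM_ (complete \\ buildable) $ \file -> do
        exists <- doesFileExist file
        when exists $ removeFile file
```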
So, bringing this up again: The feature of tup is that, if (for example) a C file is renamed, the old .o file will be deleted.
In Shake, we might start with the following dependency graph:
    buildAndLink -> [GetDirFiles "src//*.c",[Build foo, Build bar],Link main.exe]
    Build foo -> [src/foo.c, lots of headers, out/foo.o]
    Build bar -> [src/bar.c, lots of headers, out/bar.o]
    Link main.exe -> [GetDirFiles "out//*.o",[out/foo.o,out/bar.o]]
If we renamed bar to baz, we would want to get:
    buildAndLink -> [GetDirFiles "src//*.c",[Build foo, Build baz],Link main] -- changed!
    Build foo -> [src/foo.c, lots of headers, out/foo.o]
    Build bar -> [src/bar.c, lots of headers, out/bar.o] -- stale!
    Build baz -> [src/baz.c, lots of headers, out/baz.o]
    Link main -> [GetDirFiles "out//*.o",[out/foo.o,out/baz.o],main.exe] -- tricky!
The tricky part is that Shake needs to delete bar.o before linking, so that the `GetDirFiles` call in `Link` doesn't add a stale object file. How does it know to do this? Because `bar.o` was generated by `Build bar`, and `buildAndLink` lost its dependency on `Build bar`, and that was the only reference to `Build bar`.

When do we know this? Well, if we are careful in matching the stored result to the current execution, we will know as soon as `buildAndLink` calls `need [Build foo, Build baz]` instead of `need [Build foo, Build bar]`. So, we can't prune before building, as in #432, but we can prune during building, whenever dependencies are called; and at least for this example it is just early enough that we do not have to worry about botched linking.
It's true that, as @ndmitchell said, we might not want this pruning; e.g. if we read a config file and from that determine whether to call `BuildAndLink X86` or `BuildAndLink X64`, then modifying our config file will prune one of those. But a new `needNoPrune` method would suffice. Shake's existing liveness / pruning feature can coexist with this (it will be useful to delete unused config variants).
The main differences in Shake's internals are that generated files would have to be tracked better (to determine that `Build bar` generated out/bar.o, without rerunning it), and also that Shake would have to maintain a back-reference mapping (to determine that nothing besides `buildAndLink` referenced `Build bar`). And it would also be good to add a `compact` command that renumbered the Ids so they were sequential, now that Shake can remove keys completely.
I think the `compact` command is a good idea anyway - I was intending to call it something like `gc`.
One note with the example - you have:
    Link main.exe -> [GetDirFiles "out//*.o",[out/foo.o,out/bar.o]]
However, calling `GetDirFiles "out//*.o"` is a lint violation, and `getDirectoryFiles` explicitly says:

> As a consequence of being tracked, if the contents change during the build (e.g. you are generating .c files in this directory) then the build will not reach a stable point, which is an error - detected by running with --lint. You should only call this function returning source files.
The recommendation above would be for the linking to list all .c files, not all .o files.
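To make the recommendation concrete, here's a minimal sketch of the suggested shape (the file layout and compiler invocation are made up, not taken from the example above):

```haskell
import Development.Shake
import Development.Shake.FilePath

linkRules :: Rules ()
linkRules = do
    -- Link by enumerating the *source* files and mapping them to objects,
    -- rather than calling getDirectoryFiles on the generated out// directory.
    "main.exe" %> \out -> do
        srcs <- getDirectoryFiles "" ["src//*.c"]
        let objs = ["out" </> replaceExtension (dropDirectory1 c) "o" | c <- srcs]
        need objs
        cmd_ "gcc" "-o" out objs

    "out//*.o" %> \out -> do
        let src = "src" </> replaceExtension (dropDirectory1 out) "c"
        need [src]
        cmd_ "gcc" "-c" src "-o" out
```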
Given this particular pattern is a violation, I wonder if some combination of build, lint, delete dead files, lint would also pick up the violations?
In my branch I changed that piece of documentation to:
> As a consequence of being tracked, it is an error (detected by running with @--lint@) if the contents change during the run after this function is called (e.g. you are generating @.c@ files in this directory), since the build does not reach a stable point. You should only call this function after generating all relevant files.
I think my version is more correct, since lint will indeed not give an error if a file is added to the directory before `getDirFiles` is called.
From the paper, the three requirements of Shake rules are:
The action / state / rule distinction is a little unclear in this example, but it seems the Link rule is valid.
> I wonder if some combination of build, lint, delete dead files, lint would also pick up the violations?
Deleting files after the build without an intervening rebuild would indeed make lint fail, but I am not sure why you would expect lint to succeed after such a deletion. Lint will error if any tracked file is deleted...
Your version is correct - but only if people have meaningful control over when a rule is started, and thus when `getDirFiles` is called. I'm not sure people do have that level of control, since when checking if rebuilds are necessary we are essentially "speculatively" running rules that haven't yet been required, but that we think might be required.
> when checking if rebuilds are necessary we are essentially "speculatively" running rules that haven't yet been required, but that we think might be required.
True. But this is why `Depends` has two layers of list; it is only partially speculative. If we assume that building the rule will use the same dependencies in the same order for the same inputs, then the speculation is justified, as each dependency will be called regardless, up to the first changed dependency.
> Do people have meaningful control over when a rule is started, and thus when getDirFiles is called?
I think they do, when they can control what rules call said rule. In particular, a pattern of the form `need [a] >> need [b]` should ensure that `b` is only called after `a` has been run, if no other rule calls `b`.
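A minimal sketch of that ordering (the generator command and stamp-file names are hypothetical):

```haskell
import Development.Shake

orderingRules :: Rules ()
orderingRules = do
    -- "link.stamp" scans a directory that "codegen.stamp" populates,
    -- so the caller forces the ordering: need [a] >> need [b].
    phony "all" $ need ["codegen.stamp"] >> need ["link.stamp"]

    "codegen.stamp" %> \out -> do
        cmd_ "./generate-sources"        -- hypothetical generator writing gen//*.c
        writeFile' out ""

    "link.stamp" %> \out -> do
        -- By the time this rule runs, codegen.stamp is up to date, as long as
        -- no other rule needs link.stamp before codegen.stamp has completed.
        cs <- getDirectoryFiles "" ["gen//*.c"]
        need cs
        writeFile' out (unlines cs)
```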
When `shake` (as of v0.15) executes a recipe it places the wanted output in some location. Sometimes source files move, though, so certain build products can become obsolete over time. It would be nice to have an automatic way of cleaning up known obsoletes (i.e. the difference between the old DB's wanteds and the current run's DB's wanteds). The cleanup could be done on another thread. This would be a real win for developers of huge build trees (with hours of CPU-time worth of build products), who would no longer have to radically clean out the build products just to get rid of the obsoletes. `Tup` (anecdotally) seems to have such a feature.