Open smyrman opened 6 years ago
Hey @smyrman — thanks for the feature request!
https://github.com/go-task/task is an awesome tool and one of the big design inspirations for tusk. I've considered implementing something similar to how they handle build sources, but I think tying it to the when clause makes a lot of sense and fits really well with tusk's design.
Some additional things I'll need to think through (I may edit this comment as I think of them):
- Cache location: $HOME/.tusk/cache/<hash_of_full_tusk_file_path>?
- Cache naming: cache_as: mytask-${targetOS}-${targetArch}?
- Is there a design that works for build targets, or does keeping track of task runs make more sense?
  - Caching test runs would probably require the latter
Another use case we have with go-task is building several Docker containers, where the application source is available but there is no well-defined build target in the form of files. Docker is relatively slow (at least compared to e.g. Go builds) at figuring out by itself that things are up to date.
Btw, I like the cache_as idea. I think it makes a lot of sense.
> It should probably be stored in the user's home directory to avoid having to gitignore anything. Maybe $HOME/.tusk/cache/?
I think it would be nice to follow OS recommendations. For Linux, the XDG Base Directory standard has finally seen some adoption in recent years. I certainly find it very practical when software follows XDG (which it often also does on Mac and Windows, btw.), as I can then more easily source-control my config files without also source-controlling cache or other unwanted content.
In this case the directory would be ${XDG_CACHE_HOME:-$HOME/.cache}/tusk/<hash_of_full_tusk_file_path> (according to the spec).
For Mac OS X and Windows there are other recommended cache paths, which could be handled by assuming different default paths while still relying on XDG, as done in this library (not tried it myself):
All great points. I haven't done a deep dive into XDG yet, but it looks like it has sane defaults for out-of-the-box behavior, so it seems like a good choice.
I'm leaning toward supporting both use cases (build target vs. named cache) with a syntax like this:
when:
  - building:
      target: output.txt
      from: input.txt
  - caching:
      as: name-${dependency}
      from: input.txt
For both target and as, tusk can check the timestamp of the source files and ensure that they are older than the target file or cache name.
Still undecided on the best way to handle cache invalidation. The big problem is that by design, tusk does not create top-level commands to avoid namespace collisions with user-defined tasks. So we might be stuck with something like:
# Commands to clear cache at each level
tusk --clear-global-cache
tusk --clear-project-cache
tusk --clear-task-cache <task>
# Alternatively, clear cache globally or ignore for a single run
tusk --clear-cache
tusk --ignore-cache task # Would the cache still be written to?
> For both target and as, tusk can check the timestamp of the source files and ensure that they are older than the target file or cache name.
Well, keep in mind that timestamps are imperfect and can give unwanted results both after system time adjustments and after changes made by source code management, such as checking out another branch in Git. Relying on a hash sum of all the source files would probably give a much better result, which I believe is also what go build/install does when it works out whether a package needs to be rebuilt. I suppose you could combine it with a check on whether or not a file exists:
when:
  - not_exists: output.txt
  - caching:
      as: name-${dependency}
      from: input.txt
Then there would be no need to check timestamps.
> tusk --ignore-cache task # Would the cache still be written to?
go-task has -f or --force, which basically tells it to ignore up-to-date checks for all tasks, but it would still generate a new hash when relevant.
> Relying on a hash sum of all the source files would probably leave a much better result
Going by the modified time is what make and go-task both do by default, and I think there are trade-offs both ways. One is that it works without having to maintain a local cache. It also works even if the target was generated without tusk or on a different machine, although in some situations this might not be the desired behavior.
I'll have to take a look at what go build is doing. I have looked a bit into how go-task handles timestamps and checksums, but it's something I want to spend more time researching overall to get it right.
> I suppose you could combine it with a check on whether or not a file exists
If tusk ends up with checksum-based caching for targets, then given how clauses are set up to be independent, I think a dedicated building syntax is the cleanest way to get the behavior one would expect. It could use the same clause name, although I'm not sure yet if that's better or worse:
caching:
  target: output.txt # Mutually exclusive with `as`
  from: input.txt
Since the goal is to avoid rebuilding target files from unchanged source files, the behavior in a checksum model should probably be to only build if the source files available would not generate the same target files that are present. Taking a hash of the source and one of the target means tusk could validate whether any work would be required. If the source has changed or the target does not reflect what the source generated in the last run, work is required.
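The decision rule in that last sentence can be sketched in Go as follows. The record type and the needsWork function are hypothetical names, and the hashes are placeholder strings standing in for real digests of the source and target files.

```go
package main

import "fmt"

// record is what a previous run would have stored for a task.
// This storage format is an assumption made for the example.
type record struct {
	SourceHash string
	TargetHash string
}

// needsWork reports whether the task must run: there is no previous
// record, the source changed, or the target no longer matches what the
// last run produced.
func needsWork(prev *record, sourceHash, targetHash string) bool {
	if prev == nil {
		return true
	}
	return prev.SourceHash != sourceHash || prev.TargetHash != targetHash
}

func main() {
	prev := &record{SourceHash: "s1", TargetHash: "t1"}
	fmt.Println(needsWork(prev, "s1", "t1")) // false: nothing changed
	fmt.Println(needsWork(prev, "s2", "t1")) // true: source changed
	fmt.Println(needsWork(prev, "s1", "t2")) // true: target was modified
	fmt.Println(needsWork(nil, "s1", "t1"))  // true: no previous run
}
```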
> go-task has -f or --force, which basically tells it to ignore up-to-date checks for all tasks, but it would still generate a new hash when relevant.
The term force becomes a little tricky when there are half a dozen kinds of conditional logic supported, but the behavior makes a lot of sense.
All good points, I'll let you take it from here :-)
@rliebz, if you haven't read it yet, "Old Build Story" and "Go Builds and the Isolation Rule" from rsc's last post in the vgo series seem like relevant input to this issue, as the post discusses the result caching issue in a very broad way, also talking about the invention of Make.
> Taking a hash of the source and one of the target means tusk could validate whether any work would be required. If the source has changed or the target does not reflect what the source generated in the last run, work is required.
The hash of the result could depend on the passed-in options as well. So I guess it doesn't make sense to have target mutually exclusive with as.
The syntax is not the most important part, but an alternative to as, btw, could be to instead add a flag to options that lets you state whether it affects the results of a task or not. E.g. something like:
options:
  indent:
    usage: how much to indent a JSON file
  verbose:
    usage: print more info
    excludeFromCache: true
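To make the idea concrete, here is a minimal Go sketch of a cache key built from the task name and the passed-in options in sorted order, skipping any option flagged as excluded. The option type, the cacheKey function, and the "format" task name are hypothetical, and excludeFromCache is only the flag proposed above, not an existing tusk setting.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// option is a passed-in option value. ExcludeFromCache mirrors the
// excludeFromCache flag proposed in the discussion.
type option struct {
	Value            string
	ExcludeFromCache bool
}

// cacheKey hashes the task name together with every option that
// affects the result, visiting option names in sorted order so the key
// is stable across runs.
func cacheKey(task string, opts map[string]option) string {
	names := make([]string, 0, len(opts))
	for name, o := range opts {
		if !o.ExcludeFromCache {
			names = append(names, name)
		}
	}
	sort.Strings(names)

	h := sha256.New()
	// %q quotes each field, so field boundaries stay unambiguous.
	fmt.Fprintf(h, "%q", task)
	for _, name := range names {
		fmt.Fprintf(h, "%q=%q", name, opts[name].Value)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	quiet := map[string]option{
		"indent":  {Value: "2"},
		"verbose": {Value: "false", ExcludeFromCache: true},
	}
	loud := map[string]option{
		"indent":  {Value: "2"},
		"verbose": {Value: "true", ExcludeFromCache: true},
	}
	// verbose is excluded, so both runs share a key.
	fmt.Println(cacheKey("format", quiet) == cacheKey("format", loud)) // true
}
```

A real key would also need to include the hash of the source files, so that a change in either the inputs or the relevant options invalidates the cached result.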
If one could easily determine if a task is up-to-date, tusk could potentially become more useful as a build tool, especially if such a feature is later paired with a dependency calculation system.
If one could hash these files and store the result using the task name and passed-in options (with a predictable sorting) as a key, that would be the most useful approach to calculate the difference, I think. Maybe something like this:
PS! Not using tusk atm., I use another tool https://github.com/go-task/task, so this is just a friendly suggestion. I like the design of tusk so far though:-)