rliebz / tusk

The modern task runner
https://rliebz.github.io/tusk/
MIT License

Add when (files) changed condition #20

Open smyrman opened 6 years ago

smyrman commented 6 years ago

If one could easily determine if a task is up-to-date, tusk could potentially become more useful as a build tool, especially if such a feature is later paired with a dependency calculation system.

tasks:
  build:
    options:
      targetOS:
      targetArch:
    when:
      changed:
        - file.txt
        - "**/in_root_folder_or_any_level_down.txt"
        - "*/one_folder_down.txt"

If one could hash these files and store the result using the task name and passed-in options (with a predictable sorting) as a key, that would be the most useful approach to calculate the difference, I think. Maybe something like this:

# .tusk/cache
build{targetArch:amd64,targetOS:linux}|<sha1>
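A minimal sketch of how such a cache line could be produced, assuming the globs have already been expanded to a list of paths. The function names `cache_key`, `files_digest`, and `cache_line` are hypothetical and not part of tusk:

```python
import hashlib
from pathlib import Path


def cache_key(task, options):
    """Build a key such as build{targetArch:amd64,targetOS:linux}."""
    # Sorting by option name keeps the key stable no matter in which
    # order the options were passed.
    pairs = ",".join(f"{name}:{options[name]}" for name in sorted(options))
    return f"{task}{{{pairs}}}"


def files_digest(paths):
    """Return one SHA-1 hex digest covering the path and content of each file."""
    digest = hashlib.sha1()
    for path in sorted(paths):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(Path(path).read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def cache_line(task, options, paths):
    """Format one line of the (hypothetical) cache file: <key>|<sha1>."""
    return f"{cache_key(task, options)}|{files_digest(paths)}"
```

A task would count as up-to-date when the line computed for the current files and options matches the stored one.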

PS: I'm not using tusk at the moment (I use another tool, https://github.com/go-task/task), so this is just a friendly suggestion. I like the design of tusk so far, though :-)

rliebz commented 6 years ago

Hey @smyrman — thanks for the feature request!

https://github.com/go-task/task is an awesome tool and one of the big design inspirations for tusk. I've considered implementing something similar to how they handle build sources, but I think tying it to the when clause makes a lot of sense and fits really well with tusk's design.

Some additional things I'll need to think through (I may edit this comment as I think of them):

smyrman commented 6 years ago

Is there a design that works for build targets, or does keeping track of task runs make more sense?

  • Caching test runs would probably require the latter

Another use case we have with go-task is building several Docker containers, where the application source is available but there is no well-defined build target in the form of files. Docker is relatively slow (at least compared to e.g. go builds) at figuring out by itself that things are up-to-date.

smyrman commented 6 years ago

Btw, I like the cache_as idea. I think it makes a lot of sense.

smyrman commented 6 years ago

It should probably be stored in the user's home directory to avoid having to gitignore anything. Maybe $HOME/.tusk/cache/?

I think it would be nice to follow OS recommendations. On Linux, the XDG Base Directory standard has finally seen some adoption in recent years. I certainly find it very practical when software does follow XDG (which it often also does on Mac and Windows, btw), as I can then more easily source-control my config files without also source-controlling cache or other unwanted content.

In this case the directory would be ${XDG_CACHE_HOME:-$HOME/.cache}/tusk/<hash_of_full_tusk_file_path> (according to the spec).
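As a rough sketch of that lookup: the helper name `tusk_cache_dir` and the choice of SHA-256 for hashing the tusk file path are assumptions made for illustration, since the thread does not settle on either.

```python
import hashlib
import os
from pathlib import Path


def tusk_cache_dir(tusk_file, env=None):
    """Return <cache base>/tusk/<hash of the full tusk file path>."""
    env = os.environ if env is None else env
    home = env.get("HOME") or str(Path.home())
    base = env.get("XDG_CACHE_HOME", "")
    # Like ${XDG_CACHE_HOME:-$HOME/.cache}: fall back when the variable is
    # unset or empty. The XDG spec also says relative paths are invalid and
    # should be ignored, so those fall back too.
    if not os.path.isabs(base):
        base = os.path.join(home, ".cache")
    full_path = os.path.abspath(tusk_file)
    # Assumption: SHA-256 of the absolute path; any stable hash would do.
    path_hash = hashlib.sha256(full_path.encode()).hexdigest()
    return Path(base) / "tusk" / path_hash
```

Hashing the full path keeps projects that share a file name such as tusk.yml from colliding in the same cache directory.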

For Mac OS X and Windows there are other recommended cache paths, which could be handled by assuming different default paths while still relying on XDG, as done in this library (I haven't tried it myself):

rliebz commented 6 years ago

All great points. I haven't done a deep dive into XDG yet, but it looks like it has sane defaults for out-of-the-box behavior, so it seems like a good choice.

I'm leaning toward supporting both use cases (build target vs. named cache) with a syntax like this:

when:
  - building:
      target: output.txt
      from: input.txt
  - caching:
      as: name-${dependency}
      from: input.txt

For both target and as, tusk can check the timestamp of the source files and ensure that they are older than the target file or cache name.
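For the target case, the timestamp comparison could look roughly like the make-style sketch below. The function name `needs_build` is hypothetical. The cache-name case would need some stand-in for the target's timestamp, such as a marker file, which is an assumption not spelled out in this thread.

```python
import os


def needs_build(target, sources):
    """Return True if the target is missing or any source is newer than it."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # The target is up-to-date only if no source was modified after it.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```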

Still undecided on the best way to handle cache invalidation. The big problem is that by design, tusk does not create top-level commands to avoid namespace collisions with user-defined tasks. So we might be stuck with something like:

# Commands to clear cache at each level
tusk --clear-global-cache
tusk --clear-project-cache
tusk --clear-task-cache <task>

# Alternatively, clear cache globally or ignore for a single run
tusk --clear-cache
tusk --ignore-cache task  # Would the cache still be written to?

smyrman commented 6 years ago

For both target and as, tusk can check the timestamp of the source files and ensure that they are older than the target file or cache name.

Well, keep in mind that timestamps are imperfect and can give unwanted results, both after system time adjustments and after file changes made by source code management, such as checking out another branch in Git. Relying on a hash sum of all the source files would probably leave a much better result, which I believe is also what go build/install does when it calculates whether a package needs to be rebuilt. I suppose you could combine it with a check on whether or not a file exists:

when:
  - not_exists: output.txt
  - caching:
      as: name-${dependency}
      from: input.txt

Then there would be no need to check timestamps.

tusk --ignore-cache task  # Would the cache still be written to?

go-task has -f or --force, which basically tells it to ignore up-to-date checks for all tasks, but it would still generate a new hash when relevant.

rliebz commented 6 years ago

Relying on a hash sum of all the source files would probably leave a much better result

Going by the modified time is what make and go-task both do by default, and I think there are trade-offs both ways. One advantage of timestamps is that they work without having to maintain a local cache. They also work even if the target was generated without tusk or on a different machine, although in some situations this might not be the desired behavior.

I'll have to take a look at what go build is doing—I have looked a bit into how go-task handles timestamps and checksums, but it's something I want to spend more time researching overall to get it right.

I suppose you could combine it with a check on whether or not a file exists

If tusk ends up with checksum-based caching for targets, then given that clauses are set up to be independent, I think a dedicated building syntax is the cleanest way to get the behavior one would expect. It could use the same clause name, although I'm not sure yet if that's better or worse:

caching:
  target: output.txt  # Mutually exclusive with `as`
  from: input.txt

Since the goal is to avoid rebuilding target files from unchanged source files, the behavior in a checksum model should probably be to only build if the source files available would not generate the same target files that are present. Taking a hash of the source and one of the target means tusk could validate whether any work would be required. If the source has changed or the target does not reflect what the source generated in the last run, work is required.
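One way this two-hash check might be sketched, assuming the result of the last run is kept in a small JSON record. The record format and the names `digest`, `work_required`, and `record_run` are hypothetical:

```python
import hashlib
import json
import os


def digest(paths):
    """SHA-256 over the path and content of each file; None if any is missing."""
    h = hashlib.sha256()
    for path in sorted(paths):
        if not os.path.isfile(path):
            return None
        with open(path, "rb") as f:
            h.update(path.encode() + b"\0" + f.read() + b"\0")
    return h.hexdigest()


def work_required(sources, targets, record_path):
    """True if the sources changed or the targets differ from the last run."""
    try:
        with open(record_path) as f:
            record = json.load(f)
    except (OSError, ValueError):
        return True  # no usable record of a previous run
    src, tgt = digest(sources), digest(targets)
    if src is None or tgt is None:
        return True  # a source or target file is missing
    return record.get("source") != src or record.get("target") != tgt


def record_run(sources, targets, record_path):
    """Store both hashes after the task has run successfully."""
    with open(record_path, "w") as f:
        json.dump({"source": digest(sources), "target": digest(targets)}, f)
```

A missing or hand-edited target then triggers a rebuild even when the sources are unchanged, which a source-only hash would not catch.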

go-task has -f or --force, which basically tells it to ignore up-to-date checks for all tasks, but it would still generate a new hash when relevant.

The term force becomes a little tricky when there are half a dozen kinds of conditional logic supported, but the behavior makes a lot of sense.

smyrman commented 6 years ago

All good points, I'll let you take it from here :-)

smyrman commented 6 years ago

@rliebz, if you haven't read them yet, the "Old Build Story" and "Go Builds and the Isolation Rule" parts of rsc's last post in the vgo series seem like relevant input to this issue, as they discuss the result caching problem in a very broad way, also covering the invention of Make.

smyrman commented 6 years ago

Taking a hash of the source and one of the target means tusk could validate whether any work would be required. If the source has changed or the target does not reflect what the source generated in the last run, work is required.

The hash of the result could depend on the passed-in options as well, so I guess it doesn't make sense to have target mutually exclusive with as.

The syntax is not the most important thing, but an alternative to as, btw, could be to add a flag to options that lets you state whether an option affects the results of a task or not. E.g. something like:

options:
  indent:
    usage: how much to indent a JSON file
  verbose:
    usage: print more info
    excludeFromCache: true
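To illustrate how such a flag could feed into the cache key, here is a small sketch. It assumes the option definitions have been parsed into a dict mirroring the YAML above; the task name `format` and both function names are made up for the example.

```python
def cache_relevant_options(definitions, passed):
    """Drop passed options whose definition sets excludeFromCache."""
    return {
        name: value
        for name, value in passed.items()
        # An option defined without a body parses to None, hence the `or {}`.
        if not (definitions.get(name) or {}).get("excludeFromCache", False)
    }


def cache_key(task, definitions, passed):
    """Build a key from the task name and the cache-relevant options only."""
    relevant = cache_relevant_options(definitions, passed)
    pairs = ",".join(f"{name}:{relevant[name]}" for name in sorted(relevant))
    return f"{task}{{{pairs}}}"


definitions = {
    "indent": {"usage": "how much to indent a JSON file"},
    "verbose": {"usage": "print more info", "excludeFromCache": True},
}
key = cache_key("format", definitions, {"indent": "2", "verbose": "true"})
```

With these definitions, toggling verbose leaves the key unchanged, while changing indent produces a new key and therefore a cache miss.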