Open snowleopard opened 5 years ago
A note about so-called forward-defined build systems, like Fabricate:
If you have a forward-defined build system, it means you have a total order on build tasks, which automatically prevents cyclic dependencies. Furthermore, it means that you never need to block tasks, because by the time you reach them in the working queue, all their dependencies must have been either skipped or rebuilt, so you can decide on the spot whether to rebuild it or not. In this way, build systems like Fabricate can be thought of as a special (trivial, really) case when the scheduler is just const [1..], i.e. it just runs tasks in the specified order, and we only need a rebuilder!
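To illustrate the "only a rebuilder" point, here is a minimal sketch of a forward build loop. It is not how Fabricate works internally: the `Task` record, the up-front `inputs` list, and the `forwardBuild` name are all assumptions made to keep the example small (Fabricate discovers dependencies by tracing). The only thing the sketch is meant to show is that, with tasks in a fixed order, each rebuild decision can be made on the spot.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Key = String
type Store = Map.Map Key String

-- Hypothetical task shape: inputs are declared up front purely to keep the sketch small.
data Task = Task
  { output :: Key
  , inputs :: [Key]
  , action :: [String] -> String
  }

-- Walk the tasks in script order. There is no scheduling decision to make:
-- when we reach a task, everything before it has been skipped or rebuilt.
-- Returns the updated store and the outputs that were rebuilt, in order.
forwardBuild :: [Task] -> Set.Set Key -> Store -> (Store, [Key])
forwardBuild tasks changed store0 = go tasks changed store0 []
  where
    go [] _ store rebuilt = (store, reverse rebuilt)
    go (t : ts) dirty store rebuilt
      | needsRebuild = go ts (Set.insert (output t) dirty) store' (output t : rebuilt)
      | otherwise    = go ts dirty store rebuilt
      where
        -- Rebuild if an input is dirty or the output has never been produced.
        needsRebuild = any (`Set.member` dirty) (inputs t)
                    || output t `Map.notMember` store
        values = [Map.findWithDefault "" k store | k <- inputs t]
        store' = Map.insert (output t) (action t values) store
```

Calling `forwardBuild` with the set of keys changed since the last run rebuilds only the tasks downstream of those keys, without ever blocking or reordering anything.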
Interesting, my simpler and less formal way of saying it is:
The interesting thing here is that it assumes a finite set of rules. Note that Shake doesn't have a finite set of rules, as something like a rule producing *.o is really an infinite set of rules. However, if you see *.o from *.c as a rule, and can somehow "figure out" the universe of possible rules that can be supported from a set of produced values, perhaps it becomes feasible in a more practical setting.
Agreed on the Fabricate remark, although it feels like the degenerate case of this example, but it does capture the essence to some degree, which all our previous models failed to do. Maybe the powerful thing about Fabricate is actually that the rules are "finite", or more precisely calculated from the set of produced files?
> The interesting thing here is that it assumes a finite set of rules. Note that Shake doesn't have a finite set of rules as something like a rule producing *.o is really an infinite set of rules.
@ndmitchell Agreed, this is an interesting aspect that I'm not sure how to deal with yet. In our current model, the map k -> Maybe (Task c k v) can be used to represent an infinite set of rules, like in Shake, but if we go towards a list of tasks whose outputs are not known statically, it becomes unclear how we can express a "template rule" for compiling any *.c file into *.o. Perhaps we could limit such tasks to being Applicative-only with respect to writes? It seems that combining template rules and dynamic outputs is not going to work.
> Maybe the powerful thing about Fabricate is actually that the rules are "finite", or more precisely calculated from the set of produced files?
Indeed, Fabricate seems to be different from other build systems in that all build rules are "singular" (i.e. not "template") and are given exactly as a finite list (if we assume one can't write an infinite Fabricate script with some kind of recursion).
Even if they were applicative, the fact you have an infinite number of them still seems problematic. And if they are finite, you don't need the applicative.
You could write an infinite fabricate script, but assuming it terminates, it will only be able to go down one path. It's a weird kind of finite, but definitely related.
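To make the "infinite set of rules" point concrete, here is a small sketch of a template rule expressed as a function k -> Maybe (Task c k v). The `Task` newtype follows the shape used in the Build Systems à la Carte paper; `objectRule`, `buildWith`, and the placeholder `compile` function are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}
import Data.Functor.Identity (Identity (..))
import Data.List (isSuffixOf)

newtype Task c k v = Task { run :: forall f. c f => (k -> f v) -> f v }

type Tasks c k v = k -> Maybe (Task c k v)

-- One "template rule": it matches every key ending in ".o", so it stands
-- for an unbounded family of concrete rules (foo.o, bar.o, ...).
objectRule :: Tasks Applicative FilePath String
objectRule key
  | ".o" `isSuffixOf` key =
      Just (Task (\fetch -> compile <$> fetch (take (length key - 2) key ++ ".c")))
  | otherwise = Nothing
  where
    compile src = "compiled:" ++ src -- placeholder for a real C compiler

-- Run the rule for one key against a pure function that supplies sources.
buildWith :: (FilePath -> String) -> FilePath -> Maybe String
buildWith readSource key = do
  task <- objectRule key
  pure (runIdentity (run task (Identity . readSource)))
```

The rule set here is a function rather than a list, which is exactly what makes it hard to enumerate "all tasks" up front, as a list-based scheduler would require.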
Here is an example of how one could go about compiling a collection of files with read/write tasks:
    {-# LANGUAGE ConstraintKinds, GADTs, RankNTypes #-}
    import Data.Functor (void)

    type Get k f = forall a. k a -> f a
    type Put k f = forall a. k a -> f a -> f a
    type Task c k a = forall f. c f => Get k f -> Put k f -> f a

    data Key a where
        Dir  :: FilePath -> Key [FilePath]
        File :: FilePath -> Key String

    compileAllCFiles :: Task Monad Key ()
    compileAllCFiles get put = do
        files <- get (Dir "src/c/")
        srcs  <- traverse (get . File) files
        let objs = [ (File (f ++ ".o"), compileC o) | (f, o) <- zip files srcs ]
        void $ traverse (uncurry put) objs
      where
        compileC = pure -- insert a C compiler here
An important aspect here is that traverse requires Applicative f, which means that if we use a Haxl-like approach to inspecting computation trees, we get independent dependency tracking for each source/object pair, i.e. if one changes foo.c, only the file foo.o will be rebuilt.
Note also that compileC has type Monad f => FilePath -> f String, i.e. it is free to introduce intermediate dependencies on its own (for example, on *.d files with dynamic #include dependencies). This looks nicely compositional.
We could put this compileAllCFiles task into the list of all tasks and keep trying to run it. As soon as the source files are available (some of them may be generated), it will succeed.
To elaborate the above example a bit more:
    {-# LANGUAGE ConstraintKinds, GADTs, RankNTypes #-}
    import Data.Functor (void)

    type Get k f = forall a. k a -> f a
    type Put k f = forall a. k a -> f a -> f a
    type Task c k a = forall f. c f => Get k f -> Put k f -> f a

    data Key a where
        Dir  :: FilePath -> Key [FilePath]
        File :: FilePath -> Key String

    compileAllCFiles :: Task Monad Key ()
    compileAllCFiles get put = do
        srcs <- get (Dir "src/c/")
        void $ traverse (\src -> compileC src get put) srcs -- independent/parallel

    compileC :: FilePath -> Task Monad Key ()
    compileC cFile get put = do
        let objFile = cFile ++ ".o"
        src  <- get (File cFile)
        deps <- traverse (get . File) (cDependencies src)
        void $ put (File objFile) (pure $ compile src deps)
      where
        cDependencies _src = [] -- insert dependency analysis here
        compile src _deps  = src -- insert a C compiler here
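To check that a read/write task of this shape can actually be executed, here is a rough sketch of one possible interpreter that runs it in IO against an in-memory store. Everything beyond the `Get`/`Put`/`Task`/`Key` definitions is an assumption made for the example: the `Store` record, `lookupKey`, `insertKey`, `getIO`, `putIO`, and `runTask` are hypothetical names, the "compiler" just prefixes the source with "obj:", and missing keys silently default to empty values instead of blocking.

```haskell
{-# LANGUAGE ConstraintKinds, GADTs, RankNTypes #-}
import Data.Functor (void)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import qualified Data.Map as Map

type Get k f = forall a. k a -> f a
type Put k f = forall a. k a -> f a -> f a
type Task c k a = forall f. c f => Get k f -> Put k f -> f a

data Key a where
  Dir  :: FilePath -> Key [FilePath]
  File :: FilePath -> Key String

-- Hypothetical in-memory stand-in for the file system.
data Store = Store
  { dirs  :: Map.Map FilePath [FilePath]
  , files :: Map.Map FilePath String
  }

compileAllCFiles :: Task Monad Key ()
compileAllCFiles get put = do
  srcs <- get (Dir "src/c/")
  void $ traverse (\src -> compileC src get put) srcs

compileC :: FilePath -> Task Monad Key ()
compileC cFile get put = do
  src <- get (File cFile)
  void $ put (File (cFile ++ ".o")) (pure ("obj:" ++ src)) -- placeholder compiler

-- Typed lookup and update; matching on the GADT fixes the value type.
lookupKey :: Key a -> Store -> a
lookupKey (Dir d)  = Map.findWithDefault [] d . dirs
lookupKey (File f) = Map.findWithDefault "" f . files

insertKey :: Key a -> a -> Store -> Store
insertKey (Dir d)  v s = s { dirs  = Map.insert d v (dirs s) }
insertKey (File f) v s = s { files = Map.insert f v (files s) }

getIO :: IORef Store -> Key a -> IO a
getIO ref key = lookupKey key <$> readIORef ref

-- Compute the value, record it under the key, and return it.
putIO :: IORef Store -> Key a -> IO a -> IO a
putIO ref key compute = do
  value <- compute
  modifyIORef ref (insertKey key value)
  pure value

runTask :: Task Monad Key a -> Store -> IO (a, Store)
runTask task store = do
  ref    <- newIORef store
  result <- task (getIO ref) (putIO ref)
  final  <- readIORef ref
  pure (result, final)
```

Running `runTask compileAllCFiles` on a store whose "src/c/" directory lists two sources leaves two extra ".o" entries in `files`. A real scheduler would replace the defaulting in `lookupKey` with some way of signalling that a key is not ready yet.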
So the claim is that if the final step in a monadic dependency chain is an Applicative we can separate it and do partial recomputation? I'm not convinced that's true. Imagine we did a traverse with an index, so compiled files could see if they were the first/last file in the directory. Now you have dependencies that aren't fine grained. There is some level of isolation, but it's a lot more subtle.
What if you keep running the compileAllCFiles and it keeps doing different things? e.g. adding a single .c file makes all outputs change? Where are we going to find a fixed point?
> So the claim is that if the final step in a monadic dependency chain is an Applicative we can separate it and do partial recomputation?
Yes!
> I'm not convinced that's true. Imagine we did a traverse with an index, so compiled files could see if they were the first/last file in the directory.
Not sure what exactly you mean. Something like this?
    data Key a where
        Dir  :: FilePath -> Key [(FilePath, Int)] -- We need to depend on index
        File :: FilePath -> Key String

    compileAllCFiles :: Task Monad Key ()
    compileAllCFiles get put = do
        srcs <- get (Dir "src/c/")
        void $ traverse (\src -> compileC src get put) srcs -- independent/parallel

    compileC :: (FilePath, Int) -> Task Monad Key ()
    compileC (cFile, index) get put = do
        let objFile = cFile ++ ".o"
        src  <- get (File cFile)
        deps <- traverse (get . File) (cDependencies src)
        void $ put (File objFile) (pure $ compile src index deps)
      where
        cDependencies _src = [] -- insert dependency analysis here
        compile src _index _deps = src -- insert a C compiler here
This doesn't seem to change anything. If this is not what you meant, could you give an example?
> What if you keep running the compileAllCFiles and it keeps doing different things? e.g. adding a single .c file makes all outputs change?
I think in this case the corresponding Dir key is not an input anymore, so the compileAllCFiles task will be aborted because one of its dependencies is not yet ready. There will be some task that actually writes this Key (when all .c files are finally in place), which will let compileAllCFiles finally succeed.
This is an issue to discuss how the restarting scheduler can be used to build tasks with dynamic inputs and outputs, as opposed to tasks with dynamic inputs that are covered by the Build Systems à la Carte paper.
I'll start by sketching a proof that the restarting scheduler works for tasks with dynamic outputs. First of all, we need to assume that all build tasks are finite, i.e. that they terminate and have a finite number of input dependencies, which in turn guarantees that the restarting scheduler terminates. Why? Because it always makes some progress by either removing a task from the working queue, or unblocking one of the blocked tasks, in which case the latter is one step closer to completion.
Let's run the restarting algorithm with a working queue containing all build tasks.
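When it terminates, we have one of two cases: either the target key has been built, or it has not been built and the working queue still contains a set of blocked tasks T.
Now we can argue that in the second case the target key cannot be built due to one of two reasons: (1) there is a cyclic dependency between keys, or (2) some required key cannot be produced by any task.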
Let k denote the target key. All tasks that are not in T have completed and did not produce k (the build failed), hence we know that all tasks that could possibly build k must be in T. Let t denote one such task (if there is no such t, then this is the case (2) above). Since t ∈ T, it is blocked by some key b, and we can repeat our argument by taking b as the target key: by doing this we will eventually either hit the case (2), or will circle back to a key we already examined (since T is finite), which would indicate the case (1).
The proof is non-constructive in the sense that we don't know which t could actually produce k and hence lead to a cycle. All we can say is that either such t does not exist (2), or it does exist but it will inevitably either lead to a cycle (1) or hit a dead end (2).
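The restarting loop that the argument above relies on can be sketched roughly as follows. This is a simplified batch model rather than the scheduler from the paper: `Outcome`, `need`, and `restart` are hypothetical names, a task is modelled as a pure function of the current store, and each sweep retries every remaining task from scratch. The loop stops when a full sweep completes no task, and whatever is left corresponds to the blocked set T.

```haskell
import qualified Data.Map as Map

type Key = String
type Store = Map.Map Key String

-- Outcome of attempting a task against the current store (hypothetical model).
data Outcome
  = Blocked Key          -- needs a key that is not available yet
  | Done [(Key, String)] -- finished; outputs are chosen dynamically

type Task = Store -> Outcome

-- Helper for writing tasks: read a key or block on it.
need :: Key -> Store -> (String -> Outcome) -> Outcome
need k store continue = maybe (Blocked k) continue (Map.lookup k store)

-- Sweep the queue, restarting blocked tasks, until a sweep makes no progress.
-- Returns the final store and the blocked tasks with the key each waits on.
restart :: [(String, Task)] -> Store -> (Store, [(String, Key)])
restart queue store
  | null finished = (store, [ (name, k) | (name, Blocked k) <- attempts ])
  | otherwise     = restart remaining store'
  where
    attempts  = [ (name, task store) | (name, task) <- queue ]
    finished  = [ outs | (_, Done outs) <- attempts ]
    store'    = foldr (\(k, v) -> Map.insert k v) store (concat finished)
    remaining = [ entry | (entry, (_, Blocked _)) <- zip queue attempts ]
```

Each recursive call removes at least one task from the queue, which mirrors the progress argument: with finitely many terminating tasks, the loop must stop. The returned list only says which key each leftover task is blocked on; deciding between a cycle (1) and a dead end (2) still requires knowing which tasks could have produced those keys, which this model does not record.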