Review build system model

dbuenzli commented 9 years ago

The model is too weak, we should be able to take into account ocamldep, gcc -MD -MP in the model itself.

c-cube commented 9 years ago

Hi, I have a (possibly stupid) idea regarding dynamic dependencies. It's a pain to specify all inter-module dependencies by hand, even more so with generated files (cppo, menhir, literate programming, ctypes stubs, etc.), so I think this issue is crucial.

On the other hand, if I understood correctly, Assemblage provides an applicative API (because then the structure of the project can be manipulated without actually building/generating files). Dynamic dependencies require a monad, and a fixpoint (because generating files might bring new dependencies, and so on).

The idea boils down to this: don't give a full monadic API to the user, but wrap the whole project into a loop (dependencies -> assemblage_project) -> assemblage_project. To discover (an approximation of) the project structure, you can lie and not compute the fixpoint, but only the first iteration.
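A minimal sketch of that loop, to make the shape concrete. Every name here (deps, project, fix, approximate, the toy world function) is hypothetical and not part of Assemblage; discover stands in for running something like ocamldep, and termination assumes that discovery eventually stops finding new files.

```ocaml
(* Hypothetical types: none of these names come from Assemblage. *)
module SMap = Map.Make (String)
module SSet = Set.Make (String)

(* Dependencies discovered so far: file -> files it depends on. *)
type deps = SSet.t SMap.t

(* A "project" reduced to the set of files it knows about. *)
type project = { files : SSet.t }

(* Iterate [describe] until the discovered dependencies stop changing.
   [discover] stands in for running ocamldep on the project's files. *)
let rec fix ~(discover : project -> deps) (describe : deps -> project) deps =
  let project = describe deps in
  let deps' = discover project in
  if SMap.equal SSet.equal deps deps' then project
  else fix ~discover describe deps'

(* Only the first iteration: an approximation of the project structure. *)
let approximate describe = describe SMap.empty

(* Toy world: a.ml needs b.ml, which needs c.ml. *)
let world = function "a.ml" -> [ "b.ml" ] | "b.ml" -> [ "c.ml" ] | _ -> []

(* The project is the root file plus everything discovered so far. *)
let describe deps =
  { files =
      SMap.fold (fun _ s acc -> SSet.union s acc) deps (SSet.singleton "a.ml") }

(* Record the dependencies of every file currently in the project. *)
let discover p =
  SSet.fold (fun f acc -> SMap.add f (SSet.of_list (world f)) acc)
    p.files SMap.empty

let full = fix ~discover describe SMap.empty
let first = approximate describe
```

In this toy setup the full fixpoint sees all three files, while the first iteration only sees the root a.ml, which is the "lie" mentioned above.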

Relevant: http://gallium.inria.fr/~fpottier/publis/fpottier-fix.pdf for lazy fixpoints.

Hope it's not totally stupid.

dbuenzli commented 9 years ago

> Hope it's not totally stupid.

Not a bad idea. Somehow it resembles the way I solved the problem of recursive defs in React. However I suspect we can do without that.

Also we must be clear about what we are speaking of; there are two things:

1. Being able to automatically find out dependencies between statically (and manually) known build products.
2. Being able to automatically find out build products (example: the output of a module-relational mapper).

For me a first goal is 1. I suspect that 2. is actually solvable by using custom parts and by exposing standard build actions encapsulated by the standard parts (e.g. the ocaml actions). Since parts in the end are only a set of build actions along with metadata, underneath it's all build product manipulation and you can make a part look like another as long as you match the expectations others have of it.

However in both 1. and 2. one problem is that what may be determined at configuration time (assemblage setup) should be watched for changes. In fact I argue that the distinction between setup and build should not exist: setting a configuration key is like building a configuration key, see discussion here. So what I suspect is that if we can define (possibly private) configuration keys as being dependent on a file, and the system resetups (i.e. regenerates a build system) whenever that file changes, maybe that should be enough to close the loop. (Of course doing that efficiently will mean moving away from generating Makefiles, a retarded system anyways.)

c-cube commented 9 years ago

I too think that Makefile isn't that good for OCaml, and separating setup/build won't work if you have to re-discover dependencies dynamically. If you want an efficient fixpoint, and to discover which parts of the project need to be recompiled, you must do it within assemblage itself. I don't know how complicated it is to implement.

Side note: I think some ideas from Jenga might be interesting but I don't know the details well enough (and don't want my build system to depend on Core anyway).

c-cube commented 9 years ago

Also relevant: http://neilmitchell.blogspot.fr/2014/07/applicative-vs-monadic-build-systems.html from the author of Shake (in Haskell). It's clear that applicative is not powerful enough, but the question of how to provide additional expressiveness is still open.

c-cube commented 9 years ago

I have refined the idea a bit, talking to @aspiwack. The fixpoint idea stems from the fact that when a dynamic dependency is computed, it opens a sub-part of the dependency DAG, which must in turn be explored for dynamic dependencies, etc.

My understanding on why Applicative is good, is that it allows the structure of the project to be manipulated as data (whereas Monad, with >>=, introduces arbitrary functions in the middle of the description. The structure hidden under >>= cannot be unfolded before runtime, i.e. build, so it cannot be compared or cached). Providing the full power of a monad makes it harder to do the minimal amount of work, to parallelize, etc.
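To illustrate the difference, here is a small sketch with a hypothetical build-description type (not Assemblage's API). The applicative constructor App keeps both sides visible as data, so the inputs can be listed without running anything; the monadic constructor Bind hides whatever its continuation would depend on.

```ocaml
(* Hypothetical build-description type, not Assemblage's API. *)
type _ build =
  | Pure : 'a -> 'a build
  | File : string -> string build (* depend on a file; yields its path *)
  | App : ('a -> 'b) build * 'a build -> 'b build
  | Bind : 'a build * ('a -> 'b build) -> 'b build

(* Static inputs: everything visible without running anything. *)
let rec inputs : type a. a build -> string list = function
  | Pure _ -> []
  | File f -> [ f ]
  | App (f, x) -> inputs f @ inputs x
  | Bind (m, _) -> inputs m (* the continuation is opaque until build time *)

(* Both descriptions use a.ml and b.ml, but only one of them says so. *)
let applicative =
  App (App (Pure (fun a b -> [ a; b ]), File "a.ml"), File "b.ml")

let monadic = Bind (File "a.ml", fun _ -> File "b.ml")
```

Here inputs applicative lists both files, while inputs monadic only reports a.ml: the dependency on b.ml stays hidden inside the function passed to Bind.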

Of course, dynamic dependencies are slightly too hard for Applicative. Why not express them as lazy dependencies? Nodes of the dependency graph could be connected by edges (regular dependencies, stating that A depends on B and C, B depends on library D, etc.), and lazy edges. Lazy edges are initially declared in the project as (say) Lazy (ocamldep $ path "A.ml"), that is, a future value obtained from some command; once the command has been evaluated, it becomes something like Cached ((ocamldep $ path "A.ml"), [B, C]) (the command to run, and its last result). If re-computing the dependency is required ("A.ml" has changed) then we still know the command to run, and can compare its output to the previous one (it might be the same).
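A rough sketch of what such edges could look like as a data type. The constructor names Lazy and Cached follow the description above; everything else (command, Static, refresh, the run argument standing in for actually executing ocamldep) is a hypothetical choice for illustration.

```ocaml
(* Hypothetical command description, e.g. ocamldep on "A.ml". *)
type command = { tool : string; arg : string }

type edge =
  | Static of string list (* regular, declared dependencies *)
  | Lazy of command (* not evaluated yet *)
  | Cached of command * string list (* command and its last result *)

(* Evaluate or re-evaluate an edge; the boolean tells whether the
   known dependencies changed. [run] stands in for executing the command. *)
let refresh ~run = function
  | Static _ as e -> (e, false)
  | Lazy c -> (Cached (c, run c), true)
  | Cached (c, old) ->
      let fresh = run c in
      (Cached (c, fresh), fresh <> old)

(* Pretend ocamldep always answers "B C" for this file. *)
let run_stub _ = [ "B"; "C" ]
let e0 = Lazy { tool = "ocamldep"; arg = "A.ml" }
let e1, changed1 = refresh ~run:run_stub e0
let e2, changed2 = refresh ~run:run_stub e1
```

The first refresh turns the lazy edge into a cached one and reports a change; the second re-runs the command, finds the same output, and reports no change, so nothing downstream would need rebuilding.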

When a project has been built, the (cached) dependency graph is fully expanded and contains only evaluated lazy edges, so it's as if it was statically declared. If some files changed, we still know how to compute their new dependencies.

What do you think?

dbuenzli commented 9 years ago

> My understanding on why Applicative is good, [...]

Yes. Being able to explore the project structure statically is quite useful. Regarding build expressiveness my personal view on assemblage is that I don't want it to be a silver bullet. I want it to solve the average cases well from small to mid-size projects and from building to distribution. Some people like to go crazy with build systems; I have a rather conservative approach and like to have them simple (even if that may mean more manual work in the first place). If people really need to do crazy things in their build system or have vast code bases to deal with they can use a more powerful one, I have no problem with that. This is to say that I would really like to stick to applicative for now to see up to which point we can sustain its limitations.

> What do you think?

Thanks for the input. That may actually be a much better formulation of what I have been trying to hand wave in various discussions. Note that currently a configuration value is almost what you describe; its value is determined only when it is evaluated and the result is cached. So I don't think we are very far from the system you are describing. It's just that:

1. For now we reevaluate everything on setup rather than store the configuration and the (sub)results of its evaluation on disk to remember between assemblage runs. (Well, to be precise, we store only part of the evaluation end result in the form of an .install file, a Makefile, etc.)
2. There's no notion of dependency on configuration keys (e.g. this key depends on that file); keys are either explicitly set by the operator through a command line option or determined automatically from the environment (e.g. uname -m is called).

So yes, having the dependency information stored in (private) configuration keys as mentioned above, on which build actions will depend to define their inputs seems a possible way out (if we have 1. and 2.).

In some sense in such a system it is not that we have dynamic dependencies, it is that we readapt build actions (i.e. the build system) to the changing environment. By doing this transparently (i.e. having a single build command that efficiently looks for changes in the environment, propagates them to the build system, and invokes the build system) I hope we can get a system that's sufficiently expressive and efficient while remaining in the applicative world.
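As a rough illustration of a configuration key that depends on a file and is only re-evaluated when that file changes, here is a sketch. The key record, its fields, and the value function are all hypothetical, not the current Assemblage configuration API; read stands in for reading the file from disk, and an in-memory reference plays the role of the file.

```ocaml
(* Hypothetical file-dependent configuration key. *)
type 'a key = {
  name : string;
  file : string; (* the file this key depends on *)
  compute : string -> 'a; (* value derived from the file's contents *)
  mutable cache : (Digest.t * 'a) option; (* digest of contents, value *)
}

(* Returns the key's value and whether it had to be recomputed. *)
let value ~read k =
  let contents = read k.file in
  let d = Digest.string contents in
  match k.cache with
  | Some (d', v) when Digest.equal d d' -> (v, false)
  | _ ->
      let v = k.compute contents in
      k.cache <- Some (d, v);
      (v, true)

(* An in-memory stand-in for the file system. *)
let store = ref "B C"
let read _ = !store

let deps_of_a =
  { name = "deps.A"; file = "A.ml.deps";
    compute = String.split_on_char ' '; cache = None }

let v1, fresh1 = value ~read deps_of_a
let v2, fresh2 = value ~read deps_of_a
let () = store := "B C D"
let v3, fresh3 = value ~read deps_of_a
```

The second lookup hits the cache because the contents are unchanged; after the simulated file changes, the key is recomputed. Build actions that take such a key as input would be the ones to readapt.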

samoht / assemblage

Review build system model #144