ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.63k stars 404 forks source link

RFC: Detecting and overriding variables with configurator #1579

Open emillon opened 5 years ago

emillon commented 5 years ago

Hi,

I'd like to have your opinion on how to configure dune projects.

The problem

Some pieces of software require a build-time configuration. We already have configurator to query for some compile flags, or whether the host system supports certain features.

This is a useful, but not enough in certain cases. Let us see two examples:

nocrypto configuration

Nocrypto is a library of cryptographic primitives. Some of them are implemented in C, and have two versions: a generic, portable one, and an optimized one. The optimized one uses compiler intrisics that require a compatible compiler and CPU.

To select the implementation, it uses the cpuid library whether the host system supports it, and generates corresponding compilation flags.

That corresponds to the following code (N.B.: this is unreleased code):

(library
 (public_name nocrypto)
 (libraries ...)
 (c_names ...)
 (c_flags (:include cflags.sexp))
)

(rule
 (targets cflags.sexp)
 (action (run ./discover/discover.exe --output %{targets})))

; in discover/dune
(executable
 (name discover)
 (libraries dune.configurator cpuid ...)
)

Where discover.ml outputs the cflags depending on what cpuid returns.

This works, but it is missing an important feature: how to override the detected value? That can be useful, either to build a portable binary on a machine that supports acceleration, or to force an optimized binary and fail hard if this is not available.

Frama-C's share directory

Some programs make use (at runtime) of data files in a special directory; that can be configuration files, binary assets, scripts, etc.

This is in particular the case of Frama-C. Before its compilation, a configuration step defines a set of paths that are expanded in the OCaml source code. Data files are resolved using this macro expansion.

The ideal case is to make the binary relocatable and not depend on hardcoded paths, as described in #1185 (this can be implemented by using relative paths between the binary and the data directory). But that is not always easy. In particular, existing projects that use an explicit configuration step for this cannot easily be ported to dune.

How to fix this?

I propose to add a new kind of variable: project variables. (N.B. syntax is to be discussed separately).

Project variables are expanded in rules using %{project:name}. For example, in the above nocrypto rule:

(action
 (run ./discover/discover.exe
  --accelerate %{project:accelerate}
  --output %{targets})))

Project variables need to be set in dune-project:

(using project-variables 1.0
 (variables
  (accelerate detect)))

Each variable has a default value; the string "detect" in this case. If nothing in particular is done (such as when installing from opam), project variables are expanded to their default value.

But it is also possible to call dune configure set accelerate true: after doing so, the variable would expand to the string "true". This state is persisted through a file under _build. This is tracked as a dependency, like environment variables are. In particular, configuration changes will trigger a partial recompilation (conservative but hopefully precise).

Revisting the nocrypto example, this would mean adding a --accelerate argument to the configurator configuration, that would parse "false", "true", or "detect". If the latter is passed, it would call the detection function. Some support for that could be added to configurator itself. A similar exists in topkg as discovered_key.

As for Frama-C, this could enable a data-dir project variable. It is a bit more tricky to implement, because its default value would depend on the context's opam configuration. So, the exact details are more complicated, but reasonable semantics can probably be defined. An alternative is to have predefined project variables like prefix that can be interpreted by dune install.

It seems likely that this feature would work well with vendoring tools like duniverse: if we allow overriding variables from all projects in dune-workspace, we could reconfigure dependencies without having to patch their sources.

cc @bobot @avsm @samoht

Thanks!

samoht commented 5 years ago

Something which would be cool (and would bring that concept even closer to MirageOS's keys) is to also have documentation in documentation stanzas. For instance, this bit of MirageOS configuration file could be rewritten in:

(variable
  (name    hello)
  (doc     "How to say hello?")
  (type    string)
  (default "hello"))

And this would appear automatically in dune configure --help.

EDIT: you could declare environment variables in such stanzas as well (like in Cmdliner).

bobot commented 5 years ago

It seems useful :smirk:

(for later, syntax: dune configure set accelerate true, dune configure --set accelerate=true or dune configure --set-accelerate true?)

ghost commented 5 years ago

That seems good. A couple of comments: defining a new variable involves quite a lot of boilerplate:

  1. declaring the variable to dune
  2. passing --var %{var} to the action calling the configurator program
  3. parsing --var inside the configurator program

It would be nice if we could restrict this list to just the first item. I.e., you would only declare the variable to dune and then you could directly access it in the configurator program via a simple API:

if Configurator.V1.get "accelerate" Bool then
 ...
else
 ...

This would require a way for an action to communicate what configuration variables it read, so that Dune would know to re-execute it when the said configuration variables change. We could leave that as a future improvement though as it requires some work on the core of Dune.

We also need a good story for configuring a workspace composed of several projects defining the same configuration variable.

emillon commented 5 years ago

why do you propose to add the variable in dune-project file instead of dune file?

That might be one of the things to iron out in the implementation PR, but it to me seems that this requires a larger scope than a single dune file. It also gives it a unique name in the workspace, which plays nice with the vendoring story.

It would be nice if we could restrict this list to just the first item

̀Configurator.V1.getwould be very nice. That also makes presence/absence first class, so there is no need for a weird"detect"string. This might work if we generate the(run)action, like we do with(test)` stanzas.

ghost commented 5 years ago

This might work if we generate the(run)action, like we do with(test)` stanzas.

That's indeed one possibility. Though, when you think about it, you could even do it without. These variables are all set statically, so the set of all bindings always exist and doesn't need to be built. As a result, it is not really important that Dune knows beforehand what variable an action will access. Indeed, it only needs to know afterwards what variables a command read during its execution, so that during incremental compilation Dune can decide whether the previous result can be reused.

rgrinberg commented 5 years ago

I agree that we definitely a feature like this. Some comments:

Echoing Francois - why project files? I think the natural place to set variables is dune files. In particular, I'd expect them to be subject to same rules as everything in the env stanza. So I would suggest that the env stanza is one natural place to define them. I like the (variable ..) stanza idea as well. Regardless, I think that we should be able to support overriding variables in a scoped way.

I also agree that variables should be passed automatically to configurator. To solve the name qualification problem, we could also pass the project name to configurator. Therefore, you could access local variables with:

val local : name:string -> string option
val global : project:string -> name:string -> string option

Regarding setting variables using $ dune configure, IMO, it's a bit inconvenient to have the variable state to live in _build. I can imagine wanting to edit these variables in the editor as well. I think that having such state live in the source dirs would be more convenient. Introducing a dune-vars seems tempting, but I know that we'd rather not.

I also expect that users would be able to set variables per profile. Let's make sure that stays possible.

Finally, let's also think about having public vs. private variables. The distinction would be the same as for libraries. public variables would be installed along with a package and visible in other builds via qualified names.

bobot commented 5 years ago

Finally, let's also think about having public vs. private variables. The distinction would be the same as for libraries. public variables would be installed along with a package and visible in other builds via qualified names.

It is a very good idea, for example the Makefile installed in the ocaml directory is very useful. But for backward compatibility (disappearing variable, one variable replaced by multiple) it should be possible to give the value of a variable by computation. Perhaps a feature for later.

ejgallego commented 5 years ago

This works, but it is missing an important feature: how to override the detected value?

The way we to this in Coq is to have the configuration file marked as (mode fallback) so if the user runs configure the new flags take precedence over the old ones.

In my very personal point of view, I liked that Dune didn't have variables. IMVHO again, I think that variable-proliferation is a problem most make-based setup have. In fact it was interesting than when porting Coq many times the introduction of "variables" was suggested, however in most cases this was not the best solution, but I can only guess that if Dune had supported variables we would have seen them proliferating in the setup.

For the current Coq setup for example, our goal is to have a variable-less Dune setup other than compiler flags and optional dependencies.

Another reason I'm very wary of variables is that they easily make testing hard, as the number of test configurations grows combinatorialy on the number of vars.

Another point that could be important is discovery. Something I liked about having to declare the variables in dune-project is that it allows for an easier discovery of what variables the project has, and thus we can better understand the possible configurations, etc... Having them scattered may require some special support in order to do that. IMVHO this can be important as when people start using variables in large systems you sometimes reach a point where we have two binary variables a, b, giving us 4 setups, however only 3 maybe make sense. How to check this kind of consistency? It doesn't seem trivial.

bobot commented 5 years ago

IMVHO again, I think that variable-proliferation is a problem most make-based setup have.

Are you talking about variables in Makefile/make or the options selectable on the commandline of configure/autoconf?

Another point that could be important is discovery

The discovery would be handled by dune configure --help or dune configure --list-variables.

a variable-less Dune setup other than compiler flags and optional dependencies.

So you would have variables in order to specify if some optional dependencies should or not be used?

Another reason I'm very wary of variables is that they easily make testing hard, as the number of test configurations grows combinatorialy on the number of vars.

Indeed, I agree that one must be cautious in its use of variables, but they are still sometimes necessary. Testing would be simplified by the ability to supersede the value of a variable in a workspace.