samoht / assemblage

A collection of tools to manage the configuration of OCaml projects

Get a cross compilation story #126

Closed dbuenzli closed 4 years ago

dbuenzli commented 9 years ago

Given the current state of the OCaml toolchain, I think we have that mostly right.

One of the good aspects of the new configuration system is that, by definition, it mostly forces the executables of all build actions to be redefinable on the command line at configuration time. Built-in configuration keys for specifying the build/host OS and architecture are also available.

Assemblage should also be careful about its use of the OCaml toolchain for its own purposes. Since we are using compiler libs this shouldn't be much of a problem. There was however an ocamlfind use for auto-loading the assemblage library, which can now be overridden by the ASSEMBLAGE_OCAMLFIND environment variable (not to be confused with the ocamlfind configuration key used to look up the project's package dependencies). That variable could be removed if we move to a dynlink setting for assemblage rather than toploop.

dbuenzli commented 9 years ago

One of the good aspects of the new configuration system is that, by definition, it mostly forces the executables of all build actions to be redefinable on the command line at configuration time. Built-in configuration keys for specifying the build/host OS and architecture are also available.

This doesn't seem to be good enough since we have a single key for these executables. In practice we may need two: part of the build system may need the build platform OCaml toolchain because for example the build system has OCaml programs that generate code and need to be run on the build platform. See for example @whitequark's port of tgls here.

The hack there is that instead of the single-shot ocamlbuild invocation used in a regular build, he first builds the program that generates the bindings, which will be invoked by the build system itself. This first build uses the build-os toolchain. It then proceeds by invoking the regular package build procedure (in which ocamlbuild will not rebuild the program that generates the bindings but will have it at hand to generate them) using the host-os toolchain through an environment variable and ocamlfind.

I would rather avoid having to use ocamlfind (which assemblage mainly sees as a source of information to build command line fragments, see #146) and environment variables for this. We should try to devise a scheme that allows specifying build-os and host-os values for utilities (both utilities run on the build-os but may produce different outputs). By enriching or interpreting the usage type of a part we could then automatically select the right executable to invoke for the task at hand in the part's actions.
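A minimal sketch of what such a scheme could look like, assuming a utility carries one executable per output platform and a part is tagged with the platform its outputs run on. All names here (`runs_on`, `tool`, `select`) and the cross-compiler executable name are hypothetical and not part of assemblage's API.

```ocaml
(* Hypothetical sketch: these names are illustrative, not assemblage's API. *)

(* The platform on which a part's outputs are meant to run. *)
type runs_on = Build_os | Host_os

(* A utility with one executable per output platform.
   Both executables themselves run on the build-os. *)
type tool = { for_build_os : string; for_host_os : string }

(* Select the executable to invoke from the part's usage tag. *)
let select tool = function
  | Build_os -> tool.for_build_os
  | Host_os -> tool.for_host_os

(* The cross-compiler name below is made up for the example. *)
let ocamlopt =
  { for_build_os = "ocamlopt"; for_host_os = "arm-linux-gnueabihf-ocamlopt" }

let () =
  (* A code generator is run by the build system: build-os toolchain. *)
  print_endline (select ocamlopt Build_os);
  (* The library that ends up in the final program: host-os toolchain. *)
  print_endline (select ocamlopt Host_os)
```

With something like this, a part's actions would only ever call `select` with the part's own tag, so users would not have to name a toolchain explicitly.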

Then there is the issue of having build-os packages and host-os packages. Since opam doesn't allow installing more than one version of a package, I think build-os packages could live in another switch and the build-os ocamlfind would simply direct you to the paths there. Finally there's the issue of actually getting this information from the build environment, most of the time represented by opam. It should be noted that this would e.g. double all the variables like native, native-dynlink etc. So this should definitely be designed hand in hand with a good cross compilation story for opam (@Altgr, @avsm, @samoht).

It seems that once you start with the build/host OS distinction, it creeps in everywhere. We should make sure that users don't really have to care about it, and that the correct things® are done without them needing to know the details beyond correct part usage tagging.

Brain dumping a bit here.

dbuenzli commented 9 years ago

Also do I have my terminology right ? It seems that autoconf uses host for what I call target.

UPDATE changed terminology in the above posts.

samoht commented 9 years ago

Then there is the issue of having host packages and target packages. Since opam doesn't allow installing more than one version of a package, I think host packages could live in another switch and the host ocamlfind would simply direct you to the paths there. Finally there's the issue of actually getting this information from the build environment, most of the time represented by opam. It should be noted that this would e.g. double all the variables like native, native-dynlink etc. So this should definitely be designed hand in hand with a good cross compilation story for opam (@Altgr, @avsm, @samoht).

I think it's a good idea to rely/extend on opam switches to deal with cross-compilation in general.

whitequark commented 9 years ago

@samoht opam switches don't really work for cross-compilation. For example, you will often want to run an identical version of the package on the build and host system.

@dbuenzli autoconf uses "build" for what you use "host", and "host" for what you use "target".

dbuenzli commented 9 years ago

@samoht opam switches don't really work for cross-compilation. For example, you will often want to run an identical version of the package on the build and host system.

Didn't get that. What does running a package mean ? Do you have an example ?

@dbuenzli autoconf uses "build" for what you use "host", and "host" for what you use "target".

Yes. Do you think the proposed terminology is problematic? I'm not trying to be special. While their "build" is clear, I find their "host" less obvious than "target" (maybe because of the confusion with hosts in VMs). Should we maybe switch the terminology to build-os and target-os rather than host-os and target-os? OTOH build is a keyword that already appears everywhere in a build system, so in my opinion it makes discussions less clear. For example, we can talk about the host toolchain without this being ambiguous; build toolchain wouldn't be as obvious, you'd need to say the build-os toolchain.

whitequark commented 9 years ago

No, "host toolchain" is still ambiguous because of https://en.wikipedia.org/wiki/Cross_compiler#Canadian_Cross. I suggest sticking to the autoconf terminology.

I mean, let's say, ppx_deriving or even sexplib. You have a build component (the ppx, or the camlp4 plugin) and a host component, which must have matching versions.

dbuenzli commented 9 years ago

No, "host toolchain" is still ambiguous

Ok so I'll move to build-os and target-os. I'd rather avoid using host-os for what is now target-os; I think it will confuse users in general, especially in, say, a mirageos setting.

You have a build component (the ppx, or the camlp4 plugin) and a host component, which must have matching versions.

Damned. See why I hate pre-processors.

whitequark commented 9 years ago

(target-os) No, this is actually worse than the previous variant. target means the system for which a toolchain emits code... which you really should get right when you're talking about cross-compiling.

(preprocessors) You do realize that some of your packages do effectively the same thing, right? For example, tgls...

dbuenzli commented 9 years ago

(target-os) No, this is actually worse than the previous variant. target means the system for which a toolchain emits code... which you really should get right when you're talking about cross-compiling.

Ok so correct terminology shall be used and propagated (even though it hasn't clicked in my head yet).

You do realize that some of your packages do effectively the same thing, right? For example, tgls...

To be precise no, for now tgls generates code at distribution time, your version of tgls does that...

Except for the ones that use js_of_ocaml, none of my packages use pre-processors. A bunch of them do generate code at distribution time, but this is very different from pre-processing.

Besides, it's not that I will never use a pre-processor, but I still hate them and think that most of the time they are wrong solutions to real problems that should be solved at the language level, by having meta-programming facilities as an integral part of the programming language itself.

whitequark commented 9 years ago

Well, it's not like I want to (https://github.com/dbuenzli/tgls/issues/12), but point taken.

I don't disagree, but getting rid of all preprocessors is an unrealistic goal. Even if we fix OCaml completely, there are also e.g. packages which invoke protoc or similar tools.

dbuenzli commented 9 years ago

I don't disagree, but getting rid of all preprocessors is an unrealistic goal.

Sure.

dbuenzli commented 8 years ago

(target-os) No, this is actually worse than the previous variant. target means the system for which a toolchain emits code... which you really should get right when you're talking about cross-compiling.

So it seems that the whole OCaml toolchain is using the wrong terminology, e.g. here and in the host and target fields of ocamlc -config. Should we really use a different terminology from the one used by the OCaml compilers?

whitequark commented 8 years ago

Let's reiterate:

It seems like OCaml is using the terms correctly here; it assumes though that build and host are always the same. There actually should be no changes related to semantics of target as it already does what it should; the cross-compiling-related changes will only decouple build from host.

dbuenzli commented 8 years ago

Ok then so most of the time host is going to be equal to target I guess. But then isn't the terminology wrong in that PR (or maybe I'm just confused) ?

whitequark commented 8 years ago

Yes, right now almost all OCaml builds have host equal to target equal to build.

Yes, except it's worse than that: "host compiler" is not something that has a precise meaning. You have to list both host and target to meaningfully describe a compiler (whereas build is just an environment detail.)
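To make the point above concrete, here is a small sketch in which a compiler is described by its host and target, while build is passed separately as a detail of the environment that produces the compiler. The type and function names are assumptions made for illustration, and the platform strings are just example triplets.

```ocaml
(* Illustrative only: platforms are plain strings such as GNU triplets. *)
type platform = string

(* A compiler is described by where it runs (host) and the system
   it emits code for (target). *)
type compiler = { host : platform; target : platform }

(* It is a cross compiler when it emits code for another system. *)
let is_cross_compiler c = c.host <> c.target

(* build only matters while producing the compiler itself: the compiler
   is cross-built when it will not run where it was built. *)
let is_cross_built ~build c = build <> c.host

let () =
  let build = "x86_64-linux-gnu" in
  let native = { host = build; target = build } in
  let cross = { host = build; target = "arm-linux-gnueabihf" } in
  Printf.printf "native: cross=%b cross-built=%b\n"
    (is_cross_compiler native) (is_cross_built ~build native);
  Printf.printf "cross: cross=%b cross-built=%b\n"
    (is_cross_compiler cross) (is_cross_built ~build cross)
```

In the common case mentioned above, where host, target and build are all equal, both predicates are false.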

dbuenzli commented 8 years ago

Ok thanks.

dbuenzli commented 8 years ago

For example, you will often want to run an identical version of the package on the build and host system. [...] You have a build component (the ppx, or the camlp4 plugin) and a host component, which must have matching versions.

@whitequark This seems like an artefact of broken build systems. Formally, within the package, the build components should be compiled with both the build-os and host-os toolchains (the latter if you want to be able to do binary distributions of packages) and the host components should only be compiled with the host-os toolchain. So in a better build world this is not a real argument; of course it's easy to make the point that this world doesn't exist. Do you see another argument for the need of build-os and host-os package version sync?
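The rule stated above can be written down as a small table. This is only a sketch with hypothetical names (`component`, `toolchain`, `toolchains_for`); it is not how any existing build system models packages.

```ocaml
(* Hypothetical names, for illustration only. *)
type toolchain = Build_os_toolchain | Host_os_toolchain

type component =
  | Build_component  (* e.g. a ppx or camlp4 plugin, run during the build *)
  | Host_component   (* e.g. the library linked into the final program *)

(* Build components are compiled with both toolchains (the host-os one
   allows binary distribution); host components only with host-os. *)
let toolchains_for = function
  | Build_component -> [ Build_os_toolchain; Host_os_toolchain ]
  | Host_component -> [ Host_os_toolchain ]

let toolchain_name = function
  | Build_os_toolchain -> "build-os"
  | Host_os_toolchain -> "host-os"

let () =
  let show label c =
    let names = List.map toolchain_name (toolchains_for c) in
    Printf.printf "%s: %s\n" label (String.concat ", " names)
  in
  show "build component" Build_component;
  show "host component" Host_component
```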

(I'm still convinced that one switch per architecture is a bad idea if you want to scale, but I'm trying to write a proposal for opam multiarch support in a single switch and found out that the "need the same version" requirement for a package on the build-os and host-os doesn't seem to hold water.)

whitequark commented 8 years ago

The fundamental point here is that the toolchain targeting build-os must also be available to the packages being built in the switch targeting host-os. It is not particularly important how that happens.

whitequark commented 8 years ago

Additionally, this is as much a problem with build systems as with installation; you need to build (some parts of) a package using a compiler targeting build-os and then install them into a place where the host-os toolchain would expect to find them. By far the easiest way to achieve this is the same-version requirement; anything else would require a huge amount of work for a benefit that is unclear to me.