enable build parallelization when dependencies are known

ocaml / ocamlbuild

The legacy OCamlbuild build manager

Other

121 stars 81 forks source link

enable build parallelization when dependencies are known #306

Closed ivg closed 4 years ago

ivg commented 4 years ago

It is well-known (#207) that ocamlbuild is not very good at build parallelization and that dynamic discovery of dependencies makes it hard to plan the jobs ahead. And while it is possible, it is not trivial to implement and benefits are still questionable.

Other build systems solve this issue by requiring users to specify explicitly the dependencies between components. This is, of course, a burden for a user, but makes parallelization trivial, since the dependency DAG is known beforehand.

Nowadays, ocamlbuild is often used as a backend for OASIS. And OASIS requires a full specification of build dependencies, both external and internal, therefore the dependency graph is known to OASIS. Unfortunately, this information is not leveraged, since there is no mechanism through which we can pass this information from OASIS to ocamlbuild.

I'm opening this issue to start discussing, whether this is viable at all and, if yes, then what would be the easiest way to implement it. From a practical perspective, this will drastically improve our build times (in BAP). We have more than a hundred internal libraries, many of them could be built in parallel. I believe that other projects that are still using oasis will benefit from this feature also.

So, given that I have a DAG of targets, what would be the best way to pass it to ocamlbuild? The first thing that comes to mind is either to add a new target file or extend the existing itarget file, so that more than one target could be specified per line, e.g.,

, e.g., assuming we have a graph of tasks

digraph {
    t1 -> t2 -> t3;
    t4 -> t5 -> t6;
    t3 -> final;
    t6 -> final;
}

we can construct the following .itarget:

t1; t4
t2; t5;
t3; t6;
final;

And the executor will run targets on the same line in parallel, while still running each line of the itarget file sequentially. This syntax could be extended even to mllib and other files, so that more fine-granular dependencies (intralibrary dependencies) could be specified.

This is just an example, to make it less abstract. To summarize - we have a DAG in OASIS and we could use it. But there is no way to specify dependencies statically in ocamlbuild. So my question, is this approach viable, are there better options? (like using tags instead). How hard it would be to implement? (I believe that itarget approach would be hard, but if we will specify a special .graph file which won't work through builder but be supported directly by the rule it should be trivial).

gasche commented 4 years ago

I performed a small experiment in September 2013 with better parallelization strategies for ocamlbuild. I wrote my ideas in a blog post (with links to a rough prototype), Current (and future?) state of OCamlbuild parallelization (using the web archive as the Gallium servers seem to be down today), which explained my thinking of the time. You might be interested in looking at it.

I think the general answer to your question is "yes, but who cares enough to invest the work to do this?". If OASIS really has a complete dependency graph, why not use another backend than ocamlbuild that parallelizes better? How does that compare to the cost of porting your project to a different build system that paralellizes better? Who is going to invest the time to improve parallelism in ocamlbuild, when it's rarely used for new projects anyway and people prefer to use Dune instead?

When I played with build parallelism in Batteries, I noticed that I could get sensible gains by making the build system less general, giving more information for ocamlbuild at the places where parallelism is available, that it would otherwise only discover laterin a parallelism-limiting way. Concretely, a lot of time was spent build .cmi files sequentially, because a lot of .cmo would be built in parallel from a .mllib file, but then following the .cmi -> .cmo dependency for each .cmo target would prevent .cmi parallelism. Changing the .mllib rule to build .cmi files as well as .cmo files (in parallel) would increase parallelism opportunities quite a bit. If you are willing to invest time in speeding up an ocamlbuild system, and don't mind doing something less-generic, I would try playing with this kind of approaches.

(I don't remember exactly how I investigated parallelism opportunities for a fixed project (Batteries). I may have simply hacked around the main compilation rules (.cmo and .cmi production) to log when they start and stop, and visualize the overlaps.)

whitequark commented 4 years ago

If OASIS really has a complete dependency graph, why not use another backend than ocamlbuild that parallelizes better?

One good thing about OASIS (from my perspective as someone who often cross-compiles stuff) is that you can bypass all the clever logic in OASIS that interrogates the system for the compiler paths, dependencies, etc and use OCAMLFIND_TOOLCHAIN through ocamlbuild. This is clearly unintentional but a property that any new backend IMO should preserve.

ivg commented 4 years ago

You might be interested in looking at it.

Yep, I read it. And came to the conclusion that there is a low-hanging fruit that can give ocamlbuild/oasis a second life.

I think the general answer to your question is "yes, but who cares enough to invest the work to do this?".

Well, the user base of BAP and myself in particular :)

If OASIS really has a complete dependency graph, why not use another backend than ocamlbuild that parallelizes better?

Because, there is no such backend. And, because, our myocamlbuild plugins are not trivial.

How does that compare to the cost of porting your project to a different build system that paralellizes better?

I don't know. Porting BAP to dune is nearly impossible because we definitely don't have enough resources to do this (and because of arbitrary design decisions that dune developers have made so that it made impossible a direct port) .

... and people prefer to use Dune instead

It is not that people really prefer it, they are preferred to use it. Like they've been told that this is the best system, so use it. I don't think that dune is better than ocamlbuild, yes it is faster, but about 99% of OCaml projects do not need this speed. I don't want to turn this discussion into Dune vs OCamlBuild holy war, but the problem with dune that I see is that dune is overcomplicated for small projects and oversimplified for large projects. And while previously a new OCaml user could start coding OCaml and just use ocamlbuild example.native --, today we ask them to learn a yet another syntax/dsl just to build helloworld.ml. That increases the cognition burden and averts people from using OCaml and/or BAP.

whitequark commented 4 years ago

And while previously a new OCaml user could start coding OCaml and just use ocamlbuild example.native --, today we ask them to learn a yet another syntax/dsl just to build helloworld.ml. That increases the cognition burden and averts people from using OCaml and/or BAP.

Doing anything nontrivial with ocamlbuild (e.g. C extensions) requires you to learn its baroque and sparsely documented API, which is an immense cognitive burden and multiple people I personally know have avoided OCaml because they needed integration with C code (game developers usually do) and gave up trying to get ocamlbuild to do what they need. (This was pre-Dune, so their other options were usually even worse.)

ivg commented 4 years ago

Well, it contradicts my experience. We're using LLVM in BAP and it was very easy to teach ocamlbuild to build C++ and even use OASIS to specify c+++ files. If you're using plain C, then it works out-of-the-box, and with OASIS you can just reference your C files in the library specification and everything will work fine.

In my previous project, where we were doing a lot of embedded programming and hardware interaction, it also wasn't a big deal. OCamlBuild played nicely with Boost bjam and the only thing that we need to add a single line to myocamlbuild file, which just specifies the -l option on the linking phase.

whitequark commented 4 years ago

The perspective of an experienced OCaml user like you and a new OCaml user tends to be vastly different, so while you could claim that ocamlbuild works well for experienced people (it certainly works well enough for me), that has no bearing on whether it is a barrier to entry (it is).

ivg commented 4 years ago

Well, I still disagree that it is a barrier. But these are our personal opinions. What I have to admit is that dune documentation is much better than ocamlbuild has (or had). But probably writing better documentation would require less effort than developing a new build system from scratch. But nobody likes to write documentation. Anyway, I would still encourage new users to use ocamlbuild and switch to dune as soon as their project grows.

And again, let's go back to our topic. For BAP switching to dune is not an option. The project is too large and dune is not versatile and powerful enough as OASIS/ocamlbuild. At least yet. So for the next couple of years, we're stuck with OASIS.

gasche commented 4 years ago

I can't personally afford development time on ocamlbuild for completely new projects. (I'm happy to keep it working and fix bugs, with the note that compatibility with existing projects at this point weights more than convenient for new projects). If you wish to develop a better parallelism mode for ocamlbuild, I am happy to follow along and assist as necessary. On the other hand, investing equivalent resources into extending Dune (or some other build system with better community traction than ocamlbuild) to suit your needs may also be a good choice.

Here is one extra idea I had for ocamlbuild, or in fact many other build systems: if you record the dependency graph and build actions as you do a build, you should be able to output a straighline recipe out of it (with no more dependency-discovery at all). The recipe has all dependency information, so it should be possible to rebuild it from a very simple interpreter, with perfect parallelism. (Dune in fact started as an interpreter layer on such a fairly simple recipe language, and I have considered trying to use its internal command-runner for this, but afaik. it was never extracted as a separate/independent library.) This is similar to the suggestion of "building by generating a Ninja buildsystem", but due to the dynamic nature of ocamlbuild's rules you need to actually build to get the full view, you can't statically generate the recipe and then build; the recipe is only available for subsequent builds in the same state. (But you could write the recipe interpreter such that if the dependencies have changed a bit and the recipe fails, it's not the end of the world and you call the clever-and-slow build system again, reusing already-done work.)

ivg commented 4 years ago

If you wish to develop a better parallelism mode for ocamlbuild, I am happy to follow along and assist as necessary.

Sure, I'm not asking for anything more, I clearly understand that I'm the one who will be implementing this. And moreover, my budget is also very limited, I'm ready to allocate a couple of days on this task.

My original idea is very simple. Probably I should start with it, so that you can follow my train of thought, maybe I made a wrong turn somewhere and there is a shorter path. So taking BAP for example, we have about 100 libraries. Their dependency graph suggests parallelization, and when BAP is installed from opam this is what happens, roughly the graph looks like this:

digraph {
 {a,b,c,d,e} -> bap -> {f,g,h,i,j,k,l,m,n}
}

graph

There are a couple of basic libraries (denoted a,b,c,d) here, like monads, graphlib, etc, which are independent and could be built indepdently. The there is a big bap standard library, and finally the meat - lots of libraries, analyses, and plugins (denoted here as f..n, but in fact there are several dozens of them).

We can build a,b,c,d,e in parallel, as they do not have any cross edges, then build bap, then build f..n also in parallel. My first thought was, I can run ocamlbuild a & ocamlbuild b & ... ocamlbuild e &, like opam does. But for that, I need to run them in separate _build folders, since only one ocamlbuild process can work with _build at time. So my next big idea was: "but we have an executor in ocamlbuild that can run multiple jobs in parallel, we just don't have a nice command-line interface to call it". So probably a shorter path would be adding an -independent-targets command line option, that will tell ocamlbuild "trust me, those targets are independent, and you can run them in parallel" so that we can just invoke ocamlbuild three times:

ocamlbuild -indepdendent-targets a b c d e
ocamlbuild bap
ocamlbuild -independent-targets f g h i j k l m n

What do you guys think, is it doable in a couple of days (plus/minus a week)?

gasche commented 4 years ago

I would need to check more in details, but I'm not sure that the ocamlbuild executor would deal with this easily. As I explained in my blog post (I think?), tasks return one final command, and static parallelism information is used to run the final command of each task in parallel. Here you would end up with a sequential build of a/b/c/d/e, following by a parallel call to the final step (typically linking the object files into an executable). This is not what you want.

If you want a quick hack to support this idea, it may be easier to build each "independent target" a/b/c/d/e in its own _build_foo directory, and then merge the build directories together, concatenating the _digests files. If we are collectively lucky, this should result in a single _build that is a valid state to run the rest. Then f/g/h/i... is a bit trickier, because of course you want to reuse the existing build state of the bap middle part; I guess it shouldn't be too hard to do with links (hardlinks or symlinks) to expose the intermediary build caches.

This suggests a general solution for -independent-targets x y z. From a given build state _build_common

create separate _build_{x,y,z} directories with links that expose the content of _build_before inside them
build each target in parallel
merge all build results, and the _digests file, back in _build_common (no need to copy the files that are links into _build_common)

For the _digests file you need to copy it in each _build_{x,y,z}, and then compute the appropriate suffix before merging back...

If we want to make this cleaner/nicer, I would look into making it possible to call several ocamlbuild processes in parallel in the same build directory. It may not be too hard to tell them to look into several _digests file (to avoid having to stitch them back manually) as read-only, and have each write into independent _digests and _log outputs.

ivg commented 4 years ago

I see. I had a little bit different picture but after I dived into the code, I figured out, why my proposal is not that easy to implement. The solver is deeply intertwined with the cache and executor, so when we have a list of task that we know are independent, we still can't just tell the executor to run N solvers.

Theoretically, it could be updated, so that we can run N solvers in parallel, but this change would be too intrusive, may affect the performance on a single-core machine, and in general, will destabilize the codebase. The risks that I think, we shouldn't take at this time.

gasche commented 4 years ago

But your suggestion of building targets in independent build directories may still work, if you merge them afterwards. This can be scripted outside the tool itself for an experiment, and could be made an internal function if it proves to be conclusive. Have you considered trying it?

ivg commented 4 years ago

But your suggestion of building targets in independent build directories may still work, if you merge them afterwards. This can be scripted outside the tool itself for an experiment, and could be made an internal function if it proves to be conclusive. Have you considered trying it?

Yes, but in a sort of flawed experiment :) I used opam as the parallelization engine. To understand the experiment we need to understand how BAP is built. We have a configure script, which can select a specific component from the code base, e.g., ./configure --enable-monads builds only the Monads library, and nothing more. The generated oasis file has only this component and nothing more.

Next, we have an opam-repository that comprises all packages from the BAP universe. Each package

is basically doing ./configure --enable-<p> && make && make install. And we have a meta-package called bap which references all bap packages. Therefore, when I do opam install bap, opam analyzes the dependencies of all packages and builds them parallelizing each build individually on the package granularity (the same granularity that I wanted to use to parallelize ocamlbuild). So, opam was essentially doing what we wanted to do with separate _build folders, bur instead of merging cache, the dependencies were just installed.

Now the numbers. We have 88 packages containing 450 compilation units. The number of packages roughly corresponds to the number of libraries. A sequential build takes about 8 minutes to build all of them (that is 1 second per CU). I have 20 CPUs on my machine and 128 Gb of memory, so there was no CPU starvation. A parallel build with all 20 CPU utilized took about the same 8 minutes, mostly because opam itself took about 12 minutes of overhead but due to parallelization we were able to reduce the overall time in 2-2.5 times, so that we got back to 8 minutes instead of 18-20 minutes.

Takeaways. In application to BAP, this parallelization will give us no more than 2-2.5 improvement, and this is only if we will be able to implement merging without extra overhead. So, for me it is not practical, this is not really that big improvement (I can get a bigger compilation time improvements by switching to byte only compilation, which is a much easier task).

Note, the 2.5 improvement is not because opam wasn't good at parallelizing, it was often running 4 to 20 jobs so processors were well utilized. It is the granularity of tasks that prevented us from gaining optimal performance with good cache sharing. One of the problems was the main bap library bottleneck which basically takes more than 50% of the time to compile and is not parallelized.

I did another experiment, I've tried OMake instead, which does full program compilation and was able to build BAP (we have an option --enable-everything that builds all components out-of-the source) in one minute (1 minute 8 seconds), that is x8 improvement to our current build time, that is about 150 ms per compilation unit, or given the number of CU and libraries, it will give us 2000 calls to ocaml* utilities, which is about 3ms per utility. Quite impressive.

The grain of salt, is that (a) OASIS -> OMake integration has a few missing features and bugs and that OMake is sort of out of hype, which unfortunate, given how powerful it is and that I was able to switch to it (without any prior experience and with bugs in the integration code) in about 3 hours of work (99% was updating missing dependencies in our oasis specification).