
[RFC] Distributed cache protocol. #3452

Closed: mefyl closed this issue 3 years ago

mefyl commented 4 years ago

This is a description, from the dune point of view, of the current status of the distributed caching effort. The goal of distributed caching is to share artifacts between several cache daemons, so that artifacts built on one machine are available to others. This mechanism is mostly implemented in the cache daemon, with dune sending only some agnostic information so the daemon can decide how to share and retrieve artifacts.

A first implementation of the basic required messages on the dune side has been submitted in #3407.

Vocabulary

DAR: the distributed artifact repository, i.e. the shared remote storage that cache daemons push artifacts to and fetch them from.

Strategies

Several strategies have been suggested to implement the prefetching of artifacts from the DAR. At the dune retreat, the two main ones were:

Hints

Inform the daemon of rules we know are going to be run, so it can prefetch them.

The pro is that this is very simple to implement and fully deterministic: we will fetch exactly what's needed.

The con is that we only find out which rules we are going to build quite late (all their dependencies must be built first), so if the build graph is narrow, the local build could outpace the prefetching process.

Weak hashes

To circumvent this limitation of hints, the idea of weak hashes was introduced: hashes that aggregate less information than the actual hashes, but can be computed at initialization time. They would include the action hash, the paths to the dependencies, etc.

Pros: Compared to hinting, prefetching can start right at the beginning of the build, since we have all the weak hashes right away.

Cons: Weak hashes are not injective, so several rule executions might share the same weak hash, for instance if a dynamic dependency has different content depending on external parameters. This means that when we receive a weak hash we could have to download many rules from the DAR to make sure the one we actually want is included, and that weak hashes must be designed carefully so as not to aggregate hundreds of rules under a single one. Conversely, weak hashes do not always identify identical builds either: the same compilation gets a different weak hash if, for instance, the path to the source differs; but that is a more marginal consideration.
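
To make the trade-off concrete, here is a minimal OCaml sketch of what a weak hash could aggregate, assuming a rule is summarised by its action digest and its dependency paths; the type and function names are illustrative, not dune's actual internals.

```ocaml
(* A weak hash aggregates only information known before the build:
   the action digest and the dependency *paths*. The full hash would
   additionally cover each dependency's contents, which are only
   available once the dependencies have been built. *)
type rule = {
  action_digest : Digest.t;  (* known at rule-creation time *)
  dep_paths : string list;   (* paths are static, contents are not *)
}

let weak_hash (r : rule) : Digest.t =
  Digest.string
    (String.concat "\x00"
       (Digest.to_hex r.action_digest :: List.sort compare r.dep_paths))
```

Two rules whose dynamic dependencies resolve to different contents collide under such a hash, which is exactly why one weak hash may map to several DAR entries.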

Commit indexing

While working on the distributed cache, another idea surfaced. Dune already shares VCS information with the daemon: at the beginning of every build it sends all the relevant repositories and commits the source files are part of, and associates every rule with one of them. So far this was left as FIXME placeholder values in the protocol, but it was included in the original specification. We can thus easily index artifacts by the commit they were built for, and reciprocally fetch all artifacts needed to build a commit as soon as dune gives us the VCS information at the start of a build. We need to add some additional information to a commit, such as the OS and toolchain version.

Pros: Prefetching starts as early as with weak hashes (at build start) and is much more deterministic.

Cons: We need a way to separate builds by global build environment (OS, tool versions, or anything else that could impact the build output), so that we don't aggregate disjoint builds of a commit.
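
As a rough illustration of what such separation implies for the index key, here is a minimal OCaml sketch assuming the environment is summarised by OS and toolchain strings; all names and fields are assumptions for illustration, not the actual protocol.

```ocaml
module Commit_key = struct
  type t = {
    repository : string;  (* e.g. repository URL or root path *)
    commit : string;      (* VCS commit hash *)
    os : string;          (* e.g. "linux-x86_64" *)
    toolchain : string;   (* e.g. "ocaml-4.10.0" *)
  }

  (* Including os/toolchain in the key keeps disjoint builds of the
     same commit from being aggregated under one index entry. *)
  let to_string { repository; commit; os; toolchain } =
    String.concat "/" [ repository; commit; os; toolchain ]
end

(* The DAR then maps each key to the set of rule hashes built for
   that commit in that environment. *)
module String_set = Set.Make (String)

type dar_index = (string, String_set.t) Hashtbl.t
```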

Current implementations

The rule hinting approach was implemented first, given its simplicity. Results were mixed, with only marginal speed boosts, although no very big repository has been tested and the implementation on the daemon side has been kept quite naive. This could probably be improved, but I left it as-is since IMO it should be entirely superseded by the commit indexing approach.

The commit indexing approach was then implemented, on the dune side mostly by filling in the commit information placeholders, and gives great results (see the bench in the Annex). We certainly need to experiment more, but my hunch is that this approach will be by far the fastest and most consistent, and thus the only one needed.

Rule hinting

Whenever dune determines a rule will be run, it sends the hash of the rule to the cache daemon so it can potentially prefetch it from the DAR.

Implementation

The hint message can be sent from the build fibers, right before potentially locking the concurrency semaphore. Thus, if a "wide" part of the build graph is encountered, such as a library containing many cmo files, most of these rules will send a hint message and then block. The cache daemon may then have fetched some of these artifacts from the DAR by the time they unlock, making them complete instantly.
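
As a minimal sketch of this ordering, using the stdlib counting semaphore (OCaml 4.12+) in place of dune's fiber-based throttling, and with send_hint as a stand-in for the asynchronous message to the daemon:

```ocaml
let jobs = Semaphore.Counting.make 8

(* Placeholder: in dune this would be an asynchronous message on the
   cache daemon's socket; here we just print the hash. *)
let send_hint (rule_hash : Digest.t) =
  Printf.printf "hint %s\n%!" (Digest.to_hex rule_hash)

let run_rule ~rule_hash ~(action : unit -> unit) =
  (* Hint *before* blocking on the concurrency semaphore: while the
     rules of a wide build-graph layer wait here, the daemon has time
     to fetch their artifacts from the DAR. *)
  send_hint rule_hash;
  Semaphore.Counting.acquire jobs;
  Fun.protect ~finally:(fun () -> Semaphore.Counting.release jobs) action
```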

Messages

Commit indexing

VCS information is sent by dune at the beginning of every build, i.e. right after the protocol version negotiation. These messages were already planned out in the original specification and just left blank so far, as they were not used by either side. When receiving such commit information, the cache daemon immediately starts prefetching from the DAR the artifacts associated with those commits. It also accumulates commit information any time an artifact is promoted. At the end of the build, all the rule hashes built as part of a commit are compiled and indexed in the DAR, enabling future retrieval.

The current implementation does not yet include additional information such as the build context, nor does it take any action if the build directory is dirty, for instance.
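
Here is a minimal sketch of the daemon-side bookkeeping described above, assuming the daemon sees commit keys at build start and promoted rule hashes during the build; all names are illustrative, not the actual daemon code.

```ocaml
module String_set = Set.Make (String)

type daemon_state = {
  mutable current_commits : string list;  (* commit keys from build start *)
  mutable promoted : String_set.t;        (* rule hashes promoted so far *)
}

(* On build start: remember the commits and kick off prefetching of
   every rule hash already indexed under them in the DAR. *)
let on_build_start state ~prefetch_from_dar commits =
  state.current_commits <- commits;
  state.promoted <- String_set.empty;
  List.iter prefetch_from_dar commits

(* Accumulate rule hashes as artifacts are promoted during the build. *)
let on_promotion state rule_hash =
  state.promoted <- String_set.add rule_hash state.promoted

(* On build end: index the accumulated hashes under each commit key so
   future builds of these commits can be prefetched in one lookup. *)
let on_build_end state ~index_to_dar =
  List.iter
    (fun commit -> index_to_dar commit (String_set.elements state.promoted))
    state.current_commits
```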

Messages

Annex : sample time bench with commit indexing

This is a build of dune itself, with various local cache and DAR statuses.

# Build from scratch with cache disabled
$ rm -rf _build ~/.cache/dune/db/{files,meta}
$ time DUNE_CACHE=disabled dune build
time: 14.360 real  1:07.30 user  28.231 system (665%)

# Build from scratch with cache enabled and empty. This will populate
# both the local and distributed cache. We see a 2-3% performance drop
# due to the cache overhead.
#
# Note that this overhead is not (I think) due to dune being slowed
# down, as we just send asynchronous network messages: it is because,
# after the build, we wait for protocol completion in case the
# daemon sends us some final "dedup" messages.
$ rm -rf _build ~/.cache/dune/db/{files,meta}
$ time DUNE_CACHE=enabled dune build
time: 14.673 real  1:07.79 user  28.422 system (655%)

# Remove the build dir AND the local cache. The build will start
# while, in parallel, the local cache is populated from the
# distributed cache in reverse order. When the two processes meet,
# everything is pulled from the local cache and we see a 65%
# performance increase.
$ rm -rf _build ~/.cache/dune/db/{files,meta}
$ time DUNE_CACHE=enabled dune build
time: 5.042 real  5.346 user  3.896 system (183%)

# And of course, if we just remove the build tree, everything is
# pulled from the local cache and we see an 85% performance increase.
$ rm -rf _build
$ time DUNE_CACHE=enabled dune build
time: 2.338 real  2.063 user  0.266 system (99%)

# All times are of course orders of magnitude and not statistically
# representative. Also, the setup time of dune should be subtracted
# from all times to see the actual build-time speedup.

snowleopard commented 4 years ago

@mefyl Many thanks for writing this up!

During the Dune retreat I suggested the idea of using "over-approximating hashes", which is not described above, so here it is.

Perhaps it somehow got mixed up with the "weak hashes" idea? What you call "weak hashes" seems to be the dual, i.e. "under-approximating hashes".

Over-approximating hashes

When some dependencies of a rule are dynamic, we can often compute an over-approximation of all the dependencies. Let's say a rule depends on some subset of the .ml files in a directory, and computing the subset exactly requires running something like ocamldep. In this case, an over-approximation of the dependencies would be to safely assume that the rule depends on all the .ml files in the directory. We can then compute a hash of this over-approximation and send an early request to the distributed cache.

For this to work, we need a mapping from over-approximating hashes to accurate hashes, which can be built when populating the cache. Specifically, when adding an entry to the cache, we will provide both the accurate hash and some of its over-approximating versions, computed at the points where the rule was blocked on dynamic dependencies.

The more over-approximation we do, the less likely we are to hit the cache, but if we do get a hit then the result is definitely correct and we can use it immediately, without waiting for the accurate hash to be computed.
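
To make the ocamldep example concrete, here is a minimal OCaml sketch of such an over-approximating hash, assuming the rule is summarised by an action digest and the directory holding its candidate .ml dependencies; names are illustrative only.

```ocaml
let over_approximating_hash ~(action_digest : Digest.t) ~dir : Digest.t =
  (* Over-approximate the dynamic dependencies: take every .ml file in
     the directory rather than the exact subset ocamldep would report.
     This is computable before running ocamldep. *)
  let mls =
    Sys.readdir dir |> Array.to_list
    |> List.filter (fun f -> Filename.check_suffix f ".ml")
    |> List.sort compare
  in
  let dep_digests =
    List.map (fun f -> Digest.to_hex (Digest.file (Filename.concat dir f))) mls
  in
  Digest.string
    (String.concat "\x00" (Digest.to_hex action_digest :: dep_digests))

(* When populating the cache, an entry would be stored under both its
   accurate hash and this over-approximation, so an early lookup with
   the over-approximating hash either misses or returns a result that
   is definitely correct for the current directory state. *)
```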

mefyl commented 4 years ago

I think this is indeed the same thing as the "weak hashes"; it's just a matter of what we define as the "approximate things we put in this hash". I fear doing this would, however, greatly delay the point at which we can deduce the hash and hint the cache.

Right now my feeling is that nothing will beat the VCS approach. I think we should focus on it and only explore other possibilities if it does not satisfy us; as I posted above, early results are really promising.

snowleopard commented 4 years ago

I think this is indeed the same thing as the "weak hashes"; it's just a matter of what we define as the "approximate things we put in this hash".

Well, it doesn't have the main disadvantage you described: you only download what you definitely need. I think it's different enough to be added to the list of approaches to consider.

Of course, it's fine to prioritise one promising approach over the others. I'm pretty curious to see how the VCS approach works out in the end!