ocaml-bench / sandmark

A benchmark suite for the OCaml compiler
The Unlicense

Add a ZDD benchmark #442

Closed jberdine closed 1 year ago

jberdine commented 1 year ago

This patch adds a benchmark consisting of a simple implementation of ZDDs (Reduced Ordered Zero-Suppressed Binary Decision Diagrams) together with a small client that builds a dictionary from a file containing a word list and searches it with a simple query. This implementation is likely not a good way to solve the client's search problem, but it exercises the ZDD implementation.
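To make the structure concrete, here is a minimal sketch of a ZDD as a family-of-sets representation. This is illustrative only, not the patch's actual code; the type and function names are invented for the example:

```ocaml
(* Minimal ZDD sketch (illustrative, not the patch's code).  A ZDD
   represents a family of sets; the zero-suppression rule elides any
   node whose "variable present" branch is the empty family. *)
type zdd =
  | Empty                     (* the empty family: no sets at all *)
  | Unit                      (* the family containing only the empty set *)
  | Node of int * zdd * zdd   (* variable, var-absent branch, var-present branch *)

(* Smart constructor enforcing the zero-suppression rule. *)
let node v lo hi = if hi = Empty then lo else Node (v, lo, hi)

(* Number of sets in the family: one per path reaching Unit. *)
let rec count = function
  | Empty -> 0
  | Unit -> 1
  | Node (_, lo, hi) -> count lo + count hi

(* The family {{1}, {1, 2}}, built bottom-up. *)
let family = node 1 Empty (node 2 Unit Unit)
```

A dictionary client like the one in the patch would encode each word as a set of (position, letter) variables and take the union of those singleton families.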

The implementation uses Weak sets to "hash-cons" the decision-diagram nodes and Ephemerons to memoize the ZDD operations, so it might be interesting as a self-contained benchmark for those features. The code is not quite a micro-benchmark, but it exercises Weak sets and Ephemerons heavily, in a way that may be representative of using them for hash-consing and memoization.
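The two patterns can be sketched in a few lines with the stdlib functors. This is a self-contained illustration, not the patch's code: the `term` type, `Pool`, and `size` are invented names, standing in for the ZDD nodes and operations:

```ocaml
(* Sketch of hash-consing via Weak.Make and memoization via
   Ephemeron.K1.Make (illustrative, not the patch's code). *)
type term = { id : int; tag : string; args : term list }

module H = struct
  type t = term
  (* Children are already hash-consed, so compare them by unique id. *)
  let equal a b =
    a.tag = b.tag
    && List.length a.args = List.length b.args
    && List.for_all2 (fun x y -> x.id = y.id) a.args b.args
  let hash t = Hashtbl.hash (t.tag, List.map (fun a -> a.id) t.args)
end

module Pool = Weak.Make (H)
let pool = Pool.create 256
let counter = ref 0

(* Return the unique representative; Pool.merge inserts if absent. *)
let term tag args =
  let fresh = { id = !counter; tag; args } in
  let t = Pool.merge pool fresh in
  if t == fresh then incr counter;
  t

(* Memoize a size computation keyed weakly on the node, so cache
   entries disappear once their key node is collected. *)
module Memo = Ephemeron.K1.Make (H)
let size_cache : int Memo.t = Memo.create 256

let rec size t =
  match Memo.find_opt size_cache t with
  | Some n -> n
  | None ->
    let n = 1 + List.fold_left (fun acc a -> acc + size a) 0 t.args in
    Memo.add size_cache t n;
    n
```

Because `Pool.merge` guarantees a unique representative per structure, physically equal results come back for structurally equal constructions, which is what makes the id-based `equal` and the memo table sound.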

Before I figure out how to wire this up into the benchmark suite, would there be interest in having a benchmark like this? Cc @kayceesrk who was recently working on optimizing these features, and @bobot who might be able to comment on whether this benchmark may be similar to Frama-C's usage of these features.

The words.txt file is just to have a small input data file. It is from https://www-cs-faculty.stanford.edu/~knuth/sgb-words.txt. It may be too small to provide a meaningful benchmark, but I don't know if just using /usr/share/dict/words will work.

kayceesrk commented 1 year ago

Thanks @jberdine. Having a benchmark that heavily uses weak hash sets and ephemerons is a useful addition to Sandmark. There are still opportunities to optimise weak hash sets on OCaml 5 (https://github.com/ocaml/ocaml/issues/11662#issuecomment-1445037335), for which this benchmark will be useful.

Sudha247 commented 1 year ago

Thanks @jberdine! Agree with what KC said above, we don't have a lot of benchmarks for Weak & Ephemeron, and this is a nice addition.

Do you see this being added as a macro benchmark? (Macro benchmarks are included in nightly runs; results can be viewed on sandmark nightly - https://sandmark.tarides.com/). On my local machine the benchmark runs for about 0.5s; it might be useful to make it somewhat longer running, if possible, for it to be added as a macro benchmark. Otherwise, you can add it to a locally run set of micro benchmarks.

jberdine commented 1 year ago

I don't know whether to aim for a micro or a macro benchmark; what runtime would be most helpful? The runtime can be made as long as desired by changing the size of the input word-list file.

Sudha247 commented 1 year ago

I'd think this can make for a good macro benchmark, as it will stress-test OCaml features not already covered by the macro bench suite. I think we can aim for a running time of ~10s to ~30s or so.

jberdine commented 1 year ago

A potentially interesting aspect of this sort of use of ephemerons is that, as a (memoization) cache, it would be acceptable for the table constructed by e.g. Ephemeron.K2.Make to be lossy. That is, it could be specified to allow nondeterministically empty entries even while their keys are still live. In practice, decision-diagram implementations have found that such lossy tables offer higher performance, even though the strict polynomial-time guarantee of the operations is lost. A potential implementation strategy is to eliminate bucket lists entirely and simply overwrite an entry when an addition to the table hash-conflicts with an existing element. (The heuristic for deciding when to resize would probably need to be adjusted, though.) It might be easier to add such a new lossy API and optimize it to the point where existing code could move to OCaml 5 without taking too large a performance hit. I'm unsure whether Frama-C would be able to use such a lossy API, but the uses I have seen would be amenable to such a shift.
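The lossy, direct-mapped table described above might look something like the following. This is a sketch under stated assumptions, not a proposal for the actual stdlib API: it assumes OCaml 5's low-level `Ephemeron.K1.make`/`query` functions (which match the key by physical equality, appropriate for hash-consed nodes), and the `Lossy` name and signature are invented:

```ocaml
(* Sketch of a lossy, direct-mapped memo table: no bucket lists; a
   colliding insertion simply overwrites the slot.  Assumes OCaml 5's
   Ephemeron.K1.make/query API; names are illustrative. *)
module Lossy (H : Hashtbl.HashedType) : sig
  type 'd t
  val create : int -> 'd t
  val find_opt : 'd t -> H.t -> 'd option
  val add : 'd t -> H.t -> 'd -> unit
end = struct
  (* One ephemeron per slot; a slot's entry dies with its key. *)
  type 'd t = (H.t, 'd) Ephemeron.K1.t option array

  let create n = Array.make (max 1 n) None

  let slot t k = H.hash k mod Array.length t

  let find_opt t k =
    match t.(slot t k) with
    | None -> None
    | Some e -> Ephemeron.K1.query e k

  (* Overwrite on collision: the previous occupant is silently dropped,
     trading the strict complexity guarantee for simplicity. *)
  let add t k d = t.(slot t k) <- Some (Ephemeron.K1.make k d)
end
```

Lookups and insertions are O(1) with no traversal, and a full table never grows; the cost is that a hot entry can be evicted by any colliding insertion, which is exactly the nondeterministic lossiness described above.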

jberdine commented 1 year ago

I pushed a commit with a larger word list. I'm not sure how to predict the runtime of the benchmark on the test system; I can adjust either way if desired.

The word list is a selection (via awk 'NR % 5 == 0') of the BSD /usr/share/dict/words file, which comes from Webster's Second International dictionary, whose 1934 copyright has expired.

Sudha247 commented 1 year ago

Great! This runs for ~12s on my machine, which is fine for a macro bench as it is.

$ time _build/default/benchmarks/zdd/zdd.exe benchmarks/zdd/words.txt 
Unique table length: 300526 entries: 2265049 bucket lengths: sum: 2961669 min: 0 median: 7 max: 22
91

real    0m12.494s
user    0m12.149s
sys 0m0.344s

Are you able to add the benchmark run to run_config.json? There are some pointers on this in the README, in case they're useful.

jberdine commented 1 year ago

I added an entry to run_config.json, but I'm not sure how to test this, so I'm shooting from the hip here. Any advice?

jberdine commented 1 year ago

I added the dep but have not been able to test with run_all_serial.sh. It seems to assume a Linux system, and at the moment I have only a macOS system to test on.

jberdine commented 1 year ago

For reference, for me this benchmark is ~60% slower on trunk than on 4.14.1 (with flambda -O3 for both), and the stats printed about the hash set indicate that values referred to by weak pointers live longer on trunk than on 4.14.

bobot commented 1 year ago

@maroneze would perhaps know better than me. Frama-C uses weak pointers for hash-consing, and Why3 uses ephemerons for caching. Both use cases seem covered by this benchmark. Thanks.

Sudha247 commented 1 year ago

Thanks for the updates! I kicked off a run locally and the benchmark produces results.