sneeuwballen / benchpress

Testing and benchmarking tool for logic-related programs.
BSD 2-Clause "Simplified" License
26 stars 8 forks source link
benchmark-scripts ocaml smtlib test-runner tptp web-ui

Benchpress build

Tool to run one or more logic programs, on a set of files, and collect the results.

License: BSD.

Basic usage

$ benchpress run -c foo.sexp dir_a/ dir_b/ -p z3
…

this tells benchpress to run the prover z3 on directories dir_a and dir_b. foo.sexp contains additional configuration parameters as described below.

System dependencies

Logitest relies on a bunch of utilities, besides OCaml libraries:

TODO use cgroups or similar for CPU affinity

Options

CLI options

Most of the commands accept -c <config file> to specify which config files to use.

ENV var options

Some internal parameters of benchpress can be set using environment variables:

Web interface

Config File

Benchpress ships with a builtin config, which is imported by default unless (import-prelude false) is specified. It contains, roughly:

Builtin config

; read smtlib status
(prover
  (name smtlib-read-status)
  (cmd "grep :status $file")
  (unknown ":status unknown")
  (sat ":status sat")
  (unsat ":status unsat"))

(prover
  (name minisat)
  (unsat "UNSATISFIABLE")
  (sat "^SATISFIABLE")
  (cmd "minisat -cpu-lim=$timeout $file"))

(prover
  (name z3)
  (cmd "z3 $file")
  (version "cmd:z3 --version")
  (unsat "unsat")
  (sat "^sat"))

The configuration is based on stanzas that define available provers, available sets of benchmarks (based on directories that contain them), and tasks. For now the only kind of supported task is to run provers on problems, but it should get richer as we go (e.g. run proof checkers, do some basic CI, run a task daily, etc.).

In this default file we also define a pseudo-prover, "smtlib-read-status", which is used to parse SMTLIB benchmarks and find an annotation (set-info :status <…>). This is useful when running provers later because it makes it easy to find bugs (if a prover reports a wrong answer).

We also define provers minisat and z3 as common reference points, providing info on how to run them (with cmd …) and how to parse their results using regexes.

Example of config file

A more complete example, taken from mc2:


; from https://github.com/c-cube/mc2

(prover
  (name mc2)
  (cmd "ulimit -t $timeout; mc2 --time $timeout $file")
  (unsat "^Unsat")
  (sat "^Sat")
  (unknown "Unknown")
  (timeout "Timeout"))

(dir
  (path "$HOME/workspace/smtlib")
  (pattern ".*.smt2")
  (expect (run smtlib-read-status)))

(task
  (name glob-all-smtlib)
  (synopsis "run all SMT solvers on smtlib")
  (action
   (run_provers
    (dirs ("$HOME/workspace/smtlib"))
    (provers (mc2 z3))
    ;(memory 100000000)  ; TODO: parse "10G"
    (timeout 10))))

(task
  (name glob-all-smtlib-QF_UF)
  (synopsis "run all SMT solvers on QF_UF")
  (action
    (run_provers
      (dirs ("$HOME/workspace/smtlib/QF_UF"))
      (provers (mc2 z3))
      (timeout 10))))

Such a configuration file can be validated using:

$ benchpress check-config the_file.sexp

Then one can run a task, like so:

$ benchpress run -c the_file.sexp --task glob-all-smtlib-QF_UF -t 30

to run mc2 and z3 on the QF_UF problems in the SMTLIB directory. The task stanza defines a pre-packaged task that can be launched easily from the command line or the embedded web server (a bit like a makefile target).

Note that tasks are not necessary, they're just shortcuts. You can also pass directly the prover list and directory:

$ benchpress run -c the_file.sexp -p mc2 some/path/ -t 30

List of stanzas

The variable $cur_dir evaluates to the config file's directory. This allows the config file to refer to provers that are installed locally (e.g. in the same repository).

Actions

An example of a task running with slurm

(task
  (name testrun-slurm)
  (action
    (run_provers_slurm
      (dirs ($PATHS))
      (provers (z3 cvc4))
      (timeout 2)
      (j 4)
      (nodes 2)
      (addr "xxx.xxx.xx.xxx")
      (port 8080)
      (ntasks 20)
      )))

assuming that "PATHS" are paths to directories containing benchmarks which were previously defined in the config file.

Note: Running a task with slurm should be done on a control node. The nodes of the cluster should have at least a shared directory and "XDG_DATA_HOME" should be a path to it or one of its subdirectories. It will be used to store the config file which will be used by the benchpress worker instances which run on the compute nodes of the cluster.