tweag / pirouette

Language-generic workbench for building static analysis
MIT License
47 stars 2 forks source link

speed up smt calls #154

Closed qaristote closed 1 year ago

qaristote commented 1 year ago

Why ?

Currently, Pirouette uses SMT solvers by calling them as external processes and communicating with them using OS-level channels. The language used to communicate with the solvers is SMTLib, which means the interaction goes as follows:

Some benchmarks verifying this intuition are available and detailed here. Overall, here are the key takeaways:

What ?

This PR thus replaces the internals of the PureSMT module to call Z3 bindings instead of running solvers as external processes. We only use the z3_eval_smtlib2_string binding (which has Z3 evaluate SMTLib commands) instead of using bindings to manually build the AST because the serializing and parsing of SMTLib commands is deemed quick enough that the work needed to use these finer-grained bindings would not be worth the speedup. We thus also have these SMTLib commands be given to the solver as bytestrings instead of higher-level strings.

Commit c79d8bb4a4beb9ad448c4cf97130ebbfd9a068be replaces the internals of PureSMT to use bindings instead of running external processes. This introduces a dependency to the haskell-z3 library which ports Z3's C bindings to Haskell. As the library does not export the z3_eval_smtlib2_string, we actually rely on a temporary fork that does.

This results in some of the PureSMT interface becoming useless as it was designed with running external processes in mind. Commit 64f18eb12229e7025596a59cb83ed07e03337fe0 cleans this interface.

As relying on a temporary fork of haskell-z3 is not ideal, we instead use the bindings inside inlined C code in commit 33977c2e05fd32213a8c804a0eaf1329d8b3d7ef.

What's next ?

Once the haskell-z3 fork is merged upstream it shouldn't be necessary to inline C code anymore. Ideally haskell-z3 should also be improved to allow using bytestrings in its higher-level modules Z3.Monad and Z3.Base, as currently only the lower-level Z3.Base.C allows it.

This PR makes Pirouette not solver-universal anymore: the use of Z3 is forced. Other solvers could be supported again given that they provide an API with a function similar to z3_eval_smtlib2_string, but as of writing this is not the case of CVC4 or CVC5 for example.

Interestingly, our benchmarks also showed that even Z3's C API is not as fast as Z3 itself, most likely because z3_eval_smtlib2_string does not wrap Z3's internals the same way Z3's frontend does. This could be an additional direction in which to look to further improve the running-time of the interaction with Z3 inside Pirouette.

After this PR, serializing and parsing SMTLib commands will take up 18% of Pirouette's running time on the biggest example of the test suite. If this were to be deemed enough to decide that the internals of PureSMT should build the AST themselves (fully using Z3's API) instead of relying on SMTLib, recovering solver-universality would require a Haskell library akin to smt-switch or metaSMT (which would likely be a lot of work).

0xd34df00d commented 1 year ago

Perhaps somebody else wants to look at this though before it gets merged.

qaristote commented 1 year ago

I would like the dependency on haskell-z3 to be removed before merging this.

I'm all for this but I've tried everything I could think of to compile the code without having haskell-z3 as a dependency and I always get this linking error:

GHC.Runtime.Linker.dynLoadObjs: Loading temp shared object failed
During interactive linking, GHCi couldn't find the following symbol:
  /tmp/ghc237807_0/libghc_3.so: undefined symbol: Z3_mk_config
This may be due to you not asking GHCi to load extra object files,
archives or DLLs needed by your current session.  Restart GHCi, specifying
the missing library using the -L/path/to/object/dir and -lmissinglibname
flags, or simply by naming the relevant files on the GHCi command line.
Alternatively, this link failure might indicate a bug in GHCi.
If you suspect the latter, please report this as a GHC bug:
  https://www.haskell.org/ghc/reportabug
facundominguez commented 1 year ago

I'll have a look.

facundominguez commented 1 year ago

Now the interpreter is able to find libz3.so if pirouette is built inside the nix shell, which sets LD_LIBRARY_PATH appropriately.

facundominguez commented 1 year ago

I contributed one minor optimization. cc6d9b5 causes inline-c to use unsafeUseAsCString to marshal the cmdText to C. This can be observed with -ddump-splices.