Add solver parallelism support

montyly commented 3 years ago

dev-evm-experiments contain experimental support for solver parallelism.

The solution I followed was to create a ALLSolver object inheriting from SMTLIBSolver, which use a Smtlib2Proc instead of SmtlibProc

Smtlib2Proc follows SmtlibProc, but:

Keep a list of _procs, which will contain a list of solvers processes.
When a command is sent, it will be sent to all the solvers alive
When recv is called, it is a bit more complex:

https://github.com/trailofbits/manticore/blob/f0d59ce0021994b9a9b6ea3df3ccfb4719426aad/manticore/core/smtlib/solver.py#L450-L518

The main idea is to wrap the solvers to a Process (from multiprocessing.context) and then loop until one of the solver replies.

Once a solver replies, all the other processes that did not reply will be killed. If no solver found a solution after 2.5minutes, the process stops.

Once a process is killed, we restart a new process, and we send back the formulas. One trick here is to NOT send check-sat on solvers that did timeout https://github.com/trailofbits/manticore/blob/f0d59ce0021994b9a9b6ea3df3ccfb4719426aad/manticore/core/smtlib/solver.py#L423-L427

The reasoning behind the choice is that we have a lot of situation where we loop over the solver results, like

- build path
- check-sat
- get-value A
- assert A != last value returned
- check-sat
- get-value A

In this situation, we do not want to send again the first check-sat, but the solver might performs better once assert A != last value returned is added.

We also probably want a way to select some solvers that are going to be run in parallel. Currently, the user uses --smt.solver all, but we should support something like --smt.solver boolector,yices

Overall the code in dev-evm-experiments is not clean, nor robust enough. It has a lot of hardcoded values/todo and need to be addressed

ggrieco-tob commented 3 years ago

pysmt has a similar feature, which is implemented using multiprocessing, but in particular, with Pipes:

https://github.com/pysmt/pysmt/blob/master/pysmt/solvers/portfolio.py

Any comments on this approach? (perhaps why it was not used?)

montyly commented 3 years ago

In favor of implementing this ourselves:

It's not so difficult to implement
Interfacing with pysmt might require some refactoring in our codebase, which defeats the purpose
Relying on an external dependency can slow us down

But on the other hand, pysmt can be a good solution if:

It turns out it's difficult to implement this feature (which I don't think it is seeing my initial test in dev-evm-experiments)
pysmt provides optimizations that we don't have, and we won't have in the short term
it provides a better framework to manipulate our formulas

I don't have a strong opinion, but I would be in favor of not adding another dependency here, unless we have strong benefits

ggrieco-tob commented 3 years ago

Just to be clear, I'm not suggesting using pysmt, but re-using their approach using Pipe (and probably some code if the licenses are compatible).

ggrieco-tob commented 3 years ago

@ekilmer himself worked on the Portfolio class for pysmt, so perhaps he has something to recommend :smiley:

ekilmer commented 3 years ago

I was only playing around with the portfolio feature for some light experimentation and found an inconsistency that I was able to pretty quickly fix.

I didn't dig too deep into the code itself, but it's a relatively short file with what looks to be a good and working design using standard Python libraries, so I think implementing something similar shouldn't be too hard 🙂

Would be a good excuse to look more into Python handling of concurrency.

trailofbits / manticore

Add solver parallelism support #2404