souffle-lang / souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
http://souffle-lang.github.io/
Universal Permissive License v1.0
930 stars 208 forks source link

RFC / Issue: Performance Bottlenecks #1406

Open ohamel-softwaresecure opened 4 years ago

ohamel-softwaresecure commented 4 years ago

Currently there are a number of performance bottlenecks in Souffle. The following is a partial list of places where we have unacceptable run-time complexity. These can be triggered when processing sufficiently large programs (e.g. 7k+ clauses).

General Issues:

Analyses aren't incremental. Right now any transform change invalidates all analyses. This is painful, since a tiny local change invalidates expensive program-wide analyses (e.g. type-checking). As a compromise perhaps it would be possible to have clause-level caching of some analyses?

Specific Issue Instances:

psubotic commented 4 years ago

Hi,

I really advocate clause-level caching. If I understood what you mean correctly, this would involve (perhaps with a directive "cached") specifying cached rule or relation and thus writing it to somewhere (say disk as a first go) when it is fully computed, so that it can be read in as an EDB in the next computation, if none of its dependencies (in the SCC graph) are changed.

I know David Zhao is working on more sophisticated incrementalism but the clause level would be a big help and something I've been advocating for. Thanks for bringing it up.

Cheers,

Paul

On Thu, Apr 9, 2020 at 11:17 PM ohamel-softwaresecure < notifications@github.com> wrote:

Currently there are a number of performance bottlenecks in Souffle. The following is a partial list of places where we have unacceptable run-time complexity. These can be triggered when processing sufficiently large programs (e.g. 7k+ clauses).

General Issues:

Analyses aren't incremental. Right now any transform change invalidates all analyses. This is painful, since a tiny local change invalidates expensive program-wide analyses (e.g. type-checking). As a compromise perhaps it would be possible to have clause-level caching of some analyses?

Specific Issue Instances:

-

RecursiveClauses: The implementation is absurdly naive and expensive (brute force recursion check). Proposed fix: Compute a proper SCC. Generalise RecursiveRelation into a AnalysisSCC and share impl. PR is in the works.

getLeastCommonSupertypes: Iterates over every type in the environment on each invocation. This is called every type system constraint solver iterator (if a constraint makes use of it). Hacky fix: This can be dramatically improved by adding a proper subtype DAG + walking upwards (O(min(m, n)) instead of O(|T|) where m, n is the subtype depths. In practice m and n tends to be very small. A PR can be made available. Proper fix: The type system is awful, this is known. Perhaps we can implement a HM inference solver rather than use a more general fixed point constraint solver?

AstQualifiedName::operator<: This is a moderately hot function since AstQualifiedName is used as a key in lookup maps. In these cases we don't need lexical ordering, any ordering will do (or a hash, if unordered_map). Proposed Fix: Add qualified name interning, use the interned ID instead for O(1) mapping. I've a patch that does global interning. Each AstQualifiedName memoizes their interned ID on request. PR for proposed fix can be made available.

visitDepthFirst: Nothing wrong with it, but in several cases they're used for exists-like computations and could bail out early.

ephemeral large vector copies: In the AST there are a number of places where instead of returning vec<ptr> const& we produce and return vec<T>&. AFAIK the motivation for this is to minimize coupling between who owns what in the AST. This isn't too bad for small vectors accessed infrequently, but does show up for us in cases such as AstProgram::getRelations and AstProgram::getClauses, and their clients (e.g. the O(n) one-shot helpers in AstUtils.h, ProvenanceTransformer, etc.). Proposal:*

  1. Return vec<ptr> const& for AstProgram. Generally speaking I'm not convinced the AST shuffles around that much, and it certainly won't for AstProgram since that's the root. 2a) Option 1) Kill off the helpers in AstUtils. The RelationDetailCache analysis assembles lookup tables. Other transforms should use it instead. 2b) Option 2) Kill off the helper transform and store AstRelation and AstClause in a by-name lookup multimap. (Effectively inline RelationDetailCache into AstProgram.) I'm leaning towards 2b. Can provide PR. Any other opinions?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/souffle-lang/souffle/issues/1406, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAC3OOIDR5PJYZ4TS7CNNI3RLY3QHANCNFSM4MFBCRWQ .

b-scholz commented 4 years ago

These are performance bugs in Souffle without executing the code (parsing / checking / analysing etc.). We did not have seriously large codebases so far. We saw some issues in GrammarTech's ddisasm project recently related to performance bugs in the semantic checker.