souffle-lang / souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
http://souffle-lang.github.io/
Universal Permissive License v1.0

Auto-Scheduling without Re-running the Query Optimizer #2247

Open SamArch27 opened 2 years ago

SamArch27 commented 2 years ago

Right now, the auto-scheduler can be used with the interpreter as follows:

souffle <program> -p <profile> --emit-statistics
souffle <program> --auto-schedule=<profile>

The first command generates a profile with index selectivity statistics.

The second command reads the profile with its statistics, runs the query optimizer to find schedules, and then runs the program with those schedules.

To re-run the interpreter with the same schedules (say, for rapid prototyping), the user has to re-run the second command:

souffle <program> --auto-schedule=<profile>

But this will re-run the query optimizer redundantly, slowing down the rapid prototyping cycle.

To fix this, we want the auto-scheduler to cache the generated schedules.

We can do this by emitting plan statements and saving them to a .plan file.
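For reference, this is roughly what a plan statement looks like when written by hand in Soufflé source today. The `edge` and `path` relations are made up for illustration, and the layout of the proposed .plan file is not specified in this issue, so this only shows the existing in-source form:

```souffle
// Hypothetical example relations, not taken from any real analysis.
.decl edge(x:number, y:number)
.decl path(x:number, y:number)

edge(1, 2).
edge(2, 3).

path(x, y) :- edge(x, y).

// In the body below, atom 1 is path(x, y) and atom 2 is edge(y, z).
path(x, z) :- path(x, y), edge(y, z).
// For version 0 of the rule above, evaluate atom 2 before atom 1.
.plan 0:(2,1)

.output path
```

A plan statement attaches to the rule directly before it, so a separate file of plan statements would presumably need some way to say which rule each one belongs to.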

Then, when re-running with the same schedules, the user can provide the .plan file as a command-line argument, and Soufflé will use those schedules instead of invoking the query optimizer again.

I imagine it would be used as follows:

souffle <program> -p <profile> --emit-statistics
souffle <program> --auto-schedule=<profile> --emit-schedules=<plan file>
souffle <program> --use-schedule=<plan file>

b-scholz commented 2 years ago

We may also need some rule identifiers. We also need to reflect the new auto-scheduling work in the documentation.