souffle-lang / souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
http://souffle-lang.github.io/
Universal Permissive License v1.0
930 stars 208 forks source link

Generated C++ compile time regression #2303

Open adamjseitz opened 2 years ago

adamjseitz commented 2 years ago

The latest commit of Souffle, 66b96fd07e7124d0e46bcf7cdc1cc5bf61f54cc3, generates C++ code that takes substantially longer to build than previous Souffle versions. This regression appears to be introduced by https://github.com/souffle-lang/souffle/pull/2215. (As a note, we are excited about the new feature implemented in this PR and want to incorporate it in ddisasm's build system, but discovered this issue while evaluating doing so).

I am seeing this when trying to build the ddisasm project with the latest Souffle, on Ubuntu 20.04 with clang++.

$ clang++ --version
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

My hardware is a 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz CPU and 32 GB of RAM.

Here are steps to replicate it:

git clone https://github.com/GrammaTech/ddisasm.git
cd ddisasm
mkdir -p build/src/
souffle src/datalog/main.dl -g build/src/souffle_disasm_arm32.cpp -MARCH_ARM32
time clang++ -O3 -fvisibility=hidden -fopenmp=libgomp -Wno-parentheses-equality -Wno-unused-parameter -std=c++17 -fcolor-diagnostics -DDDISASM_ARM32 -DRAM_DOMAIN_SIZE=64 -D__EMBEDDED_SOUFFLE__ -DNDEBUG -c -o build/src/souffle_disasm_arm_32.o build/src/souffle_disasm_arm32.cpp

With Souffle 2.3 (the most recent release), as well as commit 62e44313d88dca2d06af8ba8989e82005f497248 (just before https://github.com/souffle-lang/souffle/pull/2215), build times are approximately:

real    4m4.334s
user    4m3.177s
sys     0m1.000s

With souffle 66b96fd07e7124d0e46bcf7cdc1cc5bf61f54cc3 (latest master) and 429e1681ce97c8a792f1e4577bf2e9001835daa6 (the merge commit for https://github.com/souffle-lang/souffle/pull/2215):

real    27m18.063s
user    27m16.049s
sys     0m1.920s

The ARM32 variant of ddisasm's datalog code, as built above, exhibits a particularly bad case of this regression (4 minutes -> 27 minutes) but other variants also exhibited it to a lesser degree (e.g., ARM64: 3 minutes -> 6 minutes).

The compile time difference is not as extreme using g++, either. The ARM32 variant increases from 5m3s to 6m53s using g++ 9.4.0.

I am going to see if I can replicate this kind of behavior on a more minimal example instead of the complete ddisasm source, but I am not sure that's going to be easy. I at least wanted to report these numbers to illustrate the scale of the problem for a large project.

adamjseitz commented 2 years ago

Here's the complete matrix of build times from ddisasm's variants with different compilers:

Compiler Arch 2.3 master Change
g++ IA32 3:30.60 4:35.08 +30.6%
g++ AMD64 3:39.65 4:38.03 +26.6%
g++ ARM32 5:02.96 6:53.59 +36.5%
g++ ARM64 3:39.90 4:41.73 +25.1%
g++ MIPS32 3:46.65 4:57.51 +31.3%
clang++ IA32 2:54.31 5:47.30 +99.2%
clang++ AMD64 3:01.58 5:53.84 +94.9%
clang++ ARM32 4:04.33 27:18.06 +570.4%
clang++ ARM64 3:39.90 5:53.49 +60.8%
clang++ MIPS32 3:02.17 5:52.07 +93.3%

g++ builds suffered a somewhat consistent 30% increase, while clang++ builds ranged anywhere from 60% to 600% percent slower.

Additionally, the number of lines in the generated C++ source for each arch:

Arch 2.3 master Change
IA32 127930 148364 +13.7%
AMD64 137674 158020 +14.8%
ARM32 191274 214232 +12.0%
ARM64 130350 151126 +15.9%
MIPS32 133456 155077 +16.2%

It seems like across all five arch variants, about 20k lines of C++ were added.

b-scholz commented 2 years ago

Thanks for the detailed analysis! We must understand the extra work the compiler does with the new extension (compiling strata separately).

Perhaps you can perform diffs before and after so that we can find out what may cause the slow-down.

ruricolist commented 2 years ago

Git bisect pinpoints the slowdown in compilation time as occuring at 7d648a161.

Souffle output at f8f24c990 (previous commit): https://gist.github.com/ruricolist/4042ff207396eb704d91c40eca4bf16a Souffle output (for ARM) at 7d648a161: https://gist.github.com/ruricolist/a0b212440935ea2fc238963af0f2c4b3

ruricolist commented 2 years ago

Running clang++ with -ftime-trace, it appears the extra time is being spent in OptFunction passes on a single function, souffle::Subroutine_93::run.

The surface differences between the two versions of the function are minimal (ordering of conjunctions in conditionals), and manually splicing the body of the function from the "fast" version into the "slow" version does not improve compile times.