mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.
BSD 3-Clause "New" or "Revised" License
2.06k stars 123 forks source link

Improving compilation time of Scryer Prolog #2359

Open triska opened 8 months ago

triska commented 8 months ago

Today I came across an interesting article by @kobzol on Rust compilation times:

https://kobzol.github.io/rust/rustc/2024/03/15/rustc-what-takes-so-long.html

The topic of Scryer Prolog compilation times was at least in passing already mentioned a few times, for example in https://github.com/mthom/scryer-prolog/pull/2174#issuecomment-1830441779 and https://github.com/mthom/scryer-prolog/discussions/2339.

I hope the techniques mentioned in the article can provide a few more insights to everyone who wants to help with this issue, and maybe even help to improve compilation times in the future.

aarroyoc commented 8 months ago

I've managed to speed up the compilation by doing some of the stuff described in the links. As a tl;dr:

rustup component add rustc-codegen-cranelift-preview --toolchain nightly

To install the Cranelift codegen. It only works with the nightly compiler.

Edit Cargo.toml. This is the diff:

diff --git a/Cargo.toml b/Cargo.toml
index 2af52e25..690f4bc0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,3 +1,4 @@
+cargo-features = ["codegen-backend"]
 [package]
 name = "scryer-prolog"
 version = "0.9.4"
@@ -132,6 +133,10 @@ opt-level = 3
 lto = true
 opt-level = 3

+[profile.dev]
+debug = 0
+codegen-backend = "cranelift"
+
 [[bench]]
 name = "run_criterion"
 harness = false

Now, to compile, use the nightly compiler:

cargo +nightly build

It should be faster than before

flexoron commented 8 months ago

Faster? How long does it take to upgrade rust every day and keep pace with those thousands of changes of crates sucked every day leading developers astray in the end every day? Rust users are swamped with futile work every day. Do you feel secure? Real issue: Programmers procrastinate real issues.

triska commented 5 months ago

Interesting news about this issue: https://blog.rust-lang.org/2024/05/17/enabling-rust-lld-on-linux.html

flexoron commented 5 months ago

Sorry @triska Just re-emphasize what I mean when saying: Programmers procrastinate real issues @aarroyoc, still confused?

$ rustc --version
rustc 1.78.0 (9b00956e5 2024-04-29)
$ cd ~/.cargo
$ rm -rf advisory-db
$ cd ~/.cargo/registry
$ rm -rf *
$ cd ~/tools/prolog
$ git clone https://github.com/mthom/scryer-prolog.git
$ cd scryer-prolog
$ cargo audit
    Fetching advisory database from `https://github.com/RustSec/advisory-db.git`
      Loaded 628 security advisories (from /home/nox/.cargo/advisory-db)
    Updating crates.io index
    Scanning Cargo.lock for vulnerabilities (396 crate dependencies)
Crate:     mio
Version:   0.7.14
Title:     Tokens for named pipes may be delivered after deregistration
Date:      2024-03-04
ID:        RUSTSEC-2024-0019
URL:       https://rustsec.org/advisories/RUSTSEC-2024-0019
Solution:  Upgrade to >=0.8.11
Dependency tree:
mio 0.7.14
├── signal-hook-mio 0.2.3
│   └── crossterm 0.20.0
│       └── scryer-prolog 0.9.4
└── crossterm 0.20.0

Crate:     mio
Version:   0.8.10
Title:     Tokens for named pipes may be delivered after deregistration
Date:      2024-03-04
ID:        RUSTSEC-2024-0019
URL:       https://rustsec.org/advisories/RUSTSEC-2024-0019
Solution:  Upgrade to >=0.8.11
Dependency tree:
mio 0.8.10
└── tokio 1.35.1
    ├── warp 0.3.5
    │   └── scryer-prolog 0.9.4
    ├── tokio-util 0.7.10
    │   ├── warp 0.3.5
    │   └── h2 0.3.26
    │       ├── reqwest 0.11.23
    │       │   └── scryer-prolog 0.9.4
    │       └── hyper 0.14.28
    │           ├── warp 0.3.5
    │           ├── reqwest 0.11.23
    │           └── hyper-tls 0.5.0
    │               └── reqwest 0.11.23
    ├── tokio-tungstenite 0.18.0
    │   └── warp 0.3.5
    ├── tokio-stream 0.1.14
    │   └── warp 0.3.5
    ├── tokio-rustls 0.23.4
    │   └── warp 0.3.5
    ├── tokio-native-tls 0.3.1
    │   ├── reqwest 0.11.23
    │   └── hyper-tls 0.5.0
    ├── scryer-prolog 0.9.4
    ├── reqwest 0.11.23
    ├── hyper-tls 0.5.0
    ├── hyper 0.14.28
    └── h2 0.3.26

Crate:     rustls
Version:   0.20.9
Title:     `rustls::ConnectionCommon::complete_io` could fall into an infinite loop based on network input
Date:      2024-04-19
ID:        RUSTSEC-2024-0336
URL:       https://rustsec.org/advisories/RUSTSEC-2024-0336
Severity:  7.5 (high)
Solution:  Upgrade to >=0.23.5 OR >=0.22.4, <0.23.0 OR >=0.21.11, <0.22.0
Dependency tree:
rustls 0.20.9
└── tokio-rustls 0.23.4
    └── warp 0.3.5
        └── scryer-prolog 0.9.4

Crate:     tungstenite
Version:   0.18.0
Title:     Tungstenite allows remote attackers to cause a denial of service
Date:      2023-09-25
ID:        RUSTSEC-2023-0065
URL:       https://rustsec.org/advisories/RUSTSEC-2023-0065
Severity:  7.5 (high)
Solution:  Upgrade to >=0.20.1
Dependency tree:
tungstenite 0.18.0
└── tokio-tungstenite 0.18.0
    └── warp 0.3.5
        └── scryer-prolog 0.9.4

Crate:     lexical
Version:   5.2.2
Warning:   unsound
Title:     Multiple soundness issues
Date:      2023-09-03
ID:        RUSTSEC-2023-0055
URL:       https://rustsec.org/advisories/RUSTSEC-2023-0055
Dependency tree:
lexical 5.2.2
└── scryer-prolog 0.9.4

error: 4 vulnerabilities found!
warning: 1 allowed warning found
$
triska commented 5 months ago

@flexoron: Could you please file security issues in separate issues for better overview and so that they can be more easily prioritized appropriately? Thank you a lot!

flexoron commented 5 months ago

For sure! Here is another article about finding why and improving how: https://fasterthanli.me/articles/why-is-my-rust-build-so-slow#linker-is-it-you

flexoron commented 5 months ago

Take-away from the article(s) above:

"We probably have an algorithm in rustc that performs reasonably well for most inputs, but our program somehow has a shape that makes that algorithm perform really poorly"

So, searching for improving compilation time makes sense only if working with a clean and up-to-date codebase.

Thats why I showed the audit report. Fixing security issues is just a side-effect then.

flexoron commented 5 months ago

@triska Do you have a simple pl-program for testing the performance of the 'improved compile-time' build? If these improvements degrade (scryer-prolog) performance then what?

flexoron commented 5 months ago
Some Results, Linker: mold, (Explanation: see Steps below)

cargo build
Finished `dev` profile [unoptimized + debuginfo] target(s) in 25.94s
?- testff(1000000).
   % CPU time: 436.145s, 86_000_034 inferences
3000000 findall/findall Elements   true.

cargo build --release
    Finished `release` profile [optimized] target(s) in 3m 55s
?- testff(1000000).
   % CPU time: 8.043s, 86_000_034 inferences
3000000 findall/findall Elements   true.

debug = 1
lto = "fat"
Finished `release` profile [optimized + debuginfo] target(s) in 4m 10s
Finished `release` profile [optimized + debuginfo] target(s) in 3m 37s
?- testff(1000000).
   % CPU time: 8.139s, 86_000_034 inferences

debug = 1
lto = "thin"
Finished `release` profile [optimized + debuginfo] target(s) in 3m 44s
Finished `release` profile [optimized + debuginfo] target(s) in 3m 08s
?- testff(1000000).
   % CPU time: 8.205s, 86_000_034 inferences

debug = 1
incremental = true
Finished `release` profile [optimized + debuginfo] target(s) in 2m 43s
Finished `release` profile [optimized + debuginfo] target(s) in 3.95s
?- testff(1000000).
   % CPU time: 8.661s, 86_000_034 inferences

debug = 1
lto = "thin"
incremental = true
codegen-units = 16
Finished `release` profile [optimized + debuginfo] target(s) in 3m 34s
Finished `release` profile [optimized + debuginfo] target(s) in 2m 18s
?- testff(1000000).
   % CPU time: 7.650s, 86_000_034 inferences

debug = 1
lto = "thin"
incremental = true
Finished `release` profile [optimized + debuginfo] target(s) in 3m 24s
Finished `release` profile [optimized + debuginfo] target(s) in 44.31s
?- testff(1000000).
   % CPU time: 7.834s, 86_000_034 inferences

debug = 0
lto = "thin"
incremental = true
Finished `release` profile [optimized] target(s) in 3m 16s
Finished `release` profile [optimized] target(s) in 40.78s
?- testff(1000000).
   % CPU time: 7.702s, 86_000_034 inferences

STEPS (CB = Cold Build, HB = Hot Build, PA = Prolog App)

1. Activate linker mold (Linux task) 
2. git clone https://github.com/mthom/scryer-prolog.git
3. Prepare CB:
   git reset --hard HEAD~2
   Changing ffi.rs manually for HB later
4. cargo build   
   Default dev build, CB:25s PA: 436s (PA:unusable)
5. cargo clean
6. cargo build --release
   Default release build, CB:4m PA:8s (PA:Boiled down, Great)

Now changing [profile.release] 

7. (debug = 1, lto = "fat")
   PA 8.139s
8. (debug = 1, lto = "thin")
   PA 8.205s No big difference between fat and thin
   HB useless

9. (debug = 1, incremental = true)
   PA 8.661s HB 3.95s 
             HB unbeatable fast

10.(debug = 1, lto = "thin", incremental = true, codegen-units = 16)
   PA 7.650s  HB 2m 
   PA faster, HB useless

11.(debug = 1, lto = "thin", incremental = true)
   PA 7.834s, HB 44s 
              HB acceptable

12.(debug = 0, lto = "thin", incremental = true)
   PA 7.702s, HB 40s
              HB acceptable

12.a.  Addition
$ find . -name '*.rs' -exec touch {} \;
$ cargo build --release
   Compiling typenum v1.17.0
   Compiling markup5ever v0.11.0
   Compiling scryer-prolog v0.9.4 (/home/nox/tools/prolog/scryer-prolog)
   Compiling mime_guess v2.0.4
   Compiling generic-array v0.14.7
   Compiling generic-array v0.12.4
   Compiling html5ever v0.26.0
   Compiling xml5ever v0.17.0
   Compiling digest v0.8.1
   Compiling block-buffer v0.7.3
   Compiling crypto-mac v0.7.0
   Compiling blake2 v0.8.1
   Compiling sha3 v0.8.2
   Compiling ripemd160 v0.8.0
   Compiling crypto-common v0.1.6
   Compiling block-buffer v0.10.4
   Compiling digest v0.10.7
   Compiling sha1 v0.10.6
   Compiling blake2 v0.10.6
   Compiling sha3 v0.10.8
   Compiling sha2 v0.10.8
   Compiling markup5ever_rcdom v0.2.0
   Compiling tungstenite v0.18.0
   Compiling headers v0.3.9
   Compiling select v0.6.0
   Compiling crrl v0.6.0
   Compiling tokio-tungstenite v0.18.0
   Compiling warp v0.3.5
    Finished `release` profile [optimized] target(s) in 46.00s
flexoron commented 5 months ago
Linker Test : cargo build --release  ( lto = true, opt-level = 3 )

-----------------------------------------
Four (system) linkers show no difference:

GNU ld version 2.41-37.fc40
GNU gold (version 2.41-37.fc40) 1.16
mold 2.32.0 (compatible with GNU ld)
LLD 18.1.6 (compatible with GNU linkers)

Finished `release` profile [optimized] target(s) in 3m 54s

----------------------

rust-lld

Finished `release` profile [optimized] target(s) in 4m 13s  # it's not the linker, 1.81.0-nightly is slower

readelf -p .comment target/release/scryer-prolog
String dump of section '.comment':
  [     0]  Linker: LLD 18.1.7
  [    42]  rustc version 1.81.0-nightly (a70b2ae57 2024-06-09)

Yes, nightly uses its own linker (LLD 18.1.7)

$ ld -v
mold 2.32.0 (compatible with GNU ld)
------------------------------------
flexoron commented 5 months ago

Now I try to improve the Cold Build (CB), our goal, but this attempt has some preconditions.

To begin with, the config file ~/.cargo/config.toml should not exist and secondly
the system linker lld must be active.

##################
Prereq
0. rustc --version
   rustc 1.78.0 (9b00956e5 2024-04-29)
1. mv ~/.cargo/config.toml ~/.cargo/config.toml.save
2. ld -v
   LLD 18.1.6 (compatible with GNU linkers) % Must be active!!!
3. Cargo.toml entry.
[profile.release]
debug = 0
incremental = true

##################

Build and test Prolog app performance (PA)

$ cargo clean
$ cargo build --release
Finished `release` profile [optimized] target(s) in 2m 31s  # CB, will be improved
?- testff(1000000).
   % CPU time: 8.728s                                       # PA, will be improved

Rebuilds so-called warm or hot builds (HB)
$ I changed src/ffi.rs back and forth to trigger the HB
Finished `release` profile [optimized] target(s) in 3.69s   # Unbeatable fast 
$ find . -name '*.rs' -exec touch {} \;                     # Just for fun touching *.rs
Finished `release` profile [optimized] target(s) in 8.15s   # Well, well!
$ find . -name '*' -exec touch {} \;                        # Touching everything
Finished `release` profile [optimized] target(s) in 21.84s  # So-so

##################

Now we need a ~/.cargo/config.toml

This put the linker into action and improves CB and PA

$ cat ~/.cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "linker-plugin-lto"]

$
$ cargo clean
$ cargo build --release
Finished `release` profile [optimized] target(s) in 2m 06s  # CB, 30sec captured !!!
?- testff(1000000).
   % CPU time: 7.566s                                       # PA, 1sec (another surprise)

For HB we need an incremental linker which we don't have
Finished `release` profile [optimized] target(s) in 51.86s  # CB halved
Finished `release` profile [optimized] target(s) in 56.61s  # touch *.rs
Finished `release` profile [optimized] target(s) in 1m 17s  # touch all

##################

CB brought down to 2 minutes and surprise,surprise 
the speed of the prolog app can't be better.

Here is howto watch cargo in action:
Open a second terminal (resize from top to bottom of the screen)
Start:  
$ watch -n1 ps -T -t /dev/pts/0 # the tty you starting cargo :-)

Do you see the difference? It is not the linker, the opposite is true.

Condrastic measurement: lto everything
[profile.release]
lto = true
opt-level = 3

Cold Build
Finished `release` profile [optimized] target(s) in 3m 20s % 1min20sec slower, we don't like it
?- testff(1000000).
   % CPU time: 8.531s % PA slower, we don't like it, too