triska opened this issue 8 months ago
I've managed to speed up the compilation by doing some of the stuff described in the links. As a tl;dr:
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
This installs the Cranelift codegen backend; it only works with the nightly compiler.
Then edit Cargo.toml. This is the diff:
diff --git a/Cargo.toml b/Cargo.toml
index 2af52e25..690f4bc0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,3 +1,4 @@
+cargo-features = ["codegen-backend"]
[package]
name = "scryer-prolog"
version = "0.9.4"
@@ -132,6 +133,10 @@ opt-level = 3
lto = true
opt-level = 3
+[profile.dev]
+debug = 0
+codegen-backend = "cranelift"
+
[[bench]]
name = "run_criterion"
harness = false
Now, to compile, use the nightly compiler:
cargo +nightly build
The build should now be faster than before.
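If the build then fails with an unknown codegen backend, it is worth checking that the component really is present for the nightly toolchain; rustup can list it (just a sanity check):
$ rustup component list --toolchain nightly | grep cranelift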
Faster? How long does it take to upgrade Rust every day and keep pace with the thousands of crate changes pulled in every day, leading developers astray in the end? Rust users are swamped with futile work every day. Do you feel secure? The real issue: programmers procrastinate on the real issues.
Interesting news about this issue: https://blog.rust-lang.org/2024/05/17/enabling-rust-lld-on-linux.html
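According to the post, recent nightlies on x86_64-unknown-linux-gnu start using rust-lld by default, and an unstable -Zlinker-features flag toggles it. If I recall the syntax correctly (it is unstable and may have changed since), opting back out would look roughly like:
RUSTFLAGS="-Zlinker-features=-lld" cargo +nightly build --release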
Sorry @triska, just to re-emphasize what I mean when saying: programmers procrastinate on the real issues. @aarroyoc, still confused?
$ rustc --version
rustc 1.78.0 (9b00956e5 2024-04-29)
$ cd ~/.cargo
$ rm -rf advisory-db
$ cd ~/.cargo/registry
$ rm -rf *
$ cd ~/tools/prolog
$ git clone https://github.com/mthom/scryer-prolog.git
$ cd scryer-prolog
$ cargo audit
Fetching advisory database from `https://github.com/RustSec/advisory-db.git`
Loaded 628 security advisories (from /home/nox/.cargo/advisory-db)
Updating crates.io index
Scanning Cargo.lock for vulnerabilities (396 crate dependencies)
Crate: mio
Version: 0.7.14
Title: Tokens for named pipes may be delivered after deregistration
Date: 2024-03-04
ID: RUSTSEC-2024-0019
URL: https://rustsec.org/advisories/RUSTSEC-2024-0019
Solution: Upgrade to >=0.8.11
Dependency tree:
mio 0.7.14
├── signal-hook-mio 0.2.3
│ └── crossterm 0.20.0
│ └── scryer-prolog 0.9.4
└── crossterm 0.20.0
Crate: mio
Version: 0.8.10
Title: Tokens for named pipes may be delivered after deregistration
Date: 2024-03-04
ID: RUSTSEC-2024-0019
URL: https://rustsec.org/advisories/RUSTSEC-2024-0019
Solution: Upgrade to >=0.8.11
Dependency tree:
mio 0.8.10
└── tokio 1.35.1
├── warp 0.3.5
│ └── scryer-prolog 0.9.4
├── tokio-util 0.7.10
│ ├── warp 0.3.5
│ └── h2 0.3.26
│ ├── reqwest 0.11.23
│ │ └── scryer-prolog 0.9.4
│ └── hyper 0.14.28
│ ├── warp 0.3.5
│ ├── reqwest 0.11.23
│ └── hyper-tls 0.5.0
│ └── reqwest 0.11.23
├── tokio-tungstenite 0.18.0
│ └── warp 0.3.5
├── tokio-stream 0.1.14
│ └── warp 0.3.5
├── tokio-rustls 0.23.4
│ └── warp 0.3.5
├── tokio-native-tls 0.3.1
│ ├── reqwest 0.11.23
│ └── hyper-tls 0.5.0
├── scryer-prolog 0.9.4
├── reqwest 0.11.23
├── hyper-tls 0.5.0
├── hyper 0.14.28
└── h2 0.3.26
Crate: rustls
Version: 0.20.9
Title: `rustls::ConnectionCommon::complete_io` could fall into an infinite loop based on network input
Date: 2024-04-19
ID: RUSTSEC-2024-0336
URL: https://rustsec.org/advisories/RUSTSEC-2024-0336
Severity: 7.5 (high)
Solution: Upgrade to >=0.23.5 OR >=0.22.4, <0.23.0 OR >=0.21.11, <0.22.0
Dependency tree:
rustls 0.20.9
└── tokio-rustls 0.23.4
└── warp 0.3.5
└── scryer-prolog 0.9.4
Crate: tungstenite
Version: 0.18.0
Title: Tungstenite allows remote attackers to cause a denial of service
Date: 2023-09-25
ID: RUSTSEC-2023-0065
URL: https://rustsec.org/advisories/RUSTSEC-2023-0065
Severity: 7.5 (high)
Solution: Upgrade to >=0.20.1
Dependency tree:
tungstenite 0.18.0
└── tokio-tungstenite 0.18.0
└── warp 0.3.5
└── scryer-prolog 0.9.4
Crate: lexical
Version: 5.2.2
Warning: unsound
Title: Multiple soundness issues
Date: 2023-09-03
ID: RUSTSEC-2023-0055
URL: https://rustsec.org/advisories/RUSTSEC-2023-0055
Dependency tree:
lexical 5.2.2
└── scryer-prolog 0.9.4
error: 4 vulnerabilities found!
warning: 1 allowed warning found
$
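For anyone reproducing this: cargo audit is not part of the default toolchain, it comes from the cargo-audit crate. Also, the mio 0.8.10 finding is only a semver-compatible bump away from the fix, so a plain dependency update should clear it; the mio 0.7.14 copy is pulled in by the old crossterm 0.20 and most likely needs a crossterm upgrade instead:
$ cargo install cargo-audit
$ cargo update -p mio@0.8.10   # compatible bump to >=0.8.11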
@flexoron: Could you please file the security issues as separate issues, for a better overview and so that they can be prioritized appropriately? Thank you a lot!
For sure! Here is another article about finding out why builds are slow and how to improve them: https://fasterthanli.me/articles/why-is-my-rust-build-so-slow#linker-is-it-you
Take-away from the article(s) above:
"We probably have an algorithm in rustc that performs reasonably well for most inputs, but our program somehow has a shape that makes that algorithm perform really poorly"
So, trying to improve compilation time only makes sense when working with a clean and up-to-date codebase.
That's why I showed the audit report; fixing the security issues is then just a side effect.
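For reference, a common way to activate an alternative linker such as mold for a single project, in the spirit of the linker section of that article, is a config.toml entry along these lines; the clang driver and the target triple are assumptions for a typical x86_64 Linux setup:
# .cargo/config.toml (project-local) or ~/.cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]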
@triska Do you have a simple Prolog program for testing the performance of the 'improved compile-time' build? And what if these improvements degrade Scryer Prolog's runtime performance?
Some results (linker: mold; for an explanation, see the steps below)
cargo build
Finished `dev` profile [unoptimized + debuginfo] target(s) in 25.94s
?- testff(1000000).
% CPU time: 436.145s, 86_000_034 inferences
3000000 findall/findall Elements true.
cargo build --release
Finished `release` profile [optimized] target(s) in 3m 55s
?- testff(1000000).
% CPU time: 8.043s, 86_000_034 inferences
3000000 findall/findall Elements true.
debug = 1
lto = "fat"
Finished `release` profile [optimized + debuginfo] target(s) in 4m 10s
Finished `release` profile [optimized + debuginfo] target(s) in 3m 37s
?- testff(1000000).
% CPU time: 8.139s, 86_000_034 inferences
debug = 1
lto = "thin"
Finished `release` profile [optimized + debuginfo] target(s) in 3m 44s
Finished `release` profile [optimized + debuginfo] target(s) in 3m 08s
?- testff(1000000).
% CPU time: 8.205s, 86_000_034 inferences
debug = 1
incremental = true
Finished `release` profile [optimized + debuginfo] target(s) in 2m 43s
Finished `release` profile [optimized + debuginfo] target(s) in 3.95s
?- testff(1000000).
% CPU time: 8.661s, 86_000_034 inferences
debug = 1
lto = "thin"
incremental = true
codegen-units = 16
Finished `release` profile [optimized + debuginfo] target(s) in 3m 34s
Finished `release` profile [optimized + debuginfo] target(s) in 2m 18s
?- testff(1000000).
% CPU time: 7.650s, 86_000_034 inferences
debug = 1
lto = "thin"
incremental = true
Finished `release` profile [optimized + debuginfo] target(s) in 3m 24s
Finished `release` profile [optimized + debuginfo] target(s) in 44.31s
?- testff(1000000).
% CPU time: 7.834s, 86_000_034 inferences
debug = 0
lto = "thin"
incremental = true
Finished `release` profile [optimized] target(s) in 3m 16s
Finished `release` profile [optimized] target(s) in 40.78s
?- testff(1000000).
% CPU time: 7.702s, 86_000_034 inferences
STEPS (CB = Cold Build, HB = Hot Build, PA = Prolog App)
1. Activate linker mold (Linux task)
2. git clone https://github.com/mthom/scryer-prolog.git
3. Prepare CB:
git reset --hard HEAD~2
Modify ffi.rs by hand, to be used later for triggering the HB
4. cargo build
Default dev build, CB:25s PA: 436s (PA:unusable)
5. cargo clean
6. cargo build --release
Default release build, CB:4m PA:8s (PA:Boiled down, Great)
Now changing [profile.release]
7. (debug = 1, lto = "fat")
PA 8.139s
8. (debug = 1, lto = "thin")
PA 8.205s No big difference between fat and thin
HB useless
9. (debug = 1, incremental = true)
PA 8.661s HB 3.95s
HB unbeatably fast
10. (debug = 1, lto = "thin", incremental = true, codegen-units = 16)
PA 7.650s HB 2m
PA faster, HB useless
11. (debug = 1, lto = "thin", incremental = true)
PA 7.834s, HB 44s
HB acceptable
12. (debug = 0, lto = "thin", incremental = true), consolidated as a Cargo.toml snippet after 12.a
PA 7.702s, HB 40s
HB acceptable
12.a. Addition
$ find . -name '*.rs' -exec touch {} \;
$ cargo build --release
Compiling typenum v1.17.0
Compiling markup5ever v0.11.0
Compiling scryer-prolog v0.9.4 (/home/nox/tools/prolog/scryer-prolog)
Compiling mime_guess v2.0.4
Compiling generic-array v0.14.7
Compiling generic-array v0.12.4
Compiling html5ever v0.26.0
Compiling xml5ever v0.17.0
Compiling digest v0.8.1
Compiling block-buffer v0.7.3
Compiling crypto-mac v0.7.0
Compiling blake2 v0.8.1
Compiling sha3 v0.8.2
Compiling ripemd160 v0.8.0
Compiling crypto-common v0.1.6
Compiling block-buffer v0.10.4
Compiling digest v0.10.7
Compiling sha1 v0.10.6
Compiling blake2 v0.10.6
Compiling sha3 v0.10.8
Compiling sha2 v0.10.8
Compiling markup5ever_rcdom v0.2.0
Compiling tungstenite v0.18.0
Compiling headers v0.3.9
Compiling select v0.6.0
Compiling crrl v0.6.0
Compiling tokio-tungstenite v0.18.0
Compiling warp v0.3.5
Finished `release` profile [optimized] target(s) in 46.00s
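Putting it together, the [profile.release] override from step 12, which gave the best cold/hot build trade-off above, would look roughly like this in Cargo.toml (a sketch of the settings listed, not independently verified):
[profile.release]
debug = 0
lto = "thin"
incremental = true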
Linker test: cargo build --release (lto = true, opt-level = 3)
-----------------------------------------
Four (system) linkers show no difference:
GNU ld version 2.41-37.fc40
GNU gold (version 2.41-37.fc40) 1.16
mold 2.32.0 (compatible with GNU ld)
LLD 18.1.6 (compatible with GNU linkers)
Finished `release` profile [optimized] target(s) in 3m 54s
----------------------
rust-lld
Finished `release` profile [optimized] target(s) in 4m 13s # it's not the linker, 1.81.0-nightly is slower
readelf -p .comment target/release/scryer-prolog
String dump of section '.comment':
[ 0] Linker: LLD 18.1.7
[ 42] rustc version 1.81.0-nightly (a70b2ae57 2024-06-09)
Yes, nightly uses its own linker (LLD 18.1.7)
$ ld -v
mold 2.32.0 (compatible with GNU ld)
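For one-off comparisons like this you do not have to switch the system linker at all; the choice can also be passed per build through RUSTFLAGS (assuming a gcc/clang linker driver that understands -fuse-ld), and the .comment section afterwards shows which linker ran, at least for LLD which stamps itself there:
$ RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release
$ readelf -p .comment target/release/scryer-prolog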
------------------------------------
Now I try to improve the cold build (CB), our goal, but this attempt has some preconditions.
First, the config file ~/.cargo/config.toml should not exist, and second,
the system linker lld must be active.
##################
Prereq
0. rustc --version
rustc 1.78.0 (9b00956e5 2024-04-29)
1. mv ~/.cargo/config.toml ~/.cargo/config.toml.save
2. ld -v
LLD 18.1.6 (compatible with GNU linkers) % Must be active!!!
3. Cargo.toml entry.
[profile.release]
debug = 0
incremental = true
##################
Build and test Prolog app performance (PA)
$ cargo clean
$ cargo build --release
Finished `release` profile [optimized] target(s) in 2m 31s # CB, will be improved
?- testff(1000000).
% CPU time: 8.728s # PA, will be improved
Rebuilds, i.e. so-called warm or hot builds (HB):
I changed src/ffi.rs back and forth to trigger the HB.
Finished `release` profile [optimized] target(s) in 3.69s # Unbeatably fast
$ find . -name '*.rs' -exec touch {} \; # Just for fun touching *.rs
Finished `release` profile [optimized] target(s) in 8.15s # Well, well!
$ find . -name '*' -exec touch {} \; # Touching everything
Finished `release` profile [optimized] target(s) in 21.84s # So-so
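To see where the time goes in these rebuilds rather than just the final Finished line, cargo's built-in timing report helps; on a reasonably recent stable toolchain it writes an HTML breakdown of per-crate compile times under target/cargo-timings/:
$ cargo build --release --timings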
##################
Now we need a ~/.cargo/config.toml.
This puts the linker into action and improves both CB and PA.
$ cat ~/.cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "linker-plugin-lto"]
$
$ cargo clean
$ cargo build --release
Finished `release` profile [optimized] target(s) in 2m 06s # CB, 30 sec saved!!!
?- testff(1000000).
% CPU time: 7.566s # PA, about 1 sec faster (another surprise)
For HB we would need an incremental linker, which we don't have.
Finished `release` profile [optimized] target(s) in 51.86s # CB halved
Finished `release` profile [optimized] target(s) in 56.61s # touch *.rs
Finished `release` profile [optimized] target(s) in 1m 17s # touch all
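One caveat with linker-plugin-lto, stated here as an assumption to check rather than an explanation of any number above: cross-language/linker-plugin LTO generally wants the clang/LLVM version to be compatible with the LLVM that rustc was built with, which can be compared like this:
$ rustc -vV | grep 'LLVM version'
$ clang --version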
##################
CB brought down to about 2 minutes and, surprise, surprise,
the speed of the Prolog app could not be better.
Here is how to watch cargo in action:
Open a second terminal (resized from the top to the bottom of the screen)
and start:
$ watch -n1 ps -T -t /dev/pts/0 # the tty where you started cargo :-)
Do you see the difference? It is not the linker; quite the opposite is true.
Contrasting measurement: LTO everything
[profile.release]
lto = true
opt-level = 3
Cold Build
Finished `release` profile [optimized] target(s) in 3m 20s % 1 min 20 sec slower, we don't like it
?- testff(1000000).
% CPU time: 8.531s % PA slower, we don't like that either
Today I came across an interesting article by @kobzol on Rust compilation times:
https://kobzol.github.io/rust/rustc/2024/03/15/rustc-what-takes-so-long.html
The topic of Scryer Prolog compilation times has already been mentioned, at least in passing, a few times, for example in https://github.com/mthom/scryer-prolog/pull/2174#issuecomment-1830441779 and https://github.com/mthom/scryer-prolog/discussions/2339.
I hope the techniques mentioned in the article can provide a few more insights to everyone who wants to help with this issue, and maybe even help to improve compilation times in the future.