zamazan4ik opened this issue 1 month ago
Hi, @zamazan4ik. Thank you for the suggestion.

I also found an interesting article on the topic: https://tech.dreamleaves.org/trimming-down-a-rust-binary-in-half/

And I ran some tests here:
`lto = true`

```
> docker build --no-cache -t ogp .
=> [builder 2/4] RUN /scripts/build cook        160.9s
=> [builder 4/4] RUN /scripts/build final ogp    59.7s
# = 220s

> docker build -t ogp .
=> CACHED [builder 2/4] RUN /scripts/build cook   0.0s
=> [builder 4/4] RUN /scripts/build final ogp    61.2s

-rwxr-xr-x 1 root root 6.4M Oct 29 19:17 ogp
```
`lto = false`

```
> docker build --no-cache -t ogp .
=> [builder 2/4] RUN /scripts/build cook        224.9s
=> [builder 4/4] RUN /scripts/build final ogp    10.5s
# = 234s

> docker build -t ogp .
=> CACHED [builder 2/4] RUN /scripts/build cook   0.0s
=> [builder 4/4] RUN /scripts/build final ogp    11.0s

> ls -lah | grep ogp
-rwxr-xr-x 1 root root 7.4M Oct 29 17:59 ogp
```
`lto = true`

```
> wrk -t4 -c500 -d30s 'http://localhost:8080/health'
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.24ms    2.34ms  56.16ms   94.80%
    Req/Sec    14.78k     3.15k   21.86k    62.17%
  1765793 requests in 30.04s, 207.13MB read
  Socket errors: connect 253, read 102, write 0, timeout 0
Requests/sec:  58789.28
Transfer/sec:      6.90MB

> wrk -t4 -c500 -d30s 'http://localhost:8080/v0/svg?title=&author=&photo=http://localhost:8080/assets/favicon.svg&url=&theme=default'
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.80ms    2.23ms  60.54ms   80.84%
    Req/Sec    10.69k     1.76k   14.47k    63.50%
  1277076 requests in 30.03s, 1.30GB read
  Socket errors: connect 253, read 104, write 0, timeout 0
Requests/sec:  42527.60
Transfer/sec:     44.29MB

> wrk -t4 -c500 -d30s 'http://localhost:8080/v0/png?title=&author=&photo=http://localhost:8080/assets/favicon.svg&url=&theme=default'
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.25s   228.45ms   1.98s    75.75%
    Req/Sec     48.37     20.66   128.00    71.38%
  5790 requests in 30.09s, 515.68MB read
  Socket errors: connect 253, read 154, write 0, timeout 0
Requests/sec:    192.43
Transfer/sec:     17.14MB
```
`lto = false`

```
> wrk -t4 -c500 -d30s 'http://localhost:8080/health'
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.36ms    2.60ms 124.67ms   95.54%
    Req/Sec    14.40k     2.08k   19.10k    74.25%
  1720145 requests in 30.03s, 201.78MB read
  Socket errors: connect 253, read 113, write 0, timeout 0
Requests/sec:  57287.87
Transfer/sec:      6.72MB

> wrk -t4 -c500 -d30s 'http://localhost:8080/v0/svg?title=&author=&photo=http://localhost:8080/assets/favicon.svg&url=&theme=default'
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.09ms    2.47ms  59.52ms   82.47%
    Req/Sec    10.19k     1.62k   14.74k    69.67%
  1217635 requests in 30.03s, 1.24GB read
  Socket errors: connect 253, read 80, write 0, timeout 0
Requests/sec:  40552.17
Transfer/sec:     42.23MB

> wrk -t4 -c500 -d30s 'http://localhost:8080/v0/png?title=&author=&photo=http://localhost:8080/assets/favicon.svg&url=&theme=default'
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.25s   240.14ms   1.98s    75.39%
    Req/Sec     48.22     21.55   120.00    67.28%
  5779 requests in 30.09s, 514.70MB read
  Socket errors: connect 253, read 98, write 0, timeout 1
Requests/sec:    192.04
Transfer/sec:     17.10MB
```
`lto = true`

```
> wrk -t4 -c500 -d30s 'http://localhost:8080/health'
Requests/sec:  42151.69
Transfer/sec:      4.94MB

> wrk -t4 -c500 -d30s 'http://localhost:8080/v0/svg?title=&author=&photo=http://localhost:8080/assets/favicon.svg&url=&theme=default'
Requests/sec:  14038.25
Transfer/sec:     14.62MB

> wrk -t4 -c500 -d30s 'http://localhost:8080/v0/png?title=&author=&photo=http://localhost:8080/assets/favicon.svg&url=&theme=default'
Requests/sec:    108.16
Transfer/sec:      9.63MB
```
`lto = false`

```
> wrk -t4 -c500 -d30s 'http://localhost:8080/health'
Requests/sec:  46748.78
Transfer/sec:      5.48MB

> wrk -t4 -c500 -d30s 'http://localhost:8080/v0/svg?title=&author=&photo=http://localhost:8080/assets/favicon.svg&url=&theme=default'
Requests/sec:  12905.99
Transfer/sec:     13.44MB

> wrk -t4 -c500 -d30s 'http://localhost:8080/v0/png?title=&author=&photo=http://localhost:8080/assets/favicon.svg&url=&theme=default'
Requests/sec:    106.59
Transfer/sec:      9.49MB
```
Throughput for the health check ("ping") request is about the same, SVG generation is ~5% faster with `lto = true`, and PNG rendering takes effectively the same time. Meanwhile, the CI build time with LTO is ~6x longer (the final build step: ~61s vs ~11s).

It doesn't look to me like it makes sense to enable `lto` by default in the supplied Dockerfile for this program.
Btw, I added a `CARGO_PROFILE_RELEASE_LTO` build arg to the Dockerfile; it can be used like this:

```
docker build -t ogp --build-arg CARGO_PROFILE_RELEASE_LTO=true .
```
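For illustration, this is roughly how such a build arg can be wired up; it's a simplified sketch, not this repo's actual Dockerfile, and the base image, stage layout, and paths are placeholders:

```dockerfile
# Simplified sketch; base image, stage layout, and paths are placeholders.
FROM rust:1.81 AS builder
WORKDIR /app

# Defaults to false, so a plain `docker build` keeps the fast link step.
ARG CARGO_PROFILE_RELEASE_LTO=false
# Cargo reads CARGO_PROFILE_RELEASE_LTO from the environment and uses it
# to override the `lto` setting of [profile.release].
ENV CARGO_PROFILE_RELEASE_LTO=${CARGO_PROFILE_RELEASE_LTO}

COPY . .
RUN cargo build --release
```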
Thanks a lot for the tests!

> Throughput for the health check ("ping") request is about the same, SVG generation is ~5% faster with `lto = true`, and PNG rendering takes effectively the same time. Meanwhile, the CI build time with LTO is ~6x longer. It doesn't look to me like it makes sense to enable `lto` by default in the supplied Dockerfile for this program.
Yeah, with such a build-time overhead, I don't think it makes much sense to enable LTO in the default Release profile. As a possible mitigation, we can create a dedicated `heavy-release` profile with LTO enabled in it. If users choose to trade more build time for a slightly faster binary, they can do so by simply selecting this profile, without enabling LTO manually. In the future, we can put other heavy optimizations into this profile as well.
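Something like this in `Cargo.toml` (the exact option values are illustrative):

```toml
# Sketch of a dedicated opt-in profile; option values are illustrative.
[profile.heavy-release]
inherits = "release"   # start from the regular release settings
lto = true             # full ("fat") LTO; `lto = "thin"` is a cheaper variant
codegen-units = 1      # fewer codegen units -> more cross-crate optimization
```

Users would then build it explicitly with `cargo build --profile heavy-release`, and the default `release` profile stays fast.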
Anyway, your approach with the `CARGO_PROFILE_RELEASE_LTO` build arg is also a viable option, as long as it's discoverable by users.
Hi!

I noticed that Link-Time Optimization (LTO) is not enabled in the project's `Cargo.toml`. I suggest switching it on, since it will reduce the binary size (always a good thing to have) and will likely improve the application's performance a bit.

I suggest enabling LTO only for Release builds, so as not to sacrifice the developer experience while working on the project, since LTO adds extra compilation time. If you think a regular Release build should not be affected by such a change either, then I suggest adding an additional `dist` or `release-lto` profile where LTO is added on top of the regular `release` optimizations. Such a change simplifies life for maintainers and for anyone else interested in building the most performant version of the application. Using ThinLTO should also help reduce the build-time overhead of LTO. E.g., check the `cargo-outdated` Release profile.

Basically, it can be enabled with the following lines:
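```toml
[profile.release]
lto = true
```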
I have made quick tests (Fedora 40) by adding `lto = true` to the Release profile. The binary size goes down from 9 MiB to 8 MiB. You may also be interested in tweaking other options like `codegen-units`, etc. (see the sketch below).

Thank you.
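For example (illustrative values, not something measured in this thread):

```toml
[profile.release]
lto = "thin"        # ThinLTO: much of full LTO's benefit at a lower build-time cost
codegen-units = 1   # a single codegen unit maximizes optimization but slows compilation
```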