rust-lang / log

Logging implementation for Rust
https://docs.rs/log
Apache License 2.0
2.16k stars 250 forks source link

Binary size optimization experiment #569

Closed EFanZh closed 2 months ago

EFanZh commented 1 year ago

Experiment for some ideas I have on binary size optimization.

NobodyXu commented 1 year ago

Is this PR trying to take advantages of https://github.com/rust-lang/rust/pull/109999/ ?

EFanZh commented 1 year ago

Is this PR trying to take advantages of rust-lang/rust#109999 ?

Not exactly. This PR intends to minimize code generated by macro expansion. This is done by moving as much code logic as possible into functions that can be shared by each macro call.

NobodyXu commented 1 year ago

Is this PR trying to take advantages of rust-lang/rust#109999 ?

Not exactly. This PR intends to minimize code generated by macro expansion. This is done by moving as much code logic as possible into functions that can be shared by each macro call.

Thanks for explanation.

EFanZh commented 1 year ago

Note that we use to have a special case for literals: #366, but it was removed in 2d1a477.

Yes, I’m aware of that, I am trying to bring that optimization back.

EFanZh commented 1 year ago

It seems that using "opt-level=z", the compiler won’t inline Arguments::as_str if there are multiple calls to Arguments::as_str: https://godbolt.org/z/rzsG3fYEK. So both None and Some branches are kept in the final binary, which will cause binary size to grow.

Maybe marking Arguments::as_str as #[inline(always)] could help, but for now, I think I’ll have to give up specializing literal arguments.

NobodyXu commented 1 year ago

Maybe marking Arguments::as_str as #[inline(always)] could help, but for now, I think I’ll have to give up specializing literal arguments.

You could just pass &fmt::Argument<'_> instead of pass it by-value, it should remove the generated code for copying it.

EFanZh commented 1 year ago

@NobodyXu: I have tested the effect of passing &Arguments, and the result is not much different than passing Arguments directly. Maybe it’s because Rust already passes large object as references: https://godbolt.org/z/Wz35xYrsr. You can see in the example, passing Arguments and passing &Arguments generate the same assembly code.

NobodyXu commented 1 year ago

@NobodyXu: I have tested the effect of passing &Arguments, and the result is not much different than passing Arguments directly. Maybe it’s because Rust already passes large object as references: https://godbolt.org/z/Wz35xYrsr. You can see in the example, passing Arguments and passing &Arguments generate the same assembly code.

Your linked code does not actually use the Arguments, so the compiler might omits the copy requires to happen.

In your PR, you passes the Arguments to log2, log1, log0 by value, that would require copying Arguments between stacks unless inlined.

EFanZh commented 1 year ago

@NobodyXu: I have tested the effect of passing &Arguments, and the result is not much different than passing Arguments directly. Maybe it’s because Rust already passes large object as references: https://godbolt.org/z/Wz35xYrsr. You can see in the example, passing Arguments and passing &Arguments generate the same assembly code.

Your linked code does not actually use the Arguments, so the compiler might omits the copy requires to happen.

In your PR, you passes the Arguments to log2, log1, log0 by value, that would require copying Arguments between stacks unless inlined.

log0, log1, log can only have a fixed number of instances, which means they can only contribute to at most a constant size to the final binary, while the majority binary size comes from codes generated by macro calls, since there could be a lot of log calls for a large project. So optimizing function implementations should not matter too much. But still, I will compare different ways to pass arguments, and choose the best one.

EFanZh commented 1 year ago

OK, I think this is ready for the next step. This PR is just a proof of concept of some ideas I have, and should not be merged as is. I have tested this PR on my personal project and gained about 2.4% binary size reduction, which is significant to me. Now I want to implement these ideas in the log crate. Since this PR contains several optimization ideas, and I am not certain that all of them have positive effect on the binary size, so the best way maybe to discuss each idea, and decide whether it is reasonable. Finally, we may implement and merge each acceptable idea individually.

  1. Pass &str instead of Arguments if the message is a string literal.
    • I used a const function to check whether the format string can be treated as a string literal. And the condition is whether it contains '{' or '}' character, see the is_literal function.
  2. If log level is known at compile time, pass it statically.
  3. If no key-value pair is provided, pass the information statically.
  4. If target is the same as module path, pass the information statically.
  5. Pass module path and file as &'static (&'static str, &'static str), so that logs from a single file can share the same argument.
  6. If the kvs and format arguments do not have side effects, move the level filter check inside the function. (Needs further investigation, not implemented in this PR)
  7. If the final implementation is done using generic functions like this PR, make sure all generic function instances are instantiated in this crate (the unused function in this PR), so that downstream crates can utilize them by enabling share-generics.

Also, is there a known open source project that uses log extensively, so I can test the effect of my optimization on it? I only have a private project to test. I think my optimization will need a reproducible result.

NobodyXu commented 1 year ago

Also, is there a known open source project that uses log extensively, so I can test the effect of my optimization on it? I only have a private project to test. I think my optimization will need a reproducible result.

In cargo-binstall, log is a dependency that is heavily used:

``` log v0.4.19 ├── async_zip v0.0.15 │ └── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) │ └── binstalk v0.13.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk) │ └── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin) ├── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin) ├── gix v0.48.0 │ └── binstalk v0.13.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk) (*) ├── gix-attributes v0.14.1 │ ├── gix v0.48.0 (*) │ └── gix-worktree v0.21.1 │ └── gix v0.48.0 (*) ├── gix-config v0.25.1 │ └── gix v0.48.0 (*) ├── guess_host_triple v0.1.3 │ └── detect-targets v0.1.8 (/Users/nobodyxu/Dev/cargo-binstall/crates/detect-targets) │ └── binstalk v0.13.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk) (*) │ [dev-dependencies] │ └── binstalk-manifests v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-manifests) │ └── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin) ├── reqwest v0.11.18 │ ├── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*) │ ├── gix v0.48.0 (*) │ └── gix-transport v0.33.1 │ ├── gix v0.48.0 (*) │ └── gix-protocol v0.35.0 │ └── gix v0.48.0 (*) ├── rustls v0.20.8 │ ├── quinn v0.8.5 │ │ └── trust-dns-proto v0.22.0 │ │ └── trust-dns-resolver v0.22.0 │ │ ├── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*) │ │ └── reqwest v0.11.18 (*) │ ├── quinn-proto v0.8.4 │ │ ├── quinn v0.8.5 (*) │ │ └── quinn-udp v0.1.4 │ │ └── quinn v0.8.5 (*) │ ├── tokio-rustls v0.23.4 │ │ ├── trust-dns-proto v0.22.0 (*) │ │ └── trust-dns-resolver v0.22.0 (*) │ ├── trust-dns-proto v0.22.0 (*) │ └── trust-dns-resolver v0.22.0 (*) ├── rustls v0.21.5 │ ├── hyper-rustls v0.24.1 │ │ └── reqwest v0.11.18 (*) │ ├── reqwest v0.11.18 (*) │ └── tokio-rustls v0.24.1 │ ├── hyper-rustls v0.24.1 (*) │ └── reqwest v0.11.18 (*) ├── tracing v0.1.37 │ ├── binstalk v0.13.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk) (*) │ ├── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*) │ ├── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin) │ ├── h2 v0.3.20 │ │ ├── hyper v0.14.27 │ │ │ ├── hyper-rustls v0.24.1 (*) │ │ │ └── reqwest v0.11.18 (*) │ │ ├── reqwest v0.11.18 (*) │ │ └── trust-dns-proto v0.22.0 (*) │ ├── hyper v0.14.27 (*) │ ├── quinn v0.8.5 (*) │ ├── quinn-proto v0.8.4 (*) │ ├── quinn-udp v0.1.4 (*) │ ├── tokio-util v0.7.8 │ │ ├── async_zip v0.0.15 (*) │ │ ├── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*) │ │ ├── h2 v0.3.20 (*) │ │ ├── reqwest v0.11.18 (*) │ │ └── tower v0.4.13 │ │ └── binstalk-downloader v0.6.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/binstalk-downloader) (*) │ ├── tower v0.4.13 (*) │ ├── trust-dns-proto v0.22.0 (*) │ └── trust-dns-resolver v0.22.0 (*) ├── tracing-log v0.1.3 │ └── cargo-binstall v1.0.0 (/Users/nobodyxu/Dev/cargo-binstall/crates/bin) └── wasm-bindgen-backend v0.2.87 └── wasm-bindgen-macro-support v0.2.87 └── wasm-bindgen-macro v0.2.87 (proc-macro) └── wasm-bindgen v0.2.87 ├── js-sys v0.3.64 │ ├── reqwest v0.11.18 (*) │ ├── wasm-bindgen-futures v0.4.37 │ │ ├── reqwest v0.11.18 (*) │ │ └── wasm-streams v0.2.3 │ │ └── reqwest v0.11.18 (*) │ ├── wasm-streams v0.2.3 (*) │ └── web-sys v0.3.64 │ ├── reqwest v0.11.18 (*) │ ├── ring v0.16.20 │ │ ├── quinn-proto v0.8.4 (*) │ │ ├── rustls v0.20.8 (*) │ │ ├── rustls v0.21.5 (*) │ │ ├── rustls-webpki v0.101.1 │ │ │ └── rustls v0.21.5 (*) │ │ ├── sct v0.7.0 │ │ │ ├── rustls v0.20.8 (*) │ │ │ └── rustls v0.21.5 (*) │ │ ├── trust-dns-proto v0.22.0 (*) │ │ └── webpki v0.22.0 │ │ ├── quinn v0.8.5 (*) │ │ ├── quinn-proto v0.8.4 (*) │ │ ├── rustls v0.20.8 (*) │ │ ├── tokio-rustls v0.23.4 (*) │ │ ├── trust-dns-proto v0.22.0 (*) │ │ └── webpki-roots v0.22.6 │ │ ├── reqwest v0.11.18 (*) │ │ ├── trust-dns-proto v0.22.0 (*) │ │ └── trust-dns-resolver v0.22.0 (*) │ ├── wasm-bindgen-futures v0.4.37 (*) │ └── wasm-streams v0.2.3 (*) ├── reqwest v0.11.18 (*) ├── wasm-bindgen-futures v0.4.37 (*) ├── wasm-streams v0.2.3 (*) └── web-sys v0.3.64 (*) ```
EFanZh commented 1 year ago

In cargo-binstall, log is a dependency that is heavily used

It seems that cargo-binstall mostly uses logging macros from tracing, not log. Even though it depends on the log crate, not using its macros means it may not benefit from these optimizations.

NobodyXu commented 1 year ago

In cargo-binstall, log is a dependency that is heavily used

It seems that cargo-binstall mostly uses logging macros from tracing, not log. Even though it depends on the log crate, not using its macros means it may not benefit from these optimizations.

It uses many dependencies which depends on log, like gix, async-zip, etc.

If you check the "details" which contained the dependency tree, you would find it actually heavily depends on log.

EFanZh commented 1 year ago

I have tests this PR on cargo-binstall, and here is the binary size comparison:

opt-level lto codegen-units original size optimized size diff
3 off 1 14211752 14203560 -8192
3 off 16 16739048 16726760 -12288
3 thin 1 15170120 15170120 0
3 thin 16 17566376 17574568 8192
3 fat 1 13859336 13859336 0
3 fat 16 15231496 15227400 -4096
"z" off 1 11451048 11430568 -20480
"z" off 16 13531848 13503176 -28672
"z" thin 1 12307048 12286568 -20480
"z" thin 16 13699720 13736584 36864
"z" fat 1 9951752 9931272 -20480
"z" fat 16 10476040 10447368 -28672

You can reproduce the result using this program.

KodrAus commented 2 months ago

Thanks for investigating this @EFanZh! These were a useful point-in-time reference that helped make some improvements, but I think we should close now that things have moved along. Please feel free to open any fresh PRs that optimize binary size based on the current state of the code though.