rust-lang / log

Logging implementation for Rust
https://docs.rs/log
Apache License 2.0
2.16k stars 250 forks source link

Specialize empty key value pairs #576

Closed EFanZh closed 11 months ago

EFanZh commented 1 year ago

Currently, the __private_api::log function takes None::<&[(&str, _)]> as the kvs argument if no key-value pair is provided, but passing a None value may have some extra cost. We should be able to avoid this cost by encoding this information into a zero sized type (here I am using (), not sure if it is better to use a dedicated new type (such as struct EmptyKVs;)). Constructing a zero sized value is zero cost, which is less than constructing a None::<&[(&str, _)]>.

EFanZh commented 1 year ago

I have tested the effect on binary size of this PR on cargo-binstall, and this is the result:

opt-level lto codegen-units original size optimized size diff
3 off 1 13754440 13754440 0
3 off 16 16343208 16326824 -16384
3 thin 1 14520328 14520328 0
3 thin 16 17051784 17035400 -16384
3 fat 1 13152232 13152232 0
3 fat 16 14385128 14368744 -16384
"z" off 1 11247688 11251784 4096
"z" off 16 13197448 13172872 -24576
"z" thin 1 12144648 12136456 -8192
"z" thin 16 13488232 13488232 0
"z" fat 1 9662440 9658344 -4096
"z" fat 16 10113000 10100712 -12288

Aside from the (opt-level=z, lto=off, codegen-units=1) combination, all binary size changes are non-positive, you can reproduce the result using: https://github.com/EFanZh/log/tree/binary-size-test/cargo-binstall/test-binary-size.

I also tested the binary sizes with the instantiate_log_function function removed:

opt-level lto codegen-units original size optimized size diff
3 off 1 13754440 13754440 0
3 off 16 16343208 16326824 -16384
3 thin 1 14520328 14520328 0
3 thin 16 17051784 17035400 -16384
3 fat 1 13152232 13152232 0
3 fat 16 14385128 14368744 -16384
"z" off 1 11247688 11251784 4096
"z" off 16 13197448 13176968 -20480
"z" thin 1 12144648 12136456 -8192
"z" thin 16 13488232 13492328 4096
"z" fat 1 9662440 9658344 -4096
"z" fat 16 10113000 10100712 -12288

The binary sizes are mostly the same as the previous one, with the exception of (opt-level=z, lto=off, codegen-units=16) and (opt-level=z, lto=thin, codegen-units=16) combinations, whose effect become worse after removing the instantiate_log_function function.

I also tested the runtime performance with a hand written benchmark:

original  => cycles: 1115, instructions: 79, wall time: 322ns.
optimized => cycles: 954, instructions: 79, wall time: 323ns.

You can find the benchmark code here: https://github.com/EFanZh/log/tree/benchmark. I give up using criterion because the result changes significantly between runs, and I also often run into “took zero time” error. I also tried iai, but I get https://github.com/bheisler/iai/issues/34 error. Finally I decided to use perf-event, since it is what the Rust language uses for its performance benchmark: https://github.com/rust-lang/rustc-perf/blob/master/collector/benchlib/src/measure/perf_counter/linux.rs.

EFanZh commented 11 months ago

Hi @KodrAus, is there anything I need to do to get this PR merged?