Closed EFanZh closed 1 year ago
Thanks for the PR!
It is not clear that it is always reasonable to do this, since a large number of inline attributes will affect compile time (e.g., see https://github.com/rust-lang/hashbrown/pull/119)
Also, if you want to fully inline project_replace without LTO, the inline attribute may also be needed for private types in pin-project-lite.
In any case, please provide specific data on what this will improve. (generated asm[^1], reproducible benchmarks, etc.)
[^1]: depend on the compiler version, but pin-project-lite is available on godbolt: https://godbolt.org/z/eTxGzb5E7
Here is my attempt at comparing the effect of inlining: https://github.com/EFanZh/pin-project-lite/tree/compare-inlining. (Sorry for not using godbolt, I couldn’t figure out how to patch pin-project-lite
with a inlined version.)
Under opt-level="z"
, here are the assembly code generated before and after inlining:
You can see project
and project_replace
not being inlined in the original version. As for the private types, except for drop_in_place::<UnsafeOverwriteGuard<...>>
, everything is already inlined, and I don’t know how to tell the compiler to inline that drop_in_place
.
Also, I have tested the effects of #[inline]
with opt-level=3
, and the result is the functions are all inlined in both cases:
These examples are handcrafted by me, maybe it’s better to test on a real project. But I don’t know any suitable projects that uses pin-project-lite
extensively, any suggestions?
I tested this patch on tokio-console with this branch: https://github.com/EFanZh/console/tree/compare-inlining-pin-project-methods. Here is the binary sizes comparison:
original release 8145408
patched release 8138240
original release -z 6450688
patched release -z 6426112
Currently, I have only tested binary size changes. @taiki-e do you need other tests? For example: runtime performance benchmarks?
Hi @taiki-e: Is there anything I can do to get this PR merged?
Sorry for the late reply. And thanks for the investigation. Assuming there is no impact on compile time, the effects of this all appear to be positive.
As for compile time, @nnethercote previously measured it by building async-std (https://github.com/taiki-e/pin-project-lite/pull/71). That said, I'm not sure if building the library is the right way to measure the impact of #[inline]
on compile time. We may need to measure the compile time of binaries that use tokio, async-std, etc.
Again, I have tested the compile time of tokio-console using this bash script:
#!/bin/bash -e
benchmark() {
target_report=$1
shift
cargo clean
cargo fetch "$@"
cargo build --release --timings "$@"
mv 'target/cargo-timings/cargo-timing.html' "$target_report"
}
patching_args=(
--config 'patch.crates-io.pin-project-lite.git="https://github.com/EFanZh/pin-project-lite"' \
--config 'patch.crates-io.pin-project-lite.rev="5ebf6921de62887572fa4e0477e725567de176df"'
)
for opt_level in '"z"' '3'; do
for lto in 'off' 'thin' 'fat'; do
for codegen_units in '16' '1'; do
base_args=(
--config "profile.release.opt-level=$opt_level"
--config "profile.release.lto=\"$lto\""
--config "profile.release.codegen-units=$codegen_units"
)
report_base_name="opt_level=${opt_level//\"/} lto=$lto codegen_units=$codegen_units"
benchmark "$report_base_name - original.html" "${base_args[@]}"
benchmark "$report_base_name - patched.html" "${base_args[@]}" "${patching_args[@]}"
done
done
done
Here is my test result:
Configuration | Original | Patched |
---|---|---|
opt_level=z lto=off codegen_units=16 | 51.7s | 46.3s |
opt_level=z lto=off codegen_units=1 | 67.7s | 67.6s |
opt_level=z lto=thin codegen_units=16 | 55.9s | 55.9s |
opt_level=z lto=thin codegen_units=1 | 76.2s | 69.4s |
opt_level=z lto=fat codegen_units=16 | 92.1s | 91.7s |
opt_level=z lto=fat codegen_units=1 | 97.2s | 97.3s |
opt_level=3 lto=off codegen_units=16 | 83.9s | 83.4s |
opt_level=3 lto=off codegen_units=1 | 103.9s | 103.6s |
opt_level=3 lto=thin codegen_units=16 | 68.6s | 68.6s |
opt_level=3 lto=thin codegen_units=1 | 115.2s | 115.6s |
opt_level=3 lto=fat codegen_units=16 | 140.2s | 137.2s |
opt_level=3 lto=fat codegen_units=1 | 141.9s | 142.5s |
I didn’t see significant compile time change for this change.
Published in v0.2.10. Thanks @EFanZh.
Sometime these methods are not inline under
opt-level=z
compile option, forcing inlining could be beneficial for performance and binary size.