taiki-e / pin-project-lite

A lightweight version of pin-project written with declarative macros.
https://docs.rs/pin-project-lite
Apache License 2.0
216 stars 15 forks source link

Inline project methods #74

Closed EFanZh closed 1 year ago

EFanZh commented 1 year ago

Sometime these methods are not inline under opt-level=z compile option, forcing inlining could be beneficial for performance and binary size.

taiki-e commented 1 year ago

Thanks for the PR!

It is not clear that it is always reasonable to do this, since a large number of inline attributes will affect compile time (e.g., see https://github.com/rust-lang/hashbrown/pull/119)

Also, if you want to fully inline project_replace without LTO, the inline attribute may also be needed for private types in pin-project-lite.

In any case, please provide specific data on what this will improve. (generated asm[^1], reproducible benchmarks, etc.)

[^1]: depend on the compiler version, but pin-project-lite is available on godbolt: https://godbolt.org/z/eTxGzb5E7

EFanZh commented 1 year ago

Here is my attempt at comparing the effect of inlining: https://github.com/EFanZh/pin-project-lite/tree/compare-inlining. (Sorry for not using godbolt, I couldn’t figure out how to patch pin-project-lite with a inlined version.)

Under opt-level="z", here are the assembly code generated before and after inlining:

You can see project and project_replace not being inlined in the original version. As for the private types, except for drop_in_place::<UnsafeOverwriteGuard<...>>, everything is already inlined, and I don’t know how to tell the compiler to inline that drop_in_place.

Also, I have tested the effects of #[inline] with opt-level=3, and the result is the functions are all inlined in both cases:

These examples are handcrafted by me, maybe it’s better to test on a real project. But I don’t know any suitable projects that uses pin-project-lite extensively, any suggestions?

EFanZh commented 1 year ago

I tested this patch on tokio-console with this branch: https://github.com/EFanZh/console/tree/compare-inlining-pin-project-methods. Here is the binary sizes comparison:

original release     8145408
patched release      8138240

original release -z  6450688
patched release -z   6426112

Currently, I have only tested binary size changes. @taiki-e do you need other tests? For example: runtime performance benchmarks?

EFanZh commented 1 year ago

Hi @taiki-e: Is there anything I can do to get this PR merged?

taiki-e commented 1 year ago

Sorry for the late reply. And thanks for the investigation. Assuming there is no impact on compile time, the effects of this all appear to be positive.

As for compile time, @nnethercote previously measured it by building async-std (https://github.com/taiki-e/pin-project-lite/pull/71). That said, I'm not sure if building the library is the right way to measure the impact of #[inline] on compile time. We may need to measure the compile time of binaries that use tokio, async-std, etc.

EFanZh commented 1 year ago

Again, I have tested the compile time of tokio-console using this bash script:

#!/bin/bash -e

benchmark() {
    target_report=$1

    shift

    cargo clean
    cargo fetch "$@"
    cargo build --release --timings "$@"
    mv 'target/cargo-timings/cargo-timing.html' "$target_report"
}

patching_args=(
    --config 'patch.crates-io.pin-project-lite.git="https://github.com/EFanZh/pin-project-lite"' \
    --config 'patch.crates-io.pin-project-lite.rev="5ebf6921de62887572fa4e0477e725567de176df"'
)

for opt_level in '"z"' '3'; do
    for lto in 'off' 'thin' 'fat'; do
        for codegen_units in '16' '1'; do
            base_args=(
                --config "profile.release.opt-level=$opt_level"
                --config "profile.release.lto=\"$lto\""
                --config "profile.release.codegen-units=$codegen_units"
            )

            report_base_name="opt_level=${opt_level//\"/} lto=$lto codegen_units=$codegen_units"

            benchmark "$report_base_name - original.html" "${base_args[@]}"
            benchmark "$report_base_name - patched.html" "${base_args[@]}" "${patching_args[@]}"
        done
    done
done

Here is my test result:

Configuration Original Patched
opt_level=z lto=off codegen_units=16 51.7s 46.3s
opt_level=z lto=off codegen_units=1 67.7s 67.6s
opt_level=z lto=thin codegen_units=16 55.9s 55.9s
opt_level=z lto=thin codegen_units=1 76.2s 69.4s
opt_level=z lto=fat codegen_units=16 92.1s 91.7s
opt_level=z lto=fat codegen_units=1 97.2s 97.3s
opt_level=3 lto=off codegen_units=16 83.9s 83.4s
opt_level=3 lto=off codegen_units=1 103.9s 103.6s
opt_level=3 lto=thin codegen_units=16 68.6s 68.6s
opt_level=3 lto=thin codegen_units=1 115.2s 115.6s
opt_level=3 lto=fat codegen_units=16 140.2s 137.2s
opt_level=3 lto=fat codegen_units=1 141.9s 142.5s

I didn’t see significant compile time change for this change.

taiki-e commented 1 year ago

Published in v0.2.10. Thanks @EFanZh.