Eliminate extensions allocations

SUPERCILEX commented 9 months ago

Fixes #428

What does this do?

Pools allocations of extensions (implicitly using the fact that AnyMap is stored in a pool) by reusing Boxes.

Why can't this be safe code?

It'd be nice to do this:

enum BoxedExt {
    Live(dyn std::any::Any + Sync + Send),
    Dead,
}

Sadly you can't do that because the dyn is unsized. Instead, we manually re-implement the vtable in unsafe code.

Miri has been run on the entire repo with this change:

RUSTFLAGS="--cfg nightly" MIRIFLAGS=-Zmiri-backtrace=full cargo miri t --workspace

Performance

Tested on https://github.com/SUPERCILEX/ftzz.

Before:

After:

The increased fragmentation is a little surprising and a bummer, but that might just be in my application as there were no other allocations to remove.

Random notes

There's no need to implement cache eviction because the HashMap is keyed by types which means the maximum amount of memory used is pre-determined at compile time.

I'm using strict provenance APIs for miri testing as I need to use usize to ptr casts which is no bueno.

SUPERCILEX commented 9 months ago

Seems to have failed for an unrelated reason: https://github.com/tokio-rs/tracing/pull/2842

davidbarsky commented 9 months ago

Oh wow, thanks for opening this PR and sorry for the delay in responding. I haven't gone over the benchmarks yet (I'd like to rerun them on a less noisy machine...), but they seem to be very promising:

Benchmarking results

``` tracing/tracing-subscriber on  ext-alloc [$] is 📦 v0.3.0 via 🦀 v1.74.0 at 01:37:55 PM ❯ cargo +nightly bench --features=env-filter event Compiling tracing-subscriber v0.3.0 (/Users/dbarsky/Developer/tracing/tracing-subscriber) Compiling tracing-mock v0.2.0 (/Users/dbarsky/Developer/tracing/tracing-mock) Finished bench [optimized] target(s) in 5.98s Running benches/enter.rs (/Users/dbarsky/Developer/tracing/target/release/deps/enter-2a504e389e1a34aa) WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0. This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml. WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature. The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml. Gnuplot not found, using plotters backend Running benches/filter.rs (/Users/dbarsky/Developer/tracing/target/release/deps/filter-129f2e31f8caea56) WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0. This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml. WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature. The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml. Gnuplot not found, using plotters backend Running benches/filter_log.rs (/Users/dbarsky/Developer/tracing/target/release/deps/filter_log-8a039b6535b4594b) WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0. This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml. WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature. The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml. Gnuplot not found, using plotters backend Running benches/fmt.rs (/Users/dbarsky/Developer/tracing/target/release/deps/fmt-5825a19aa1a8f04b) WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0. This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml. WARNING: In Criterion.rs 0.4.0, running criterion benchmarks outside of cargo-criterion will become a default optional feature. The statistical analysis and reporting is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'cargo_bench_support' feature in your Cargo.toml. Gnuplot not found, using plotters backend event/root/single_threaded/1 time: [504.97 ns 508.09 ns 511.70 ns] thrpt: [1.9543 Melem/s 1.9681 Melem/s 1.9803 Melem/s] change: time: [-1.5972% -1.0250% -0.5214%] (p = 0.00 < 0.05) thrpt: [+0.5242% +1.0356% +1.6232%] Change within noise threshold. event/root/multithreaded/1 time: [10.656 µs 10.851 µs 11.119 µs] thrpt: [89.936 Kelem/s 92.157 Kelem/s 93.846 Kelem/s] change: time: [-53.423% -51.753% -50.042%] (p = 0.00 < 0.05) thrpt: [+100.17% +107.27% +114.70%] Performance has improved. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 8 (8.00%) high mild 1 (1.00%) high severe event/unique_parent/single_threaded/1 time: [799.32 ns 800.50 ns 801.75 ns] thrpt: [1.2473 Melem/s 1.2492 Melem/s 1.2511 Melem/s] change: time: [-1.6751% -1.1944% -0.7508%] (p = 0.00 < 0.05) thrpt: [+0.7565% +1.2089% +1.7036%] Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild event/unique_parent/multithreaded/1 time: [24.005 µs 24.512 µs 25.009 µs] thrpt: [39.985 Kelem/s 40.796 Kelem/s 41.657 Kelem/s] change: time: [+46.586% +51.273% +56.129%] (p = 0.00 < 0.05) thrpt: [-35.950% -33.894% -31.781%] Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild event/shared_parent/multithreaded/1 time: [13.038 µs 13.342 µs 13.736 µs] thrpt: [72.802 Kelem/s 74.954 Kelem/s 76.698 Kelem/s] change: time: [-40.476% -37.473% -34.227%] (p = 0.00 < 0.05) thrpt: [+52.039% +59.930% +68.000%] Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 3 (3.00%) high mild 8 (8.00%) high severe event/multi-parent/multithreaded/1 time: [17.437 µs 17.644 µs 17.835 µs] thrpt: [56.068 Kelem/s 56.677 Kelem/s 57.348 Kelem/s] change: time: [-23.232% -19.392% -15.405%] (p = 0.00 < 0.05) thrpt: [+18.210% +24.058% +30.263%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 2 (2.00%) high severe event/root/single_threaded/10 time: [5.2370 µs 5.2491 µs 5.2614 µs] thrpt: [1.9006 Melem/s 1.9051 Melem/s 1.9095 Melem/s] change: time: [+2.1132% +2.5586% +2.9976%] (p = 0.00 < 0.05) thrpt: [-2.9104% -2.4948% -2.0695%] Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild event/root/multithreaded/10 time: [20.794 µs 20.930 µs 21.093 µs] thrpt: [474.09 Kelem/s 477.78 Kelem/s 480.92 Kelem/s] change: time: [-28.254% -26.113% -23.949%] (p = 0.00 < 0.05) thrpt: [+31.491% +35.341% +39.380%] Performance has improved. Found 15 outliers among 100 measurements (15.00%) 2 (2.00%) low severe 8 (8.00%) low mild 1 (1.00%) high mild 4 (4.00%) high severe event/unique_parent/single_threaded/10 time: [8.1886 µs 8.2119 µs 8.2361 µs] thrpt: [1.2142 Melem/s 1.2177 Melem/s 1.2212 Melem/s] change: time: [-1.4775% -1.0266% -0.5883%] (p = 0.00 < 0.05) thrpt: [+0.5918% +1.0372% +1.4997%] Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild event/unique_parent/multithreaded/10 time: [31.111 µs 31.496 µs 31.931 µs] thrpt: [313.18 Kelem/s 317.50 Kelem/s 321.43 Kelem/s] change: time: [+0.6886% +3.6686% +6.9823%] (p = 0.02 < 0.05) thrpt: [-6.5266% -3.5388% -0.6839%] Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low severe 7 (7.00%) high mild 1 (1.00%) high severe event/shared_parent/multithreaded/10 time: [31.020 µs 31.400 µs 31.703 µs] thrpt: [315.43 Kelem/s 318.48 Kelem/s 322.37 Kelem/s] change: time: [-19.896% -17.836% -15.984%] (p = 0.00 < 0.05) thrpt: [+19.025% +21.707% +24.838%] Performance has improved. Found 18 outliers among 100 measurements (18.00%) 6 (6.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 6 (6.00%) high severe event/multi-parent/multithreaded/10 time: [55.046 µs 55.661 µs 56.363 µs] thrpt: [177.42 Kelem/s 179.66 Kelem/s 181.67 Kelem/s] change: time: [-39.343% -35.478% -31.532%] (p = 0.00 < 0.05) thrpt: [+46.053% +54.985% +64.860%] Performance has improved. Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe event/root/single_threaded/50 time: [26.178 µs 26.244 µs 26.306 µs] thrpt: [1.9007 Melem/s 1.9052 Melem/s 1.9100 Melem/s] change: time: [+2.4375% +3.0112% +3.5159%] (p = 0.00 < 0.05) thrpt: [-3.3965% -2.9231% -2.3795%] Performance has regressed. event/root/multithreaded/50 time: [47.489 µs 47.806 µs 48.079 µs] thrpt: [1.0400 Melem/s 1.0459 Melem/s 1.0529 Melem/s] change: time: [-2.5381% -0.8321% +0.9569%] (p = 0.38 > 0.05) thrpt: [-0.9478% +0.8391% +2.6041%] No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe event/unique_parent/single_threaded/50 time: [41.218 µs 41.307 µs 41.402 µs] thrpt: [1.2077 Melem/s 1.2105 Melem/s 1.2131 Melem/s] change: time: [-1.1814% -0.6921% -0.2034%] (p = 0.01 < 0.05) thrpt: [+0.2038% +0.6969% +1.1955%] Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 4 (4.00%) high mild 1 (1.00%) high severe event/unique_parent/multithreaded/50 time: [66.945 µs 69.091 µs 70.728 µs] thrpt: [706.93 Kelem/s 723.68 Kelem/s 746.88 Kelem/s] change: time: [-7.7719% -3.0993% +2.1468%] (p = 0.21 > 0.05) thrpt: [-2.1017% +3.1984% +8.4268%] No change in performance detected. Found 16 outliers among 100 measurements (16.00%) 13 (13.00%) low severe 3 (3.00%) low mild event/shared_parent/multithreaded/50 time: [92.784 µs 97.080 µs 101.25 µs] thrpt: [493.82 Kelem/s 515.04 Kelem/s 538.89 Kelem/s] change: time: [+9.8937% +15.335% +21.256%] (p = 0.00 < 0.05) thrpt: [-17.530% -13.296% -9.0030%] Performance has regressed. Found 23 outliers among 100 measurements (23.00%) 15 (15.00%) low severe 6 (6.00%) low mild 2 (2.00%) high mild event/multi-parent/multithreaded/50 time: [472.43 µs 473.79 µs 475.08 µs] thrpt: [105.25 Kelem/s 105.53 Kelem/s 105.84 Kelem/s] change: time: [-2.3126% -1.9288% -1.5301%] (p = 0.00 < 0.05) thrpt: [+1.5539% +1.9667% +2.3674%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild ```

(This was after I ran the same set of benchmarks on the master branch, to be clear.)

hawkw commented 9 months ago

Thanks for working on this, this is something I had been planning on doing for a long time but never had the time for. I'm really excited to review this PR!

SUPERCILEX commented 9 months ago

Did a bit of cleanup, the PR should be in a nicer state now.

tokio-rs / tracing