openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

[xla:cpu] Add benchmark for compiling a chain of f32[12] buffers #14379

Closed copybara-service[bot] closed 2 days ago

copybara-service[bot] commented 2 days ago

[xla:cpu] Add benchmark for compiling a chain of f32[12] buffers

This is an example of missing aliasing information leading to a performance regression of several hundred times (up to ~400x in the measurements below) in the thunk runtime.

Thunk ("new") vs classic ("old") runtime:

name                               old cpu/op    new cpu/op       delta
BM_ChainOfAddF32/8/process_time    6.59µs ± 2%   6.92µs ± 3%      +4.96%      (p=0.008 n=5+5)
BM_ChainOfAddF32/16/process_time   7.03µs ± 1%   7.85µs ± 3%      +11.66%     (p=0.008 n=5+5)
BM_ChainOfAddF32/64/process_time   10.6µs ± 3%   13.0µs ± 2%      +23.23%     (p=0.008 n=5+5)
BM_ChainOfAddF32/128/process_time  15.1µs ± 3%   2894.8µs ± 1%    +19105.60%  (p=0.008 n=5+5)
BM_ChainOfAddF32/256/process_time  25.4µs ± 2%   8361.8µs ± 0%    +32819.31%  (p=0.008 n=5+5)
BM_ChainOfAddF32/512/process_time  47.2µs ± 3%   19282.6µs ± 1%   +40728.54%  (p=0.008 n=5+5)
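The delta column is a percentage, which makes the size of the slowdown hard to read at a glance. As a rough sketch, the rounded timings from the table can be converted into plain multiplicative factors. The dictionary name `timings_us` and the helper `slowdown` are illustrative and not part of XLA, and treating the benchmark argument (8, 16, ...) as the chain length is an assumption based on the benchmark name.

```python
# Rounded timings in microseconds, copied from the benchmark table above.
# Key: benchmark argument (assumed to be the chain length).
# Value: (old classic runtime, new thunk runtime).
timings_us = {
    8: (6.59, 6.92),
    16: (7.03, 7.85),
    64: (10.6, 13.0),
    128: (15.1, 2894.8),
    256: (25.4, 8361.8),
    512: (47.2, 19282.6),
}


def slowdown(old_us, new_us):
    """Return new/old as a multiplicative factor (1.0 means no change)."""
    return new_us / old_us


# Slowdown factor for every benchmark argument.
factors = {n: slowdown(old, new) for n, (old, new) in timings_us.items()}

for n, factor in factors.items():
    print(f"argument {n:>3}: {factor:7.1f}x slower")
```

With these rounded inputs the factors come out to about 1.05x, 1.12x and 1.23x for arguments 8, 16 and 64, then jump to roughly 192x, 329x and 409x for 128, 256 and 512. The results differ slightly from the table's delta column because the table's deltas were presumably computed from unrounded means.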