openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

Vectorized multi-row reductions. #14395

Closed by copybara-service[bot] 2 weeks ago.

copybara-service[bot] commented 2 weeks ago

We might be able to squeeze out a tiny bit more by making the writes fully coalesced, but I doubt it's worth the increase in complexity.
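The PR text does not show the kernel itself. The sketch below is a pure-Python simulation of one common way to structure a vectorized multi-row reduction on a GPU. It is an assumption for illustration and not necessarily what XLA's emitter generates. Each row is short enough that a single 32-lane warp can cover several rows at once. Every lane does one vectorized load of `vector_size` contiguous elements and sums them locally. The lanes assigned to the same row then combine their partials with a tree reduction that mimics `__shfl_down_sync` with a `width` argument. The function name `multi_row_reduce`, the constant `WARP_SIZE`, and the parameter `vector_size` are hypothetical.

```python
# Hypothetical simulation of a vectorized multi-row sum reduction on one warp.
# Not XLA code: it only models which lane reads and combines which elements.
WARP_SIZE = 32


def multi_row_reduce(matrix, vector_size):
    """Return the sum of each row, computed the way a warp-per-several-rows kernel might."""
    row_len = len(matrix[0])
    assert row_len % vector_size == 0, "each lane loads exactly one vector"
    threads_per_row = row_len // vector_size
    # Requiring threads_per_row to divide 32 also makes it a power of two,
    # which the shuffle tree below needs.
    assert WARP_SIZE % threads_per_row == 0, "rows must tile the warp evenly"
    rows_per_warp = WARP_SIZE // threads_per_row
    out = [0] * len(matrix)

    for first_row in range(0, len(matrix), rows_per_warp):
        # Step 1: each lane performs one vectorized load of `vector_size`
        # contiguous elements and reduces them in registers.
        partial = [0] * WARP_SIZE
        for lane in range(WARP_SIZE):
            row = first_row + lane // threads_per_row
            if row >= len(matrix):
                continue  # lanes past the last row stay idle
            col = (lane % threads_per_row) * vector_size
            partial[lane] = sum(matrix[row][col:col + vector_size])

        # Step 2: tree reduction inside each segment of `threads_per_row` lanes,
        # like __shfl_down_sync(mask, v, offset, width=threads_per_row).
        offset = threads_per_row // 2
        while offset > 0:
            new = partial[:]  # all lanes read old values "simultaneously"
            for lane in range(WARP_SIZE):
                if lane % threads_per_row + offset < threads_per_row:
                    new[lane] = partial[lane] + partial[lane + offset]
            partial = new
            offset //= 2

        # Step 3: the first lane of each segment holds its row's total and stores it.
        for lane in range(0, WARP_SIZE, threads_per_row):
            row = first_row + lane // threads_per_row
            if row < len(matrix):
                out[row] = partial[lane]
    return out


m = [[r * 8 + c for c in range(8)] for r in range(10)]
print(multi_row_reduce(m, vector_size=2) == [sum(row) for row in m])
```

In this sketch only `rows_per_warp` of the 32 lanes issue a store for each group of rows. That is one plausible reading of the write-side trade-off mentioned above, but the PR does not confirm it.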