Open mmalczak opened 1 year ago
I'm interested in this, and I've already done some early experimentation to measure the performance of burn externally (i.e. using it as an external crate) and compare it with my own implementation. I have a few questions about these internal benchmarks before contributing:
There are some benchmarks in `burn-wgpu/benches/reduction.rs`, but we could put them in `backend-comparison` instead. We are missing benchmarks for global reduction such as `mean` and `sum`.
Feature description
Add benchmarks for reduce operations:
- Reduce one dimension:
- Reduce full tensor to a scalar:
There is an open issue to improve the performance of reduce kernels: https://github.com/burn-rs/burn/issues/536
Before starting to work on performance, we need proper benchmarks.
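
To make the two cases in the feature description concrete, here is a rough sketch of what each benchmark would measure. It uses only the Rust standard library on a flat row-major buffer, not Burn's tensor API or its benchmark harness; `sum_dim1`, `sum_all`, `mean_all`, and `time_it` are hypothetical helpers made up for illustration, and the 512 x 512 shape and iteration count are arbitrary assumptions.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Reduce one dimension: sum a row-major `rows x cols` buffer along
/// dimension 1, producing one value per row. (Hypothetical helper.)
fn sum_dim1(data: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols);
    data.chunks_exact(cols).map(|row| row.iter().sum()).collect()
}

/// Reduce the full buffer to a scalar sum. (Hypothetical helper.)
fn sum_all(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// Reduce the full buffer to a scalar mean. (Hypothetical helper.)
fn mean_all(data: &[f32]) -> f32 {
    sum_all(data) / data.len() as f32
}

/// Run `f` `iters` times and return the average wall-clock time per call.
/// `black_box` keeps the compiler from optimizing the result away.
fn time_it<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    // Arbitrary shape chosen for the sketch.
    let (rows, cols) = (512, 512);
    let data = vec![1.0_f32; rows * cols];

    let t_dim = time_it(100, || sum_dim1(&data, rows, cols));
    let t_sum = time_it(100, || sum_all(&data));
    let t_mean = time_it(100, || mean_all(&data));

    println!("sum along dim 1: {:?} per call", t_dim);
    println!("sum to scalar:   {:?} per call", t_sum);
    println!("mean to scalar:  {:?} per call", t_mean);
}
```

A real benchmark would swap these CPU loops for the backend's reduce kernels and would need to wait for the device to finish before stopping the timer, which this sketch does not model.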