Closed mtsokol closed 1 month ago
Comparing broadcast_to-func
(8448d56) with main
(1593a0f)
⚡ 10
improvements
❌ 1
regressions
✅ 329
untouched benchmarks
:warning: _Please fix the performance issues or acknowledge them on CodSpeed._
Benchmark | main |
broadcast_to-func |
Change | |
---|---|---|---|---|
⚡ | test_index_fancy[side=100-rank=1-format='coo'] |
1,417 µs | 639.6 µs | ×2.2 |
❌ | test_index_slice[side=100-rank=2-format='gcxs'] |
2.4 ms | 2.7 ms | -12.39% |
⚡ | test_elemwise[f=<built-in function add>-backend='Finch'-side=1000] |
1.9 ms | 1.5 ms | +28.29% |
⚡ | test_elemwise[f=<built-in function add>-backend='Finch'-side=100] |
1,148.7 µs | 714.3 µs | +60.81% |
⚡ | test_elemwise[f=<built-in function add>-backend='Finch'-side=500] |
1,298.4 µs | 881.9 µs | +47.23% |
⚡ | test_elemwise[f=<built-in function gt>-backend='Finch'-side=1000] |
1.7 ms | 1.4 ms | +21.24% |
⚡ | test_elemwise[f=<built-in function gt>-backend='Finch'-side=100] |
982.4 µs | 705.8 µs | +39.2% |
⚡ | test_elemwise[f=<built-in function gt>-backend='Finch'-side=500] |
1,187.4 µs | 877.5 µs | +35.31% |
⚡ | test_elemwise[f=<built-in function mul>-backend='Finch'-side=1000] |
1,149.7 µs | 735.3 µs | +56.35% |
⚡ | test_elemwise[f=<built-in function mul>-backend='Finch'-side=100] |
1,110.3 µs | 691.2 µs | +60.64% |
⚡ | test_elemwise[f=<built-in function mul>-backend='Finch'-side=500] |
1,127 µs | 712.2 µs | +58.24% |
I'm hesitant about merging this as it contains lots of heuristics to figure out the output format -- not to mention there should be a "broadcast level" that's the equivalent of a "zero stride", or perhaps we can ignore the broadcast indices on the Python side.
I'm hesitant about merging this as it contains lots of heuristics to figure out the output format -- not to mention there should be a "broadcast level" that's the equivalent of a "zero stride", or perhaps we can ignore the broadcast indices on the Python side.
The heuristics are gone (they're already added by a different PR). What do you mean by a "broadcast level"?
What do you mean by a "broadcast level"?
Something that represents a zero-stride, or an ignored index.
Can you give an Array API broadcast_to(...)
example call that uses ignored index? I'm not sure which feature you mean.
Decided that for now broadcast_to
is a copying operation and Tensor views should be included on the MLIR or Finch Dialect side.
Force-merged due to the CI hiccup. Thanks, @mtsokol!
Hi @hameerabbasi,
This PR adds
broadcast_to
Array API function.It it only broadcasts missing dimensions, for
1
-size dimension it can't broadcast as it's a limitation oflinalg.broadcast
in MLIR.