ENH: Implement `broadcast_to` function

mtsokol commented 1 month ago

Hi @hameerabbasi,

This PR adds broadcast_to Array API function.

It it only broadcasts missing dimensions, for 1-size dimension it can't broadcast as it's a limitation of linalg.broadcast in MLIR.

codspeed-hq[bot] commented 1 month ago

CodSpeed Performance Report

Merging #782 will degrade performances by 12.39%

_{Comparing broadcast_to-func (8448d56) with main (1593a0f)}

Summary

⚡ 10 improvements ❌ 1 regressions ✅ 329 untouched benchmarks

:warning: _Please fix the performance issues or acknowledge them on CodSpeed._

Benchmarks breakdown

	Benchmark	`main`	`broadcast_to-func`	Change
⚡	`test_index_fancy[side=100-rank=1-format='coo']`	1,417 µs	639.6 µs	×2.2
❌	`test_index_slice[side=100-rank=2-format='gcxs']`	2.4 ms	2.7 ms	-12.39%
⚡	`test_elemwise[f=<built-in function add>-backend='Finch'-side=1000]`	1.9 ms	1.5 ms	+28.29%
⚡	`test_elemwise[f=<built-in function add>-backend='Finch'-side=100]`	1,148.7 µs	714.3 µs	+60.81%
⚡	`test_elemwise[f=<built-in function add>-backend='Finch'-side=500]`	1,298.4 µs	881.9 µs	+47.23%
⚡	`test_elemwise[f=<built-in function gt>-backend='Finch'-side=1000]`	1.7 ms	1.4 ms	+21.24%
⚡	`test_elemwise[f=<built-in function gt>-backend='Finch'-side=100]`	982.4 µs	705.8 µs	+39.2%
⚡	`test_elemwise[f=<built-in function gt>-backend='Finch'-side=500]`	1,187.4 µs	877.5 µs	+35.31%
⚡	`test_elemwise[f=<built-in function mul>-backend='Finch'-side=1000]`	1,149.7 µs	735.3 µs	+56.35%
⚡	`test_elemwise[f=<built-in function mul>-backend='Finch'-side=100]`	1,110.3 µs	691.2 µs	+60.64%
⚡	`test_elemwise[f=<built-in function mul>-backend='Finch'-side=500]`	1,127 µs	712.2 µs	+58.24%

hameerabbasi commented 1 month ago

I'm hesitant about merging this as it contains lots of heuristics to figure out the output format -- not to mention there should be a "broadcast level" that's the equivalent of a "zero stride", or perhaps we can ignore the broadcast indices on the Python side.

mtsokol commented 1 month ago

I'm hesitant about merging this as it contains lots of heuristics to figure out the output format -- not to mention there should be a "broadcast level" that's the equivalent of a "zero stride", or perhaps we can ignore the broadcast indices on the Python side.

The heuristics are gone (they're already added by a different PR). What do you mean by a "broadcast level"?

hameerabbasi commented 1 month ago

What do you mean by a "broadcast level"?

Something that represents a zero-stride, or an ignored index.

mtsokol commented 1 month ago

Can you give an Array API broadcast_to(...) example call that uses ignored index? I'm not sure which feature you mean.

mtsokol commented 1 month ago

Decided that for now broadcast_to is a copying operation and Tensor views should be included on the MLIR or Finch Dialect side.

hameerabbasi commented 1 month ago

Force-merged due to the CI hiccup. Thanks, @mtsokol!

pydata / sparse