pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem
https://sparse.pydata.org
BSD 3-Clause "New" or "Revised" License

feat: `__getitem__` logic for MLIR backend #779

Open mtsokol opened 1 month ago

mtsokol commented 1 month ago

Hi @hameerabbasi,

This PR adds `__getitem__` logic so that slicing expressions such as `tensor[:, :, ...]` can be run. The current version preserves the rank (and format) of the tensor.
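To make the rank-preserving behaviour concrete, here is a minimal pure-Python sketch of how a key made of slices and an `Ellipsis` can be expanded to one slice per dimension. The helper name `expand_key` is hypothetical and is not taken from this PR; the actual MLIR lowering is not shown.

```python
def expand_key(key, shape):
    """Expand a slice/Ellipsis key into one (start, stop, step) triple per dimension.

    Hypothetical helper for illustration only. Integer indices are rejected
    because they would drop a dimension, i.e. they would not preserve rank.
    """
    if not isinstance(key, tuple):
        key = (key,)
    if sum(k is Ellipsis for k in key) > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    n_explicit = sum(k is not Ellipsis for k in key)
    if n_explicit > len(shape):
        raise IndexError("too many indices for tensor")

    expanded = []
    for k in key:
        if k is Ellipsis:
            # The ellipsis stands for as many full slices as are missing.
            expanded.extend([slice(None)] * (len(shape) - n_explicit))
        elif isinstance(k, slice):
            expanded.append(k)
        else:
            raise IndexError("only slices and Ellipsis are supported in this sketch")

    # Dimensions not mentioned in the key are taken in full.
    expanded.extend([slice(None)] * (len(shape) - len(expanded)))
    # slice.indices clips each slice to the dimension size.
    return tuple(s.indices(dim) for s, dim in zip(expanded, shape))


print(expand_key((slice(None), slice(None), Ellipsis), (3, 4, 5)))
# ((0, 3, 1), (0, 4, 1), (0, 5, 1))
```

The result always has one entry per dimension of `shape`, which is what "preserves rank" means here.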

For now, unfortunately, it's blocked by https://discourse.llvm.org/t/illegal-operation-when-slicing-csr-csc-coo-tensor/81404, and I'm not sure whether the SparseTensor dialect fully supports slices.

An interesting case is, for example, `tensor[:, :]`, which just returns `tensor`. Our ownership mechanism sees the result as an MLIR-allocated object, whereas it is still the SciPy/NumPy array that was passed in. I think the mechanism requires a tweak: calling MLIR ops (reshape, slices, elemwise) should also report whether the result is MLIR-allocated (and thus requires a free) or just a reference to what was passed in (SciPy/NumPy-managed arrays).
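One possible shape for that tweak, as a toy sketch: every op returns its result together with an ownership tag, and an identity slice carries over the tag of its input. All names here (`Owner`, `Handle`, `slice_rows`, `needs_free`) are hypothetical and plain Python lists stand in for real buffers; this is not the library's ownership code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Owner(Enum):
    MLIR = auto()      # buffer allocated by compiled MLIR code: we must free it
    EXTERNAL = auto()  # buffer managed by SciPy/NumPy: we must not free it


@dataclass
class Handle:
    data: list
    owner: Owner


def slice_rows(handle, key):
    """Toy slicing op that also reports who owns the result."""
    if key == slice(None):
        # Identity slice: the result aliases the input, so ownership carries over.
        return Handle(handle.data, handle.owner)
    # Any other slice yields a fresh buffer, modelled here as MLIR-allocated.
    return Handle(handle.data[key], Owner.MLIR)


def needs_free(handle):
    """Only MLIR-allocated buffers are freed by the wrapper."""
    return handle.owner is Owner.MLIR


external = Handle([1, 2, 3, 4], Owner.EXTERNAL)
same = slice_rows(external, slice(None))   # like tensor[:]
part = slice_rows(external, slice(1, 3))   # like tensor[1:3]

print(needs_free(same), needs_free(part))  # False True
```

With the tag carried on the result, the `tensor[:, :]` case no longer looks MLIR-allocated, so the wrapper would not try to free memory that SciPy/NumPy still manages.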

hameerabbasi commented 1 month ago

I wonder if adding ndindex as a dependency makes sense?

codspeed-hq[bot] commented 1 month ago

CodSpeed Performance Report

Merging #779 will degrade performances by 54.63%

Comparing getitem-func (fbd4586) with main (db60537)

Summary

❌ 2 regressions ✅ 338 untouched benchmarks

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | `main` | `getitem-func` | Change |
|---|---|---|---|
| `test_index_fancy[side=100-rank=1-format='coo']` | 643.5 µs | 1,418.3 µs | -54.63% |
| `test_index_slice[side=100-rank=2-format='gcxs']` | 2.2 ms | 2.7 ms | -19.77% |
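As a reading aid, the first row is consistent with the change being computed as `main / branch - 1`. That formula is an inference from the numbers shown, not something the report states, and `relative_change` is a hypothetical helper.

```python
def relative_change(main_time, branch_time):
    """Relative speed change of the branch versus main; negative means slower.

    Assumed formula (inferred from the reported figures): main / branch - 1.
    Both times must be in the same unit.
    """
    return main_time / branch_time - 1.0


fancy = relative_change(643.5, 1418.3)  # microseconds
slice_ = relative_change(2.2, 2.7)      # milliseconds, as displayed (rounded)

print(f"{fancy:.2%}")   # -54.63%
print(f"{slice_:.2%}")  # -18.52%
```

The second row does not reproduce the reported -19.77% exactly, presumably because the displayed timings are rounded to one decimal place.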