numba / numba-scipy

numba_scipy extends Numba to make it aware of SciPy
https://numba-scipy.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

Sparse matrices? #29

Closed ivirshup closed 6 months ago

ivirshup commented 4 years ago

@stuartarchibald, I saw on the numba gitter you were working on a scipy.sparse implementation here. I would really like to be able to use sparse matrices in compiled code, and have been implementing a bit of this myself, though primarily aiming at indexing into out-of-core sparse matrices. pydata/sparse has looked like an interesting target for this, but is missing the CSC and CSR formats.

I'd be keen to hear your perspective. What are your plans for sparse matrices here? How do you see this fitting in with pydata/sparse?
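To make the indexing use case concrete, here is a minimal sketch of looking up a single element A[i, j] of a CSR matrix inside Numba-compiled code. It is an illustration, not code from this thread. It assumes the three standard CSR arrays (`indptr`, `indices`, `data`) with column indices sorted within each row, and the function name `csr_get` is hypothetical. If Numba is not installed, the decorator falls back to a no-op so the same code runs as plain Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # Numba not installed: run the same code as plain Python
    def njit(func):
        return func


@njit
def csr_get(indptr, indices, data, i, j):
    """Return A[i, j] for a CSR matrix whose column indices are sorted per row."""
    start, end = indptr[i], indptr[i + 1]
    row_cols = indices[start:end]
    # Binary search for column j among the columns stored in row i
    k = np.searchsorted(row_cols, j)
    if k < row_cols.shape[0] and row_cols[k] == j:
        return data[start + k]
    return 0.0  # not stored, so it is an implicit zero


# 3x3 example matrix:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
indptr = np.array([0, 2, 3, 5], dtype=np.int64)
indices = np.array([0, 2, 2, 0, 1], dtype=np.int64)
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

print(csr_get(indptr, indices, data, 0, 2))  # 2.0
print(csr_get(indptr, indices, data, 1, 0))  # 0.0
```

Each lookup costs one binary search over a single row, which is why sorted (canonical) column indices matter for this kind of access.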

stuartarchibald commented 4 years ago

@ivirshup I would hope to implement what's in scipy.sparse, but would want to focus on the compressed formats (CSR, CSC) first, as they are the formats most commonly consumed by high-performance libraries and the most prevalent in high-performance numerical code (in my experience). In what I've put together so far I'm aiming for jit-transparency as per Numba, i.e. a function just works with or without the @jit decorator, the data types are the same, etc. As the SciPy internals for these layouts are NumPy-array based, the compute cost of translating them to native code seems pretty minimal.
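Because the CSR layout is just three NumPy arrays, a compiled kernel can consume them directly. The sketch below is an illustrative assumption, not the implementation referred to above: a hypothetical `csr_matvec` computing `y = A @ x`, written so it behaves the same with or without the `@njit` decorator, in the spirit of the jit-transparency described.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # without Numba, the same function runs as plain Python
    def njit(func):
        return func


@njit
def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x, where A is stored in CSR form (indptr, indices, data)."""
    n_rows = indptr.shape[0] - 1
    y = np.zeros(n_rows)  # float64 output
    for i in range(n_rows):
        # Row i's nonzeros live in data[indptr[i]:indptr[i + 1]]
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


# Same 3x3 matrix as [[1, 0, 2], [0, 0, 3], [4, 5, 0]]
indptr = np.array([0, 2, 3, 5], dtype=np.int64)
indices = np.array([0, 2, 2, 0, 1], dtype=np.int64)
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

print(csr_matvec(indptr, indices, data, np.ones(3)).tolist())  # [3.0, 3.0, 9.0]
```

With SciPy installed, a `scipy.sparse.csr_matrix` exposes the same three arrays as `A.indptr`, `A.indices` and `A.data`, so `csr_matvec(A.indptr, A.indices, A.data, x)` should agree with `A @ x`.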

I suspect that some work might need doing in SciPy/scikits to expose bindings to high performance libraries.

RE pydata/sparse, we'll see how the above goes first?

SanPen commented 4 years ago

Hi @ivirshup and @stuartarchibald

A couple of months ago I started a Numba-based sparse matrix library (CSparse3), which led me to find this project.

Indeed, access to SciPy's sparse solvers would be very nice. I don't know whether there is any synergy here.

BR, Santiago

ivirshup commented 4 years ago

@stuartarchibald definitely agree that the CSR and CSC formats seem like the most valuable, since they're what you need to work with outside libraries. It would be really cool to be able to use GraphBLAS with Numba-defined types and methods.

My main concern with copying the scipy.sparse matrices' interface is the use of the deprecated np.matrix interface. If you're going for jit-transparency, I'd assume you're implementing the matrix interface here?
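The concern about the `np.matrix` interface is largely about operator semantics. SciPy's sparse matrix classes (e.g. `csr_matrix`) follow `np.matrix`, where `*` means matrix multiplication, whereas on a plain `ndarray` `*` is elementwise. The dense NumPy sketch below only illustrates that difference; it contains no sparse code, and is meant to show what a jit-transparent copy of the matrix interface would have to reproduce.

```python
import warnings

import numpy as np

a = np.array([[1, 2], [3, 4]])

# ndarray: `*` is elementwise, `@` is matrix multiplication
print((a * a).tolist())  # [[1, 4], [9, 16]]
print((a @ a).tolist())  # [[7, 10], [15, 22]]

with warnings.catch_warnings():
    # np.matrix warns on construction (PendingDeprecationWarning in current NumPy)
    warnings.simplefilter("ignore")
    m = np.matrix(a)

# np.matrix: `*` is matrix multiplication, like scipy.sparse matrix classes
print((m * m).tolist())  # [[7, 10], [15, 22]]
```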

@SanPen, you might be interested to look at this issue and corresponding PR about making the scipy.sparse solvers available for pydata/sparse arrays: https://github.com/pydata/sparse/issues/293 https://github.com/scipy/scipy/pull/10901

ivirshup commented 4 years ago

On the topic of array interfaces and CSC/CSR matrices, there's a discussion here about (partially) providing an array interface to SciPy sparse arrays: https://github.com/theislab/scanpy/issues/921

brocksam commented 4 years ago

What's the current state of play with numba-scipy for scipy.sparse? I have a number of projects where I would love to be able to use Numba with scipy.sparse to speed up computations. I'm keen to get involved with this channel of work if I can be of assistance. I haven't contributed to Numba before, but I have reasonable experience with C/C++/Cython/SciPy, so I can hopefully contribute usefully in some way.

@stuartarchibald you're probably best placed to advise. Where are we up to with this so far? Is there a roadmap for what needs to be done? How can I be of assistance?

PercyLau commented 4 years ago

Hi, @stuartarchibald @ivirshup @SanPen @brocksam

Supporting sparse matrices in Numba is an important feature, especially for easily speeding up the processing of large data sets that require a lot of memory. Is there any further discussion or progress on this issue? I'm also keen to get involved with this channel of work if I can be of assistance. Many projects need this feature to accelerate computations.

esc commented 4 years ago

@PercyLau I am not aware of any discussions regarding the implementation of sparse matrices in Numba. Perhaps it would make sense to trawl through our Discourse (https://numba.discourse.group/) and start a discussion there if none exists?

PercyLau commented 4 years ago

Of course. It's very weird: adding sparse matrix support has been discussed on GitHub and Stack Overflow many times, but no thread exists on numba.discourse.group.

PercyLau commented 4 years ago

@esc Perhaps this could be helpful: https://lkpy.readthedocs.io/en/stable/matrix.html#compressed-sparse-row-matrices, where a Numba-based sparse matrix is implemented. However, a more native version is needed because sparse matrices are so common; e.g., someone may want to pass a sparse matrix into an njit function, which is not possible with a custom implementation.

esc commented 4 years ago

@PercyLau perhaps the following can help with bringing a third-party extension closer to Numba:

https://numba.pydata.org/numba-doc/dev/extending/entrypoints.html

esc commented 4 years ago

@PercyLau and there is also:

https://sparse.pydata.org/en/stable/index.html

Which, I believe, uses Numba under the hood.

PercyLau commented 4 years ago

@esc However, most of these projects lack CSR/CSC sparse matrices. I think this is because Numba somehow lacks a convenient lexsort-style sorting function, which is essential for converting a COO matrix to CSR/CSC.
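For what it's worth, a COO-to-CSR conversion can also be done without a lexsort: bucket the entries by row with a counting pass, then sort the columns within each row. The sketch below illustrates that standard counting-sort approach; it is not code from any project mentioned in this thread, the name `coo_to_csr` is hypothetical, and duplicate entries are kept rather than summed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # without Numba, the same function runs as plain Python
    def njit(func):
        return func


@njit
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR (indptr, indices, data) without a lexsort."""
    nnz = rows.shape[0]
    # 1. Count entries per row, then prefix-sum the counts into row pointers
    indptr = np.zeros(n_rows + 1, dtype=np.int64)
    for k in range(nnz):
        indptr[rows[k] + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]
    # 2. Scatter each entry into the next free slot of its row
    indices = np.empty(nnz, dtype=np.int64)
    data = np.empty_like(vals)
    next_slot = indptr[:-1].copy()
    for k in range(nnz):
        r = rows[k]
        dest = next_slot[r]
        indices[dest] = cols[k]
        data[dest] = vals[k]
        next_slot[r] += 1
    # 3. Sort column indices (and their values) within each row
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        order = np.argsort(indices[start:end])
        indices[start:end] = indices[start:end][order]
        data[start:end] = data[start:end][order]
    return indptr, indices, data


# Shuffled COO triplets for [[1, 0, 2], [0, 0, 3], [4, 5, 0]]
rows = np.array([2, 0, 1, 2, 0], dtype=np.int64)
cols = np.array([1, 2, 2, 0, 0], dtype=np.int64)
vals = np.array([5.0, 2.0, 3.0, 4.0, 1.0])

indptr, indices, data = coo_to_csr(3, rows, cols, vals)
print(indptr.tolist())   # [0, 2, 3, 5]
print(indices.tolist())  # [0, 2, 2, 0, 1]
print(data.tolist())     # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Only per-row sorts remain, each over a row's (usually short) slice of column indices, so a global two-key sort is not needed.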

mdekstrand commented 3 years ago

Hi! I'm the author of LensKit, and am also looking at the state of Numba and sparse matrices more deeply, because I don't think the LensKit source code is a good place for our matrix utilities to live long-term (and indeed, if we can replace our sparse matrix, LensKit itself can be a pure-Python package).

I'm currently thinking about spinning our class out into a standalone package focused on CSR/CSC matrix support. I'd like to clean up our design a bit (right now it is admittedly pretty weird, in part because I haven't done anything to use Numba's lowering to hide the Python/Native matrix class distinction), and also transparently support MKL acceleration (when available) and either pure Numba operations or bindings to another sparse package that can be bundled with it. For my use case in LensKit, when MKL is available, it gives us a significant performance boost, but I would like to keep that transparent if possible. Right now one of my blockers is just figuring out how to ergonomically connect Numba and MKL's inspector-executor framework in a way that is also maximally memory-safe. jitclass's lack of support for __del__ is making that more difficult than I would like.

Filco306 commented 3 years ago

Is there any update on this issue? I would love sparse matrix support in numba. :)

esc commented 3 years ago

Thank you for asking. Not at this time. Continue to watch this issue for updates and thank you for using Numba!

mdekstrand commented 3 years ago

Just FYI, I have spun the sparse matrix code above into a separate package: https://csr.lenskit.org

Filco306 commented 3 years ago

I will try that @mdekstrand and reach out to you if I encounter any issues :D

esc commented 6 months ago

With #92 merged, this can now be closed.