scientific-python / summit-2023

Work summit 2023

Sparse Arrays #2

Open stefanv opened 1 year ago

ivirshup commented 1 year ago

Brought up on the first meeting's hackmd, but I would love to work on getting major packages like scikit-learn, networkx, etc. to support array-API sparse arrays from scipy.

jjerphan commented 1 year ago

Hi all,

Thank you for sharing the notes, @ivirshup.

I think there are different needs to address, and constraints to take into account, when improving sparse arrays' usability in the ecosystem without breaking existing implementations' behaviors and workflows.

I am thinking of focusing all my efforts on this issue during the Developer Summit.

What do you think?

ivirshup commented 1 year ago

I think that would be great, and I would be very interested in doing work along these lines as well. With the caveat that I'm not particularly familiar with Cython.

jjerphan commented 1 year ago

> With the caveat that I'm not particularly familiar with Cython.

It's fine. That's something people can help you with, I think.

ivirshup commented 1 year ago

I would also be quite keen on making PRs into downstream packages (especially dask, xarray, scikit-learn, and my own packages) to make sure these types are supported. I think this could be quite useful for finding usability pain points in the ecosystem.

Is there a centralized place where I can look at planned work / known issues around this in scipy?

I would also be interested in a call with those interested to figure out specific goals for the hackathon.

dschult commented 1 year ago

I also plan to focus on this issue for the Developer Summit. And it'll be important to get some sort of consensus about which aspects of a sparse array revamp we can work to implement -- and which might need more information or discussion before design decisions can be made (working prototypes are part of this process of course).

I think a focus on downstream packages is important for our success -- let's make it easy to switch code from dense array syntax to sparse array syntax, and also easy for current users of sparse packages to figure out how to switch to scipy sparse arrays. Having some folks write PRs for downstream packages while others write PRs to convert the sparse matrix interface to the sparse array interface, with the two groups talking throughout the process, may be an effective approach.
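To make the "dense syntax vs. sparse syntax" point concrete, here is a minimal sketch of the operator difference that makes switching awkward today. It assumes a SciPy version that ships `csr_array` (1.8 or later); the variable names are purely illustrative.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 2], [3, 4]])
as_array = sparse.csr_array(dense)    # newer sparse array interface
as_matrix = sparse.csr_matrix(dense)  # older sparse matrix interface

# Dense ndarrays and sparse arrays agree: * is element-wise, @ is matmul.
elementwise_dense = dense * dense
elementwise_sparse = (as_array * as_array).toarray()
matmul_sparse = (as_array @ as_array).toarray()

# The sparse matrix interface treats * as matrix multiplication instead.
star_on_matrix = (as_matrix * as_matrix).toarray()

print(elementwise_sparse)
print(star_on_matrix)
```

Code written against dense arrays that uses `*` therefore keeps its meaning with `csr_array` but silently changes meaning with `csr_matrix`, which is one reason downstream packages need explicit attention.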

Some more specific goals (but not really very specific actually) I'd like to see:

Is this the kind of thing people are thinking of? What else?

ivirshup commented 1 year ago

> developing a 1-d sparse API (which may be needed for parts of the previous 2 bullets)

This is a really good point. It could definitely be difficult to integrate with new downstream libraries without this. Do we expect any major hurdles here? I would imagine the indexing code would essentially be factored out from the existing functions.

> downstream packages

I'm not actually completely sure which packages are represented at this event. Was there a list for this somewhere?


other potential topics

dschult commented 1 year ago

When you refer to "Array API", are you talking about NEP 47? Any other places to look?

Interoperability and performance are important too -- they can sometimes be hard if the other libraries don't make it easy. But this is a big part of making it easy for people currently using other packages to figure out how to use sparse arrays. :)

ivirshup commented 1 year ago

Sorry about the delayed response! Had a hackathon last week, so also missed the second sparse summit. Were there notes for that floating around somewhere?

@dschult yep, I do mean that NEP / https://data-apis.org/array-api/latest/. I'm wondering if this is going to be required by downstream libraries like dask/xarray.
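For context, the standard's entry point is the `__array_namespace__` method, so a downstream library could probe for it roughly as sketched below. This is only an illustration: `get_namespace_or_none` and `ToyArray` are hypothetical names, not part of dask, xarray, or scipy, and a real implementation would return a module-like namespace rather than a string.

```python
def get_namespace_or_none(x):
    """Return the Array API namespace of x, or None if x does not expose one."""
    getter = getattr(x, "__array_namespace__", None)
    return getter() if callable(getter) else None


class ToyArray:
    """Hypothetical stand-in for an array type that implements the standard."""

    def __array_namespace__(self, /, *, api_version=None):
        # A real library returns a namespace object with functions like `sum`.
        return "toy-namespace"


print(get_namespace_or_none(ToyArray()))
print(get_namespace_or_none([1, 2, 3]))
```

Whether sparse arrays would have to pass a check like this is exactly the open question; the sketch only shows what such a requirement would look like from the consumer side.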

perimosocordiae commented 1 year ago

I just opened https://github.com/scipy/scipy/pull/18440 to "invert" the hierarchy between spmatrix and _sparray. It was mostly a mechanical change, so hopefully we can get that merged before the summit and have a clean starting point to build from.

ivirshup commented 1 year ago

Saw some recent activity on sparse array support over on xarray:

perimosocordiae commented 1 year ago

As of just now, sparse arrays are the base type for scipy.sparse, with spmatrix defined as a thin wrapper around scipy.sparse._sparray. There are plenty of cleanups and improvements to be made still, but we're moving in the right direction.

jjerphan commented 1 year ago

Hi all,

How would you like to organise the rest of the work? Should we distribute remaining tasks we have identified among ourselves?