scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License
58.49k stars 25.04k forks source link

Add simple `pydata/sparse` dispatch #29031

Open mtsokol opened 2 weeks ago

mtsokol commented 2 weeks ago

Hi!

This PR adds a simple scipy.sparse dispatch for https://github.com/pydata/sparse (transforming pydata/sparse arrays to scipy.sparse matrices).

Same efforts to support pydata/sparse input and convert to scipy.sparse were completed in scipy.sparse.linalg and scipy.sparse.csgraph:

github-actions[bot] commented 2 weeks ago

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 205af1b. Link to the linter CI: here

mtsokol commented 2 weeks ago

I think I need some guidance which specific files I should modify to add sparse test dependency.

betatim commented 2 weeks ago

Thanks for making a PR!

Before working more on the code side I think it would be worth expanding on the use-case and motivation for this change. Maybe there is some document or blog post that explains how this fits into the whole PyData ecosystem. The reason I am saying all this is that adding a new dependency to scikit-learn is one of the hardest things to do. The default answer is "no" - and even if it isn't a straight no, then there are a lot of questions and discussions. So if you have some links to things people can read, find answers and inform the debate that would be great.

glemaitre commented 2 weeks ago

In addition to @betatim, I would mention that we recently add support for scipy sparse arrays in addition to the scipy sparse matrices. From what I foresee in the long term is that the scipy sparse arrays implementation should allow us to remove any specific sparse Python code (not the specialize Cython one).

So I'm wondering if this is wise to add support for another sparse container that go away from this goal while we could request our user to make the conversion before to provide us the data, isn't it?

hameerabbasi commented 1 week ago

Thanks everyone, I've opened a discussion thread over here: https://github.com/scikit-learn/scikit-learn/discussions/29064