narwhals-dev / narwhals

Lightweight and extensible compatibility layer between Polars, pandas, cuDF, Modin, and more!
https://narwhals-dev.github.io/narwhals/
MIT License

Research: how would Narwhals work in scikit-learn? #355

Open MarcoGorelli opened 5 days ago

MarcoGorelli commented 5 days ago

Scikit-learn mentioned Narwhals here: https://github.com/scikit-learn/scikit-learn/pull/28513#issuecomment-2131226993

Regardless of whether they decide to use it or not, it would be beneficial to check whether Narwhals would at least work in scikit-learn, and whether it could in principle solve the linked issue.

@EdAbati - you've contributed to scikit-learn, and you know Narwhals well - perhaps this issue might be of interest to you?

EdAbati commented 3 days ago

Uuuh this can be fun! I contributed to features related to the Array API compatibility lately. But I'll gladly also try to figure out this part of scikit-learn

> it would be beneficial to check whether Narwhals would at least work in scikit-learn, and whether it could in principle solve the linked issue

Agreed, let's start having a look at these:

If anyone wants to help, please feel free to comment.

EdAbati commented 3 days ago

Some early thoughts:

MarcoGorelli commented 3 days ago

> what do you think, is it a missing feature?

I'd say so, yes! I think we need this one in Altair too. Does implementing it interest you? I think it just requires an extra branch in DataFrame.__getitem__

> what about .copy()/.clone()?

Sure, doesn't hurt to add DataFrame.clone 👍
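
For readers following along, here is a minimal sketch of what "an extra branch" in `__getitem__` and a `clone` method could look like on a thin wrapper class. It is not Narwhals' actual implementation: `ToyFrame` is a hypothetical stand-in, and the list-of-column-names key is an assumption, since the thread does not say which key type was missing.

```python
from __future__ import annotations

import copy
from typing import Any


class ToyFrame:
    """Hypothetical stand-in for a dataframe wrapper; not the Narwhals API."""

    def __init__(self, data: dict[str, list[Any]]) -> None:
        self._data = data

    def __getitem__(self, key: str | list[str]) -> list[Any] | ToyFrame:
        # Existing branch: a single column name returns that column.
        if isinstance(key, str):
            return self._data[key]
        # Extra branch (assumed): a list of column names returns a sub-frame.
        if isinstance(key, list) and all(isinstance(k, str) for k in key):
            return ToyFrame({k: self._data[k] for k in key})
        raise TypeError(f"Unsupported key type: {type(key).__name__}")

    def clone(self) -> ToyFrame:
        # Deep copy so that mutating the clone leaves the original untouched.
        return ToyFrame(copy.deepcopy(self._data))

    @property
    def columns(self) -> list[str]:
        return list(self._data)


df = ToyFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
sub = df[["a", "c"]]    # goes through the extra branch
cloned = df.clone()
cloned["a"].append(99)  # the original frame is unaffected
```

The point of the sketch is only that supporting a new key type is a local change to the dispatch inside `__getitem__`, and that `clone` can be added without touching anything else.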

EdAbati commented 3 days ago

> Does implementing it interest you?

Most of the time my answer to that question is "yes". My only problem is time 😅 I will create an issue in case someone else is able to pick it up before me.

MarcoGorelli commented 3 days ago

😄 I'll take on the __getitem__ one then, so we can propose it to Altair too