rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.27k stars 535 forks source link

Compatibility with scikit-learn API #3125

Open jeremiedbb opened 4 years ago

jeremiedbb commented 4 years ago

Hi,

I tested the scikit-learn tool (check_estimator) to check that an estimator is compatible with the scikit-learn API and all fail because they don't implement estimator tags (they might fail for other reasons but that was the first failure).

Having an estimator pass check_estimator makes sure it can be used in scikit-learn model evaluation and model selection tools. Here is a list about what a estimator needs to be compatible: https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator

The checks are very strict currently but we plan to make it possible to only check the API compatibility in the next release (which still fails for cuml) (https://github.com/scikit-learn/scikit-learn/pull/18582)

Although using scikit-learn model evaulation tools on cuml estimators can't be optimal today (maybe even not possible, not sure) because scikit-learn can't delegate work to cupy, there's ongoing attempts to make it possible, see for instance https://github.com/scikit-learn/scikit-learn/pull/16574, https://github.com/scikit-learn/scikit-learn/pull/17676, https://github.com/scikit-learn/scikit-learn/pull/17744.

In this context, I think it would add a lot of value to cuml estimators to pass API compatibility tests.

cc @ogrisel

dantegd commented 4 years ago

Hi @jeremiedbb, thanks for the issue! We are aware of this indeed, and just started the tags system very recently, see #3113, which is very likely to land this week. Passing the API compatibilty test (at least the less strict upcoming one) is something that is in our radar, so this issue will be great to use for tracking it.

jeremiedbb commented 4 years ago

Ah I did not see this PR, should have looked closer. This is great !

github-actions[bot] commented 3 years ago

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

beckernick commented 3 years ago

For folks thinking about this issue, I suspect this function which is part of the estimator check will trip us up https://github.com/scikit-learn/scikit-learn/blob/95119c13af77c76e150b753485c662b7c52a41a2/sklearn/utils/estimator_checks.py#L2512

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.