scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License
59.3k stars 25.23k forks source link

No complete list of necessary checks for creating new classes in the docs (or anywhere else). #10082

Open voxmenthe opened 6 years ago

voxmenthe commented 6 years ago

@amueller asked me to open this issue.

When creating new estimators using any of the base classes, or inheriting from any of the main sklearn classes, there is no complete list (or any list?) of all of the checks that need to be run through to create a well-formed estimator or sklearn-compatible class. check_array covers a lot of things, but definitely nowhere near all things.

This is important, because the learning curve for contributing new classes and methods to sklearn is unnecessarily steep, and it would be much easier to contribute if there was, in the docs, a list of all the necessary checks that need to be run through for different types of estimators and sklearn classes, and a clear explanation of how and when the different checking utils should be used.

jnothman commented 6 years ago

in practice it will often be useful to start with an example that works and then extend it, don't you think. A complete list is essentially identical to sklearn/utils/estimator_checks.py, though I suppose that could be written out in plain English and with examples of how to conform. This may become a big effort to build and maintain, however...

lesteve commented 6 years ago

@voxmenthe what can be done to improve http://scikit-learn.org/dev/developers/contributing.html#rolling-your-own-estimator?

valentincalomme commented 6 years ago

I totally agree. I am currently trying to create custom objects and I keep running into strange errors that check_estimator raises. Most of the time, it's because I am not writing the correct tests in my classes.

I read the "rolling your own estimator" and even though it's a good start. It is incomplete and not up to date. The most blatant issue that I've seen so far was that the classifier template doesn't even pass the check_estimator test. It took me a few days to realize how to fix it (add a check_classification_targets(y) in fit)

At the very least, it would be great if better error messages could be shown.

jnothman commented 6 years ago

yes, better error messages in check_estimator would be great.

re the example, I suppose there's a difference between what works and is acceptable in order, and what is library quality

On 12 Nov 2017 12:26 pm, "ValentinCalomme" notifications@github.com wrote:

I totally agree. I am currently trying to create custom objects and I keep running into strange errors that check_estimator raises. Most of the time, it's because I am not writing the correct tests in my classes.

I read the "rolling your own estimator" and even though it's a good start. It is incomplete and not up to date. The most blatant issue that I've seen so far was that the classifier template doesn't even pass the check_estimator test. It took me a few days to realize how to fix it (add a check_classification_targets(y) in fit)

At the very least, it would be great if better error messages could be shown.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/scikit-learn/scikit-learn/issues/10082#issuecomment-343706824, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEz64MVBHAuYIsbJYtj_JDKwbM41O1eks5s1kkugaJpZM4QVTV4 .

lesteve commented 6 years ago

I read the "rolling your own estimator" and even though it's a good start. It is incomplete and not up to date. The most blatant issue that I've seen so far was that the classifier template doesn't even pass the check_estimator test. It took me a few days to realize how to fix it (add a check_classification_targets(y) in fit)

@ValentinCalomme you seem to have a good grasp on how to improve the documentation. A PR would be really welcome!