interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License

Increase compatibility of EBM with scikit-learn #518

Closed · DerWeh closed this 2 months ago

DerWeh commented 2 months ago

This PR adds the estimator checks for scikit-learn compatibility.
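For readers unfamiliar with scikit-learn's estimator checks, each check drives an estimator through its public API and asserts a contract. A stdlib-only sketch of that pattern (`TinyClassifier` and `check_fit_returns_self` are hypothetical stand-ins, not interpret or scikit-learn code):

```python
# Sketch of the pattern behind scikit-learn's estimator checks:
# call the estimator through its public API, then assert a contract.

def check_fit_returns_self(estimator_cls):
    # Contract: fit(X, y) must return the estimator itself (enables chaining).
    X = [[0.0], [1.0], [2.0]]
    y = [0, 1, 1]
    est = estimator_cls()
    assert est.fit(X, y) is est

class TinyClassifier:
    # Minimal stand-in estimator for demonstration only
    def fit(self, X, y):
        self.classes_ = sorted(set(y))  # learned class labels
        return self

    def predict(self, X):
        # Trivially predict the last class for every sample
        return [self.classes_[-1]] * len(X)

check_fit_returns_self(TinyClassifier)
```

The real suite (`sklearn.utils.estimator_checks`) bundles many such checks; this PR wires them up for the EBM estimators.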

We add the following compatibilities:

DerWeh commented 2 months ago

We still fail some scikit-learn tests where I am not sure whether we intentionally deviate. These are open TODO/FIXME items in skip_sklearn.

I didn't remove the old (currently skipped) test_scikit_learn_compatibility. Can this test be removed, or is there something important in it that should be preserved?

The private versions aren't tested at the moment. They fail with cryptic errors that I do not understand. Sorry, but I don't really understand the DP versions, so I cannot easily fix this part.

I think we need these tests back before continuing work on #514, as #514 would mean a rather drastic refactor.

paulbkoch commented 2 months ago

We still fail some scikit-learn tests where I am not sure whether we intentionally deviate. These are open TODO/FIXME items in skip_sklearn.

I didn't remove the old (currently skipped) test_scikit_learn_compatibility. Can this test be removed, or is there something important in it that should be preserved?

The private versions aren't tested at the moment. They fail with cryptic errors that I do not understand. Sorry, but I don't really understand the DP versions, so I cannot easily fix this part.

I think we need these tests back before continuing work on #514, as #514 would mean a rather drastic refactor.

On the scikit-learn tests:

We can remove the old test_scikit_learn_compatibility test that wasn't enabled.

I can have a look at the DP tests once the rest of this PR is in.

DerWeh commented 2 months ago
  • EBMs do support strings. We support both nominal and ordinal categoricals because we want to be able to show more information about these kinds of features in the UI. Other packages make you convert strings like "low", "medium", and "high" into 1, 2, 3, but we want to keep the original strings, which have more innate human interpretability.

I updated the tags to indicate this, which allowed re-enabling the test. Please check whether the tags now seem suitable: look at ExplainableBoostingClassifier._get_tags() (and the other estimators') and compare against the official estimator tags. We could also use the tags to skip tests, but the documentation warns against this.
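As a rough, dependency-free illustration of the dict-style tag mechanism that `_get_tags()` builds on (the pre-1.6 scikit-learn API): defaults are merged with a subclass's overrides. Class names and tag values below are illustrative, not interpret's actual implementation:

```python
# Minimal sketch of dict-style estimator tags: defaults merged with overrides.
_DEFAULT_TAGS = {"allow_nan": False, "X_types": ["2darray"]}

class Base:
    def _more_tags(self):
        return {}

    def _get_tags(self):
        tags = dict(_DEFAULT_TAGS)
        tags.update(self._more_tags())  # subclass overrides win
        return tags

class StringFriendly(Base):
    def _more_tags(self):
        # EBM-style declaration: string/categorical X is accepted
        return {"allow_nan": True,
                "X_types": ["2darray", "string", "categorical"]}
```

The estimator checks then read these tags to decide which inputs to generate and which checks apply.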

  • Is the 1.0 != "1.0" error on y values or X values?

The scikit-learn test fits floating-point y=[1.0, ...] and complains that the predictions differ, because EBM predicts string labels pred=["1.0", ...]. Thus, I disabled the test. The upstream test should probably be changed; EBM's behavior seems the more reasonable approach to me.
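The failing comparison boils down to a type mismatch, which can be reproduced in a few lines (a sketch only, not the actual scikit-learn test):

```python
# y is float, but a classifier that stores class labels as strings
# (as EBM does for float targets) predicts strings back.
y = [1.0, 2.0, 1.0]
classes = sorted({str(v) for v in y})   # ["1.0", "2.0"]
pred = [classes[0]] * len(y)            # ["1.0", "1.0", "1.0"]

# The check compares predictions against y directly, and "1.0" != 1.0:
assert pred[0] != y[0]
```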

  • we do accept 1D X arrays. Actually, I'm a little surprised scikit-learn doesn't too, at least for predict, since it's handy to slice off a single sample for prediction, and people (me especially) sometimes forget to keep the same shape. I figure if the user made a mistake and is missing a dimension, they'll get an exception when the number of samples doesn't match the number in y. Now that I think of it, scikit-learn probably disallows this because some of their methods accept only an X, so for them it could be ambiguous whether the one dimension is samples or features. For us this isn't an issue: during fitting we always have a y to corroborate the number of samples, and afterwards we have a fitted model that can corroborate the number of features.

I guess the ambiguity is whether we have a single sample (the dimension represents features) or a single feature (the dimension represents samples). Anyway, the test is skipped.
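The corroboration logic described above could look roughly like this (`coerce_X` is a hypothetical helper for illustration, not interpret's actual code):

```python
def coerce_X(X, n_samples=None, n_features=None):
    # Disambiguate a 1D X using a known sample count (from y during fit)
    # or a known feature count (from an already-fitted model).
    if X and isinstance(X[0], (list, tuple)):
        return X                        # already 2D, nothing to do
    if n_samples is not None and len(X) == n_samples:
        return [[v] for v in X]         # one feature, many samples
    if n_features is not None and len(X) == n_features:
        return [list(X)]                # one sample, many features
    raise ValueError("1D X is ambiguous without n_samples or n_features")
```

scikit-learn's public methods that take only an X have neither count available, which is presumably why they reject 1D input outright.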

We can remove the old test_scikit_learn_compatibility test that wasn't enabled.

Done

I can have a look at the DP tests once the rest of this PR is in.

Then I'll leave this up to you.


I think the PR should be good to merge now, or do you want me to check whether we can be more permissive about older scikit-learn versions? As I said, you can verify that the correct tags are set.

paulbkoch commented 2 months ago
  • Is the 1.0 != "1.0" error on y values or X values?

The scikit-learn test fits floating-point y=[1.0, ...] and complains that the predictions differ, because EBM predicts string labels pred=["1.0", ...]. Thus, I disabled the test. The upstream test should probably be changed; EBM's behavior seems the more reasonable approach to me.

Ah, now I remember: I converted float y values to strings because JSON doesn't differentiate between integers and floats. It has a single number type. I wanted to be able to serialize to JSON and back to a Python object, and without recording the original datatype it would be ambiguous whether the JSON value 1 should be restored in Python as an integer or a float. I figured integers would be more common than floats for y, so I converted floats to strings.

If we want to support both integers and floats, we can do it by including an "output_type" field for classifiers in the JSON to differentiate between integers and floats here: https://github.com/interpretml/interpret/blob/0cf3f2f3c4ea7b4cf498e732295defdb712e4ad5/python/interpret-core/interpret/glassbox/_ebm/_json.py#L79
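The proposed "output_type" field could work along these lines. This is only a sketch: apart from the quoted "output_type" name, the helper names and payload layout are made up, and the real serializer handles far more than a bare class list:

```python
import json

def serialize_classes(classes):
    # Record the original dtype so the JSON round trip is unambiguous:
    # floats are stored as strings plus a type marker, ints stay as numbers.
    if all(isinstance(c, float) for c in classes):
        return {"output_type": "float", "classes": [str(c) for c in classes]}
    return {"output_type": "int", "classes": list(classes)}

def deserialize_classes(payload):
    # Restore the original Python type using the recorded marker.
    if payload["output_type"] == "float":
        return [float(c) for c in payload["classes"]]
    return list(payload["classes"])

blob = json.dumps(serialize_classes([1.0, 2.0]))   # survives any JSON codec
restored = deserialize_classes(json.loads(blob))
```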

I think the PR should be good to merge now, or do you want me to check whether we can be more permissive about older scikit-learn versions? As I said, you can verify that the correct tags are set.

If the rest of the PR is ready, I can just merge it first and check afterwards. It's probably fine to require 0.24, but you never know what old environment someone might want to install interpret in, especially a cloud-based one where the user might not even be able to upgrade.

I'll review now.

DerWeh commented 2 months ago

If the rest of the PR is ready, I can just merge it first and check afterwards. It's probably fine to require 0.24, but you never know what old environment someone might want to install interpret in, especially a cloud-based one where the user might not even be able to upgrade.

I just tried to test with an old version, which failed because some tests I skip didn't exist in that version... If we compare function names instead of the function objects themselves, we can alleviate this problem.
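Comparing by name could look like this (SKIPPED_CHECK_NAMES and the check names in it are hypothetical; the point is that a name absent from an old scikit-learn simply never matches, instead of failing at import time):

```python
from functools import partial

# Hypothetical skip list keyed by check-function name rather than by the
# function objects themselves.
SKIPPED_CHECK_NAMES = {"check_classifiers_classes", "check_fit2d_1sample"}

def should_skip(check):
    # Estimator checks are often functools.partial objects; unwrap first,
    # then compare by __name__.
    func = getattr(check, "func", check)
    return getattr(func, "__name__", "") in SKIPPED_CHECK_NAMES
```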

But more importantly: we actually don't need to bump the required version at all; it suffices to raise the scikit-learn version in the testing dependencies. There's no need to bother people who are just using the package.

paulbkoch commented 2 months ago

Thanks @DerWeh -- It's really nice to finally have scikit-learn verification!