interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License

Increase compatibility of EBM with scikit-learn #518

Closed · DerWeh closed this 2 months ago

DerWeh commented 2 months ago

This PR adds the estimator checks for scikit-learn compatibility.
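For readers unfamiliar with scikit-learn's estimator checks, each check drives an estimator through its public API and asserts a contract. A stdlib-only sketch of that pattern (`TinyClassifier` and `check_fit_returns_self` are hypothetical stand-ins, not interpret or scikit-learn code):

```python
# Sketch of the pattern behind scikit-learn's estimator checks:
# call the estimator through its public API, then assert a contract.

def check_fit_returns_self(estimator_cls):
    # Contract: fit(X, y) must return the estimator itself (enables chaining).
    X = [[0.0], [1.0], [2.0]]
    y = [0, 1, 1]
    est = estimator_cls()
    assert est.fit(X, y) is est

class TinyClassifier:
    # Minimal stand-in estimator for demonstration only
    def fit(self, X, y):
        self.classes_ = sorted(set(y))  # learned class labels
        return self

    def predict(self, X):
        # Trivially predict the last class for every sample
        return [self.classes_[-1]] * len(X)

check_fit_returns_self(TinyClassifier)
```

The real suite (`sklearn.utils.estimator_checks`) bundles many such checks; this PR wires them up for the EBM estimators.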

We add the following compatibilities:

DerWeh commented 2 months ago

We still fail some scikit-learn tests where I am not sure whether we intentionally deviate. These are open TODO/FIXME items in skip_sklearn.

I didn't remove the old (currently skipped) test_scikit_learn_compatibility. Can this test be removed, or is there something important in it that should be preserved?

The private versions aren't tested at the moment. They fail with cryptic errors that I do not understand. Sorry, but I don't really understand the DP versions, so I cannot easily fix this part.

I think we need these tests back before continuing work on #514, as #514 would mean a rather drastic refactor.

paulbkoch commented 2 months ago

We still fail some scikit-learn tests where I am not sure whether we intentionally deviate. These are open TODO/FIXME items in skip_sklearn.

I didn't remove the old (currently skipped) test_scikit_learn_compatibility. Can this test be removed, or is there something important in it that should be preserved?

The private versions aren't tested at the moment. They fail with cryptic errors that I do not understand. Sorry, but I don't really understand the DP versions, so I cannot easily fix this part.

I think we need these tests back before continuing work on #514, as #514 would mean a rather drastic refactor.

On the scikit-learn tests:

We can remove the old test_scikit_learn_compatibility test that wasn't enabled.

I can have a look at the DP tests once the rest of this PR is in.

DerWeh commented 2 months ago
  • EBMs do support strings. We support both nominal and ordinal categoricals because we want to be able to show more information about these kinds of features in the UI. Other packages make you convert strings like "low", "medium", and "high" into 1, 2, 3, but we want to keep the original strings, which have more innate human interpretability.

I updated the tags to indicate this, which allowed re-enabling the test. Please check whether the tags now seem suitable: look at ExplainableBoostingClassifier._get_tags() (and the other estimators') and compare against the official estimator tags. We could also use the tags to skip tests, but the documentation warns against this.
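As a rough, dependency-free illustration of the dict-style tag mechanism that `_get_tags()` builds on (the pre-1.6 scikit-learn API): defaults are merged with a subclass's overrides. Class names and tag values below are illustrative, not interpret's actual implementation:

```python
# Minimal sketch of dict-style estimator tags: defaults merged with overrides.
_DEFAULT_TAGS = {"allow_nan": False, "X_types": ["2darray"]}

class Base:
    def _more_tags(self):
        return {}

    def _get_tags(self):
        tags = dict(_DEFAULT_TAGS)
        tags.update(self._more_tags())  # subclass overrides win
        return tags

class StringFriendly(Base):
    def _more_tags(self):
        # EBM-style declaration: string/categorical X is accepted
        return {"allow_nan": True,
                "X_types": ["2darray", "string", "categorical"]}
```

The estimator checks then read these tags to decide which inputs to generate and which checks apply.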

  • Is the 1.0 != "1.0" error on y values or X values?

The scikit-learn test fits floating-point y=[1.0, ...] and complains that the predictions differ, because EBM predicts string labels pred=["1.0", ...]. Thus, I disabled the test. The upstream test should probably be changed; EBM's behavior seems the more reasonable approach to me.
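The failing comparison boils down to a type mismatch, which can be reproduced in a few lines (a sketch only, not the actual scikit-learn test):

```python
# y is float, but a classifier that stores class labels as strings
# (as EBM does for float targets) predicts strings back.
y = [1.0, 2.0, 1.0]
classes = sorted({str(v) for v in y})   # ["1.0", "2.0"]
pred = [classes[0]] * len(y)            # ["1.0", "1.0", "1.0"]

# The check compares predictions against y directly, and "1.0" != 1.0:
assert pred[0] != y[0]
```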

  • we do accept 1D X arrays. Actually, I'm a little surprised scikit-learn doesn't too, at least for predict, since it's handy to slice off a single sample for prediction, and people (me especially) sometimes forget to keep the same shape. I figure if the user made a mistake and is missing a dimension, they'll get an exception when the number of samples doesn't match the number in y. Now that I think of it, scikit-learn probably disallows this because some of their methods accept only an X, so for them it could be ambiguous whether the one dimension is samples or features. For us this isn't an issue: during fitting we always have a y to corroborate the number of samples, and afterwards we have a fitted model that can corroborate the number of features.

I guess the ambiguity is whether we have a single sample (the dimension represents features) or a single feature (the dimension represents samples). Anyway, the test is skipped.
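The corroboration logic described above could look roughly like this (`coerce_X` is a hypothetical helper for illustration, not interpret's actual code):

```python
def coerce_X(X, n_samples=None, n_features=None):
    # Disambiguate a 1D X using a known sample count (from y during fit)
    # or a known feature count (from an already-fitted model).
    if X and isinstance(X[0], (list, tuple)):
        return X                        # already 2D, nothing to do
    if n_samples is not None and len(X) == n_samples:
        return [[v] for v in X]         # one feature, many samples
    if n_features is not None and len(X) == n_features:
        return [list(X)]                # one sample, many features
    raise ValueError("1D X is ambiguous without n_samples or n_features")
```

scikit-learn's public methods that take only an X have neither count available, which is presumably why they reject 1D input outright.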

We can remove the old test_scikit_learn_compatibility test that wasn't enabled.

Done

I can have a look at the DP tests once the rest of this PR is in.

Then I'll leave this up to you.


I think the PR should be good to merge now, or do you want me to check whether we can be more permissive about older scikit-learn versions? As I said, you can verify that the correct tags are set.

paulbkoch commented 2 months ago
  • Is the 1.0 != "1.0" error on y values or X values?

The scikit-learn test fits floating-point y=[1.0, ...] and complains that the predictions differ, because EBM predicts string labels pred=["1.0", ...]. Thus, I disabled the test. The upstream test should probably be changed; EBM's behavior seems the more reasonable approach to me.

Ah, now I remember: I converted float y values to strings because JSON doesn't differentiate between integers and floats. It has a single number type. I wanted to be able to serialize to JSON and back to a Python object, and without recording the original datatype it would be ambiguous whether the JSON value 1 should be restored in Python as an integer or a float. I figured integers would be more common than floats for y, so I converted floats to strings.

If we want to support both integers and floats, we can do it by including an "output_type" field for classifiers in the JSON to differentiate between integers and floats here: https://github.com/interpretml/interpret/blob/0cf3f2f3c4ea7b4cf498e732295defdb712e4ad5/python/interpret-core/interpret/glassbox/_ebm/_json.py#L79
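The proposed "output_type" field could work along these lines. This is only a sketch: apart from the quoted "output_type" name, the helper names and payload layout are made up, and the real serializer handles far more than a bare class list:

```python
import json

def serialize_classes(classes):
    # Record the original dtype so the JSON round trip is unambiguous:
    # floats are stored as strings plus a type marker, ints stay as numbers.
    if all(isinstance(c, float) for c in classes):
        return {"output_type": "float", "classes": [str(c) for c in classes]}
    return {"output_type": "int", "classes": list(classes)}

def deserialize_classes(payload):
    # Restore the original Python type using the recorded marker.
    if payload["output_type"] == "float":
        return [float(c) for c in payload["classes"]]
    return list(payload["classes"])

blob = json.dumps(serialize_classes([1.0, 2.0]))   # survives any JSON codec
restored = deserialize_classes(json.loads(blob))
```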

I think the PR should be good to merge now, or do you want me to check whether we can be more permissive about older scikit-learn versions? As I said, you can verify that the correct tags are set.

If the rest of the PR is ready, I can just merge it first and check afterwards. It's probably fine to require 0.24, but you never know what old environment someone might want to install interpret in, especially a cloud-based one where the user might not even be able to upgrade.

I'll review now.

DerWeh commented 2 months ago

If the rest of the PR is ready, I can just merge it first and check afterwards. It's probably fine to require 0.24, but you never know what old environment someone might want to install interpret in, especially a cloud-based one where the user might not even be able to upgrade.

I just tried to test with an old version, which failed because some tests I skip didn't exist in that version... If we compare function names instead of the function objects themselves, we can alleviate this problem.
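Comparing by name could look like this (SKIPPED_CHECK_NAMES and the check names in it are hypothetical; the point is that a name absent from an old scikit-learn simply never matches, instead of failing at import time):

```python
from functools import partial

# Hypothetical skip list keyed by check-function name rather than by the
# function objects themselves.
SKIPPED_CHECK_NAMES = {"check_classifiers_classes", "check_fit2d_1sample"}

def should_skip(check):
    # Estimator checks are often functools.partial objects; unwrap first,
    # then compare by __name__.
    func = getattr(check, "func", check)
    return getattr(func, "__name__", "") in SKIPPED_CHECK_NAMES
```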

But more importantly: we actually don't need to bump the required version at all; it suffices to raise the scikit-learn version in the testing dependencies. There's no need to bother people who are just using the package.

paulbkoch commented 2 months ago

Thanks @DerWeh -- It's really nice to finally have scikit-learn verification!