interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License

MAINT: use pydantic for parameter validation #514

Open DerWeh opened 2 months ago

DerWeh commented 2 months ago

This is a first proof of concept to use pydantic for the validation of parameters. I used Python 3.10 syntax as it is more concise and used pydantic=2.6.2 without putting much thought into it.

Currently, the fit method is extremely messy due to all the checks. Of course, a refactoring that puts these into a dedicated method would already simplify things. I think using pydantic might be a more elegant solution. This draft is mostly for discussion of whether this is a good idea or not. I tried to be faithful to the checks that were implemented. A disadvantage is that pydantic errors are harder to read. On the plus side, the type annotations are much clearer than having to search through the fit method. It's probably also much easier to maintain the validation.
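For illustration, roughly the direction I mean (a hedged sketch; the field names, bounds, and defaults are placeholders rather than the real EBM parameters):

```python
from typing import Literal

from pydantic import BaseModel, Field


class EBMParams(BaseModel):
    """Hypothetical parameter schema; bounds and defaults are illustrative only."""

    max_bins: int = Field(default=256, ge=2)
    learning_rate: float = Field(default=0.01, gt=0.0)
    validation_size: float | int = 0.15  # Python 3.10 union syntax
    objective: Literal["log_loss", "rmse"] = "log_loss"


params = EBMParams(learning_rate=0.05)  # validates, coerces, and fills defaults
# EBMParams(max_bins=1) would raise a pydantic.ValidationError listing every violated constraint.
```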

Currently, I break the private versions of the classifier: attributes were set depending on whether the estimator is private or not. Do you think it would be better to define a superclass with the common arguments and subclasses with the additional arguments?

Another question is how to best proceed with overwriting the default arguments. Currently, there is a lot of redundancy.

If we want to follow through with this idea, I think some sklearn checks should be enabled again using parametrize_with_checks or check_estimator.
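For reference, wiring the estimators into the scikit-learn test suite could look roughly like this (assuming the usual imports from interpret.glassbox):

```python
from sklearn.utils.estimator_checks import parametrize_with_checks

from interpret.glassbox import (
    ExplainableBoostingClassifier,
    ExplainableBoostingRegressor,
)


@parametrize_with_checks(
    [ExplainableBoostingClassifier(), ExplainableBoostingRegressor()]
)
def test_sklearn_compatible_estimator(estimator, check):
    # Each scikit-learn check becomes its own parametrized pytest case;
    # checks where EBMs are deliberately more permissive can still be skipped.
    check(estimator)
```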

paulbkoch commented 2 months ago

Hi @DerWeh -- I like what you're doing here and plan to merge it once it fully replaces the existing checks and passes the tests.

A few thoughts: I think a good minimum version that we should target would be python 3.8 since that's still supported. Thankfully 3.8 supports things like the Literal annotation.

We can add some classes to formalize the DP vs regular EBM class partition. Someday, I think we might support multi-output EBMs, and one strange type of EBM might be an EBM that has both classification and regression outputs. To create one of these strange beasts, you'd need to create an EBMModel object directly, so I like having the ability to make and use these classes directly. I can't think of any good reason though to mix private and non-private EBMs, so the private vs regular partition can be a first split in the class hierarchy below the EBMModel class.

You probably already saw that I had a test for check_estimator, but disabled it. I think we can probably now increase our minimum scikit-learn version to allow for this check now. We didn't pass all the checks since EBMs are more permissive in what they accept than scikit-learn, and I'd like to preserve our user-friendliness in that respect, so I expect we'll want to continue disabling some of the scikit-learn checks.

DerWeh commented 2 months ago

I think a good minimum version that we should target would be python 3.8 since that's still supported. Thankfully 3.8 supports things like the Literal annotation.

Sure enough, I typically orient myself by NEP 29, but it is a good idea to be more conservative (after all, NumPy is very stable by now, so an older NumPy version is typically sufficient). I just wanted to avoid the additional boilerplate for type hints in the prototype, in case we reject the idea altogether.

Someday, I think we might support multi-output EBMs, and one strange type of EBM might be an EBM that has both classification and regression outputs.

Out of curiosity: what advantage do you expect? I don't really see the synergy for multi-output in EBMs compared to deep networks (or deep trees). Can we expect an advantage during the boosting phase? But this is mostly an academic question. In practice, I am currently mostly interested in more convenient multi-class support. Interactions and post-processing seem to still be lacking.

You probably already saw that I had a test for check_estimator, but disabled it.

Yes, I saw it. The PR for the non-private EBMs is ready.

As soon as it is merged (and when I find time), I can continue working on pydantic.

paulbkoch commented 2 months ago

Someday, I think we might support multi-output EBMs, and one strange type of EBM might be an EBM that has both classification and regression outputs.

Out of curiosity: what advantage do you expect? I don't really see the synergy for multi-output in EBMs compared to deep networks (or deep trees). Can we expect an advantage during the boosting phase? But this is mostly an academic question. In practice, I am currently mostly interested in more convenient multi-class support. Interactions and post-processing seem to still be lacking.

I think you're right that there will be very little technical benefit. Mostly, I think of it as potentially a nice packaging solution for grouping related predictions, although I might reject the idea if it adds anything more than a little complexity. EBMs do have some benefits in terms of the compactness of the model representation. For example, SoftAtHome, which wrote Ebm2Onnx (https://github.com/interpretml/ebm2onnx), uses EBMs inside their routers, and I also saw a paper where some company was evaluating EBMs for use inside pacemakers (!!!). Sharing the feature definitions and bin cuts might be nice in those applications, both for the compactness of the model and because it could lower the computational requirements of prediction and speed it up.

Probably more likely than boosting multi-output models together would be constructing two separate EBMs and then merging them as a post-processing step, similar to how merge_ebms works today. But who knows, maybe someone really will want some super weird objective that includes both classification and regression. I'd like that, even if just for the entertainment value of seeing someone use an EBM this way.
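For reference, the merging path already works roughly along these lines today (a sketch with synthetic data; assumes merge_ebms is importable from interpret.glassbox):

```python
import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor, merge_ebms

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(200, 3)), rng.normal(size=(200, 3))
y1, y2 = X1.sum(axis=1), X2.sum(axis=1)

# Train two EBMs separately, then combine their additive terms post hoc.
ebm1 = ExplainableBoostingRegressor().fit(X1, y1)
ebm2 = ExplainableBoostingRegressor().fit(X2, y2)
merged = merge_ebms([ebm1, ebm2])
```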

paulbkoch commented 2 months ago

But this is mostly an academic question. In practice, I am currently mostly interested in more convenient multi-class support. Interactions and post-processing seem to still be lacking.

We don't allow multi-class interactions in the default API entirely because of the difficulties in visualizing them. You can construct multi-class EBMs with interactions today if you use the measure_interactions function manually. Here's a slightly related example showing how to use it:

https://interpret.ml/docs/python/examples/custom-interactions.html

paulbkoch commented 2 months ago

Hi @DerWeh -- I've been thinking through this question the last few days and here's what I came up with in terms of a hierarchy:

EBMBase -> abstract base class for all EBM models
- EBMModel(EBMBase) -> instantiable class for normal EBM classifiers/regressors
  - ExplainableBoostingClassifier(EBMModel)
  - ExplainableBoostingRegressor(EBMModel)
- DPEBMModel(EBMBase) -> instantiable class for differential privacy classifiers/regressors
  - DPExplainableBoostingClassifier(DPEBMModel)
  - DPExplainableBoostingRegressor(DPEBMModel)
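In Python terms, a minimal skeleton of that hierarchy might look like this (class bodies elided):

```python
from abc import ABC


class EBMBase(ABC):
    """Shared EBM machinery; not meant to be instantiated directly."""


class EBMModel(EBMBase):
    """Normal (non-private) EBM classifiers/regressors."""


class ExplainableBoostingClassifier(EBMModel): ...
class ExplainableBoostingRegressor(EBMModel): ...


class DPEBMModel(EBMBase):
    """Differentially private EBM classifiers/regressors."""


class DPExplainableBoostingClassifier(DPEBMModel): ...
class DPExplainableBoostingRegressor(DPEBMModel): ...
```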

Thoughts?

DerWeh commented 2 months ago

Yes, the hierarchy sounds very reasonable. Though I am not quite sure if making EBMBase an abstract base class has any benefit. See the discussion: https://stackoverflow.com/questions/3570796/why-use-abstract-base-classes-in-python

But it doesn't really seem very important to me, so I am fine with anything.


However, now that EBMs have the checks for scikit-learn compatibility, I ran into a bit of a problem: clone doesn't work with parameter validation anymore. clone performs explicit is checks (old_parameter is new_parameter) which fail due to pydantic... pydantic doesn't offer the option to defer the validation, see the rejected request https://github.com/pydantic/pydantic/issues/559. I knew that scikit-learn urges you to put the validation into the fit method (https://scikit-learn.org/stable/developers/develop.html#instantiation), but I didn't think they were this pedantic. Currently, the only option I see to get through with this PR is providing our own custom implementation of clone via a custom _clone_parametrized, but this seems somewhat ugly...
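A minimal illustration of the conflict with a hypothetical estimator (not the actual EBM code): scikit-learn's clone rebuilds the estimator from get_params() and then verifies each constructor argument with an identity (is) check, so any copying or coercion at __init__ time trips it up.

```python
from sklearn.base import BaseEstimator, clone


class ValidatingEstimator(BaseEstimator):
    """Hypothetical estimator that validates its parameters eagerly,
    roughly what a pydantic model does when it coerces or copies a field."""

    def __init__(self, interactions=()):
        # pydantic would hand back a validated *copy* of the container here.
        self.interactions = list(interactions)


# Raises RuntimeError: "Cannot clone object ..., as the constructor
# either does not set or modifies parameter interactions"
clone(ValidatingEstimator(interactions=[1, 2]))
```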

What is your opinion @paulbkoch?

paulbkoch commented 2 months ago

Yes, the hierarchy sounds very reasonable. Though I am not quite sure if making EBMBase an abstract base class has any benefit. See the discussion: https://stackoverflow.com/questions/3570796/why-use-abstract-base-classes-in-python

But it doesn't really seem very important to me, so I am fine with anything.

You're probably right that it has very little technical benefit in python. I was thinking of it mainly as a form of documentation to indicate that the EBMBase class should not be instantiated directly and then just using duck typing as we currently are. I haven't written as much python as you have (seemingly) and have never made an abstract class in python, but I'm assuming adding the abstract part is fairly lightweight.

However, now that EBMs have the checks for scikit-learn compatibility, I ran into a bit of a problem: clone doesn't work with parameter validation anymore. clone performs explicit is checks (old_parameter is new_parameter) which fail due to pydantic... pydantic doesn't offer the option to defer the validation, see the rejected request pydantic/pydantic#559. I knew that scikit-learn urges you to put the validation into the fit method (https://scikit-learn.org/stable/developers/develop.html#instantiation), but I didn't think they were this pedantic. Currently, the only option I see to get through with this PR is providing our own custom implementation of clone via a custom _clone_parametrized, but this seems somewhat ugly...

What is your opinion @paulbkoch?

Yeah, that does seem a bit ugly. I didn't realize pydantic was unable to use deferred checks. Even if we get the clone function working though, it's not clear if the scikit-learn requirement to defer checks will break anything else now or in the future.

Maybe we should instead move back to option #2 @DerWeh where we move the checks into a separate _validate function? That would at least move them out of the way so that someone reading the fit function would only need to see them if they were specifically interested in the checks. And we'd get to keep the more descriptive error messages too.
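For concreteness, a minimal sketch of that option (attribute names and bounds here are just placeholders, not the real EBM parameters):

```python
class EBMModel:
    """Illustrative only; the real parameter list and checks would differ."""

    def __init__(self, max_bins=256, passthrough=0.0):
        # __init__ only stores the raw values, as scikit-learn expects.
        self.max_bins = max_bins
        self.passthrough = passthrough

    def _validate_params(self):
        # All checks live here, with descriptive error messages.
        if self.max_bins < 2:
            raise ValueError(f"max_bins must be at least 2, got {self.max_bins}.")
        if not 0.0 <= self.passthrough <= 1.0:
            raise ValueError(f"passthrough must be in [0, 1], got {self.passthrough}.")

    def fit(self, X, y):
        self._validate_params()  # validation deferred to fit, keeping clone happy
        # ... actual fitting goes here ...
        return self
```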

DerWeh commented 2 months ago

You're probably right that it has very little technical benefit in python. I was thinking of it mainly as a form of documentation to indicate that the EBMBase class should not be instantiated directly and then just using duck typing as we currently are. I haven't written as much python as you have (seemingly) and have never made an abstract class in python, but I'm assuming adding the abstract part is fairly lightweight.

I agree, it should be lightweight, though I never did a benchmark (or, in fact, even used it). For documentation purposes, the most important part is probably not making the class public.

Maybe we should instead move back to option #2 @DerWeh where we move the checks into a separate _validate function? That would at least move them out of the way so that someone reading the fit function would only need to see them if they were specifically interested in the checks. And we'd get to keep the more descriptive error messages too.

Sure, having a separate _validate function is already a good first step. A question to consider is whether it is still a good idea to separate the EBMs and the DP versions (using your suggested class hierarchy), and whether we use the vanilla dataclass offered by Python. It doesn't do any validation, but avoids the boilerplate of writing the __init__ function with all the self.attribute = attribute statements.
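For illustration, the kind of boilerplate a dataclass would remove (field names are placeholders):

```python
from dataclasses import dataclass


@dataclass
class EBMModel:
    # The generated __init__ simply assigns each field, replacing the long
    # list of self.attribute = attribute statements; no validation happens.
    max_bins: int = 256
    learning_rate: float = 0.01
    outer_bags: int = 14
```

Since the generated __init__ only assigns the fields, it also stays within scikit-learn's convention of not modifying parameters at construction time.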

Another question is whether I should keep the 'Yoda conditions' that are currently used for the checks (example: passthrough < 0.0 or 1.0 < passthrough). Is this the code style you use? I don't really know why, but my brain doesn't seem to like these statements, and I always have to mentally convert them to variable < constant. Of course, this is a very specific example, where we could use Python's chained comparisons: not (0.0 <= passthrough <= 1.0) (for some reason this is easy to read for me, but the single 0.0 <= passthrough takes time to digest).
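Side by side, the two styles in question:

```python
passthrough = 0.5

# Current Yoda-style range check:
if passthrough < 0.0 or 1.0 < passthrough:
    raise ValueError("passthrough must be in [0, 1].")

# Equivalent chained comparison:
if not (0.0 <= passthrough <= 1.0):
    raise ValueError("passthrough must be in [0, 1].")
```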

Anyway, I'll play around a bit more in case I find a reasonable (elegant) way to use pydantic for validation. If I don't find a way, I'll at least do the refactor.


For documentation purposes: I just gave it a try out of curiosity, and it is __sklearn_clone__ which we have to override. The documentation indicates, however, that scikit-learn>=1.3 is required.

```python
# Method on the estimator class; needs `import copy` and `from sklearn.base import clone`.
def __sklearn_clone__(self, *, safe=True):
    """Customized implementation of clone. See :func:`sklearn.base.clone` for details."""
    klass = self.__class__
    new_object_params = self.get_params(deep=False)
    for name, param in new_object_params.items():
        new_object_params[name] = clone(param, safe=False)

    new_object = klass(**new_object_params)
    try:
        new_object._metadata_request = copy.deepcopy(self._metadata_request)
    except AttributeError:
        pass

    # _sklearn_output_config is used by `set_output` to configure the output
    # container of an estimator.
    if hasattr(self, "_sklearn_output_config"):
        new_object._sklearn_output_config = copy.deepcopy(self._sklearn_output_config)
    return new_object
```

paulbkoch commented 2 months ago

Sure, having a separate _validate function is already a good first step. A question to consider is whether it is still a good idea to separate the EBMs and the DP versions (using your suggested class hierarchy), and whether we use the vanilla dataclass offered by Python. It doesn't do any validation, but avoids the boilerplate of writing the __init__ function with all the self.attribute = attribute statements.

Avoiding boilerplate through dataclass sounds like a good idea. I guess this implies having the class hierarchy since we conditionally assign instance attributes based on whether the class is differentially private or not.

Another question is whether I should keep the 'Yoda conditions' that are currently used for the checks (example: passthrough < 0.0 or 1.0 < passthrough). Is this the code style you use? I don't really know why, but my brain doesn't seem to like these statements, and I always have to mentally convert them to variable < constant. Of course, this is a very specific example, where we could use Python's chained comparisons: not (0.0 <= passthrough <= 1.0) (for some reason this is easy to read for me, but the single 0.0 <= passthrough takes time to digest).

If you want to change the ordering, go for it! I do use Yoda conditionals in C++ to combat assignment bugs, since "if(a = 5)" compiles but "if(5 = a)" does not. I don't care about the order in python, though, since assignment isn't legal inside a conditional in python.

Anyway, I'll play around a bit more in case I find a reasonable (elegant) way to use pydantic for validation. If I don't find a way, I'll at least do the refactor.

Sounds good. Thanks again for investigating all these options!

DerWeh commented 2 months ago

OK, the parameter validation seems to be working now. I managed to find a way based on pydantic's TypeAdapter.
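For illustration, the general shape of that approach (field names are placeholders): validation moves into fit, so __init__ stays untouched and clone's identity checks keep passing.

```python
from typing import Annotated

from pydantic import Field, TypeAdapter

# Hypothetical per-parameter adapters; the real schema would mirror the EBM signature.
_ADAPTERS = {
    "learning_rate": TypeAdapter(Annotated[float, Field(gt=0.0)]),
    "max_bins": TypeAdapter(Annotated[int, Field(ge=2)]),
}


def _validate_params(params: dict) -> dict:
    """Validate/coerce raw parameters at fit time."""
    return {
        name: _ADAPTERS[name].validate_python(value) if name in _ADAPTERS else value
        for name, value in params.items()
    }


# Inside fit(): validated = _validate_params(self.get_params(deep=False))
```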

The DP versions are currently broken, as I focused only on the non-private versions.

I am currently looking into breaking up the fit method a bit, looking for a way to avoid as much code duplication as possible when splitting between DP and non-private EBMs. But this will probably take quite some time. The fit method is quite the monolith, so it takes a lot of time to understand what's going on.

DerWeh commented 2 months ago

@paulbkoch While the code is not nearly as elegant yet as I would like it to be, this is a first working version of the PR. It would be great if you could help me out with the last failing test. The test tests/test_explainers.py::test_spec_synthetic fails for explainer_class = <class 'interpret.greybox._shaptree.ShapTree'>. I don't see how this relates to my changes in the EBM. Can you give me a hint?

If we close the pending reviews, I think this could in principle be merged. But I am happy to incorporate feedback. Like I said, it's still quite a step until the EBM code is nice and readable. But eliminating 400 lines is a first step.


On a related note: it seems to me that the CI only tests Python versions 3.9 through 3.11. In my opinion, at least the oldest and newest supported versions should be tested (ideally also for the dependencies). So I would propose also testing against Python 3.8.

I also recommend using the linter ruff for code quality of the Python code. By now, it implements a large fraction of the checks from other tools, making many of them obsolete. You have to add a configuration enabling additional rules to get the most out of it. It offers integration with pre-commit, a language server, and a VS Code plugin, making it quite simple to use.

paulbkoch commented 2 months ago

Thanks @DerWeh -- I'll take a look at the changes. -400 lines is great 👍

On the test failure, it looks like it might be due to the 0.0.45 release of Shap that occurred 2 days ago. The same test is failing on our develop branch, so it isn't likely due to anything you've done.

ruff and tests for 3.8 all sound good.

DerWeh commented 2 months ago

@paulbkoch I'll answer your comments in detail as soon as I find the time. For now, I can just give a more general remark: currently, I kept the EBMModel and introduced a new DPEBMModel subclass, which basically pins the attributes that are unnecessary for DP. I fully agree that a cleaner approach is to have the class structure you mentioned: a common base class and a non-private and a DP subclass. The issue is that at the moment everything is quite mixed in the fit method, such that it's not particularly easy to separate the parts without error-prone code duplication.

The big question is whether we use such a monkey-patch approach as an intermediate step, followed by additional clean-up PRs, or whether we go for a total revamp from the start. Currently, I went for the first option, as the PR is already a huge change and I am an outsider to the project. I don't really want to mess things up, and incremental changes might be easier to merge as they are more traceable. But if you want to get it 'right' from the start, I can also try to split non-private and DP right in this PR.

paulbkoch commented 2 months ago

Thanks for the background @DerWeh. We can split it up into 2 PRs if that's easier. The drawback I see is that it temporarily puts the project in a state where we shouldn't put out a release, but I think that's fine in the short term since I get the sense this is a hole we can plug quickly. It shouldn't break any existing code if anyone clones it, so that's ok too.

paulbkoch commented 2 months ago

Hi @DerWeh -- I just noticed something a bit concerning. I downloaded the documentation produced from the PR and the documentation for the ExplainableBoostingClassifier and ExplainableBoostingRegressor is missing. Not sure what the problem is yet.

The docs are available to download here: https://dev.azure.com/ms/interpret/_build/results?buildId=552050&view=artifacts&pathAsName=false&type=publishedArtifacts

We use Jupyter Book to make our documentation which sits on top of Sphinx. The ExplainableBoostingClassifier is documented here: https://github.com/interpretml/interpret/blob/develop/docs/interpret/python/api/ExplainableBoostingClassifier.ipynb

I'll have a look.

paulbkoch commented 2 months ago

On the documentation, I think we might just need to add the sphinx-pydantic extension:

https://sphinx-pydantic.readthedocs.io/en/latest/

here: https://github.com/interpretml/interpret/blob/d5ec3fa04db8e2acc5fc0de6ca00ff396852caec/docs/interpret/_config.yml#L22

paulbkoch commented 2 months ago

It would be great if you could help me out with the last failing test. The test tests/test_explainers.py::test_spec_synthetic fails for explainer_class = <class 'interpret.greybox._shaptree.ShapTree'>. I don't see how this relates to my changes in the EBM. Can you give me a hint?

Hi @DerWeh -- I've resolved the ShapTree test that was failing on the develop branch. It was due to a change in the SHAP API.

DerWeh commented 2 months ago

Hi @DerWeh -- I just noticed something a bit concerning. I downloaded the documentation produced from the PR and the documentation for the ExplainableBoostingClassifier and ExplainableBoostingRegressor is missing. Not sure what the problem is yet.

The docs are available to download here: https://dev.azure.com/ms/interpret/_build/results?buildId=552050&view=artifacts&pathAsName=false&type=publishedArtifacts

We use Jupyter Book to make our documentation which sits on top of Sphinx. The ExplainableBoostingClassifier is documented here: https://github.com/interpretml/interpret/blob/develop/docs/interpret/python/api/ExplainableBoostingClassifier.ipynb

I'll have a look.

Sorry, I never really used notebooks to generate documentation, and I am struggling a bit with the Azure pipeline. However, when I tried to build the documentation locally, the ExplainableBoostingClassifier documentation was empty. Is this your issue? Locally, the problem is that requirements.txt pins typing_extensions==4.5.0. This is not compatible with pydantic, causing the documentation build to fail. If I update typing_extensions, everything seems to be fine.

Where does the pinning come from? Can I just update the package?

paulbkoch commented 2 months ago

Sorry, I never really used notebooks to generate documentation, and I am struggling a bit with the Azure pipeline. However, when I tried to build the documentation locally, the ExplainableBoostingClassifier documentation was empty. Is this your issue? Locally, the problem is that requirements.txt pins typing_extensions==4.5.0. This is not compatible with pydantic, causing the documentation build to fail. If I update typing_extensions, everything seems to be fine.

Where does the pinning come from? Can I just update the package?

Yes, just update the pinning if necessary. I think this is pinned because Jupyter Book changes format frequently and pinning preserves documentation consistency, but we'll need to update these periodically.

paulbkoch commented 2 months ago

If we dropped python 3.8 support, we could eliminate the typing_extensions dependency, right? Seems like a good tradeoff.

paulbkoch commented 2 months ago

I've updated the develop branch pipelines to use python 3.9 for building the docs (and other build tasks).

DerWeh commented 1 month ago

Sorry for the wait, didn't manage to spend much time on it the last week.

Indeed, we only use Annotated from typing_extensions, which is available in the standard library's typing module since Python 3.9.

However, the import is hard-coded, so I expect that typing_extensions needs to be installed either way. Surprisingly, the docs build locally on my machine using Python 3.9 with the requirements.txt file.

I just googled a little: if you want to keep supporting Python 3.8 while keeping the dependencies as slim as possible, there is an option to make dependencies specific to a Python version:

```python
setup(
    ...,
    install_requires=[
        "typing_extensions;python_version<'3.9'",
    ],
)
```

Of course, this would complicate the source code:

```python
import sys
...
if sys.version_info < (3, 9):
    from typing_extensions import Annotated
else:
    from typing import Annotated
```

Your thoughts @paulbkoch?


I hope that I can resolve the conflicts with interpret 0.6.0 in the next few days and continue with this PR.

paulbkoch commented 1 month ago

Hi @DerWeh -- I'm not too fussy on this, but of the options given I think I'd prefer keeping things simple and dropping support for 3.8. Anyone who needs python 3.8 can just stick with interpret 0.6.0 until support for 3.8 officially ends in a few months.