Closed: flippercy closed this issue 3 years ago
Hi @flippercy,
Sample weights are now supported in our develop branch, and will be pushed to PyPI in the next release (which should happen in the next week or so, if all goes well).
If you'd like to install our pre-release code (fair warning -- might still have some small bugs), you can clone the repo and do a local install:
```shell
git clone https://github.com/interpretml/interpret.git
cd interpret
./build.sh  # or build.bat on Windows -- requires the free VS compiler
cd python/interpret-core
pip install -e .
cd ../interpret
pip install -e .
```
We'll also update this thread when the release is publicly available, at which point you can pull it directly from pip/conda.
-InterpretML Team
@interpret-ml Thank you! One more question: will monotonic constraints be included in the next release, too? I've seen some heated discussion on this topic earlier this year. It is a feature strongly preferred by the industry I am working in.
Best Regards,
Unfortunately, training-time monotonic constraints didn't make it into this release, though they are on our backlog. For now, we recommend post-processing EBMs to enforce monotonicity on individual features. We have guidance and sample code on how to do that here: https://github.com/interpretml/interpret/issues/184#issuecomment-822844554
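For readers who want the gist of the post-processing approach without following the link: the core idea can be sketched independently of the interpret API. Given the per-bin scores of one feature's shape function, project them onto the nearest non-decreasing sequence with the pool-adjacent-violators (PAV) algorithm. The function name `isotonic` and the `shape` values below are illustrative, not part of interpret:

```python
def isotonic(scores):
    """Project a sequence onto the closest non-decreasing sequence
    (pool-adjacent-violators, unweighted least squares)."""
    # Each block holds [sum, count] of the values pooled into it.
    blocks = []
    for s in scores:
        blocks.append([s, 1])
        # Merge neighboring blocks while their means violate the order.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s2, c2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
    # Expand each block back to its pooled positions.
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

# A non-monotone shape function for one feature, bin by bin:
shape = [0.1, 0.3, 0.2, 0.2, 0.5, 0.4]
print(isotonic(shape))  # monotone replacement scores for the same bins
```

In practice you would apply this to the learned scores of the feature you need monotone and write the result back into the fitted model, which is what the linked sample code does against the EBM internals.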
Hey @flippercy, I think monotonicity is very important in some high-stakes domains. I am interested in contributing to EBM + monotonicity, and I would love to learn more about how it would be used in practice :)
Could you please share more detail about your use case for monotonic EBMs? Is this issue (https://github.com/interpretml/interpret/issues/184) the one you were referring to (the "heated discussion this year")? Thank you! (๑•̀ㅂ•́)و✧
Hi @interpret-ml:
Thank you for the response! Unfortunately, post-processing for monotonicity is not the best solution for my request, so I'd rather wait for a future release with this feature.
Hi @xiaohk:
Thank you for the response and sorry for the delay!
Yes, I was referring to #184. Monotonicity is usually required by legal and compliance teams before a model can be deployed in certain industries. For example, if someone applies for a credit card, the issuer will use a predictive model, either a traditional logistic regression model or a machine learning model, to measure the applicant's risk and decide whether to approve or decline. If the applicant is declined, the issuer must later send a letter informing them of the main reason(s) the model picked for the rejection. In this case, the model is expected to be monotonic, because if it is not, then for a certain variable X some applicants might be rejected because "X is too high" while others are rejected because "X is too low". This would cause great confusion among consumers as well as regulators.
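To make the reason-code problem above concrete, here is a toy, invented contribution curve for a single feature X (the function and values are purely illustrative, not from any real model):

```python
# Toy, invented shape function: the contribution of feature X to the risk
# score.  It is U-shaped, i.e. NOT monotone: risk rises at both ends.
def x_contribution(x):
    return (x - 0.5) ** 2  # minimized at x = 0.5

# Two declined applicants with opposite values of X would both receive X
# as their top reason code: one as "X is too low", the other as "X is too
# high" -- exactly the contradictory adverse-action letters described above.
low_applicant = x_contribution(0.1)
high_applicant = x_contribution(0.9)
print(low_applicant, high_applicant)
```

A monotone shape function rules this out by construction: only one direction of X can ever increase risk.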
Let me know if there are any questions.
Thank you.
Hi @flippercy:
Thank you so much for your reply! The credit card example makes a lot of sense to me. Similar to the solution mentioned in #184, I am looking at ways to post-process the model to enforce monotonicity.
Is there a reason why you prefer adding monotonicity as a training constraint rather than post-processing? I guess the former might give higher accuracy (it would take some experiments to validate), but the latter seems more flexible.
Hi @xiaohk:
I prefer adding monotonicity as a training constraint because I'd rather let the algorithm understand this requirement while building the model. Based on my experience with real data, monotonicity sometimes works as a type of regularization during training, reducing overfitting of the final model. Moreover, other algorithms, such as LightGBM and XGBoost, offer this feature as a training constraint, so I want EBM to be consistent with them in this aspect, making model comparison and selection easier.
Thank you.
Thank you so much @flippercy!
I can see how adding a monotonic constraint can act as a type of regularization during training. It is also a great point that it would ease comparison between EBM and other popular ML models that support monotonic fitting natively.
My concern with a monotonic constraint during training is that by forcing the model to be monotonic on feature A, the model might learn to be overly biased on a feature B that is correlated with feature A (to work around the constraint). In the end, although the model is monotonic on feature A, it could pick up many undesired and hidden behaviors on other features. It probably makes more sense to force the model to be monotonic on all features. Similar concerns are raised in the discussion at https://github.com/interpretml/interpret/issues/184#issuecomment-822702385.
I will think more about it; I will ping you if I have any updates ٩( ᐛ )و٩( ᐖ )۶
Marked -- very interesting discussion. Keep us posted if there are any new updates!
(Reposting update from issue #62): The latest release of interpret (0.2.5) now supports sample weights in ExplainableBoostingMachines. You can pass positive floating-point weights to the new `sample_weight` parameter of the `ebm.fit()` call. `sample_weight` should have exactly the same shape and dimension as `y` -- one weight per sample. Here's a quick usage example:
```python
from interpret.glassbox import ExplainableBoostingRegressor

# X, y: your training features and targets; w: one positive weight per sample
ebm = ExplainableBoostingRegressor()
ebm.fit(X, y, sample_weight=w)
```
You can also see more in our documentation: https://interpret.ml/docs/ebm.html#explainableboostingclassifier
To upgrade interpret using pip: pip install -U interpret
Let us know if you run into any issues! -InterpretML Team
Closing this issue since weights have made their way to PyPI. We'll continue to track the monotonicity topic in issue https://github.com/interpretml/interpret/issues/184#issuecomment-822844554
-InterpretML team
Hey @SoulEvill and @flippercy, thank you so much for using Interpret! I am Jay Wang, a research intern on the InterpretML team. We are developing a new visualization tool for EBMs and recruiting participants for a user study (see #283 for more details).
We think you would be a good fit for this paid user study! If you are interested, you can sign up with the link in #283. Let me know if you have any questions. Thank you!
Hi @interpret-ml:
Does EBM support weighted datasets now? I saw discussion on this topic last year but am not sure whether this feature has been added.
Appreciate your help!