interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License

Controlling for overfitting #176

Closed hakeemo closed 3 years ago

hakeemo commented 3 years ago

This is more of a question than an issue. It seems that the default settings, applied to my dataset of 10k rows and 45 features, result in an overfitted model. Decreasing max_rounds seems to help. What are the recommended ways to avoid overfitting to the data?

interpret-ml commented 3 years ago

Hi @hakeemo - good question.

Hmm, would need some more detail on the following:

The ways to mitigate overfitting will depend somewhat on the issue you're facing, so let us know!

hakeemo commented 3 years ago

Thanks for the response.

interpret-ml commented 3 years ago

If you're using a time-based holdout, then there's a good chance you're working in a non-stationary environment (in this case: the data and its relationships are likely changing over time).

Do you see similar overfitting when you run other learners such as random forest or gradient boosting (with default settings and no custom validation), or is it specific to EBM?

You are correct that we use a stratified holdout behind the scenes. This shouldn't be an issue if the train and test sets are drawn from the same parent distribution, but it sounds like that may not be the case here.
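To make the difference concrete, here is a minimal plain-Python sketch contrasting a stratified random holdout with a time-ordered one. It is an illustration only, not EBM's internal code: the helper names stratified_holdout and chronological_holdout, the 15% holdout fraction, and the toy labels are all assumptions made up for this example.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, holdout_frac=0.15, seed=0):
    """Randomly hold out about holdout_frac of each class; returns (train_idx, val_idx)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, val = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random within each class, so time order is ignored
        n_val = max(1, round(len(idx) * holdout_frac))
        val.extend(idx[:n_val])
        train.extend(idx[n_val:])
    return sorted(train), sorted(val)

def chronological_holdout(n_rows, holdout_frac=0.15):
    """Hold out the most recent rows; assumes rows are already sorted by time."""
    n_val = max(1, round(n_rows * holdout_frac))
    cut = n_rows - n_val
    return list(range(cut)), list(range(cut, n_rows))

# Toy data (hypothetical): rows are in time order and the positive class
# only shows up late, i.e. the distribution drifts over time.
labels = [0] * 80 + [1] * 20

s_train, s_val = stratified_holdout(labels)
c_train, c_val = chronological_holdout(len(labels))

# The stratified holdout keeps the overall class mix in the validation rows;
# the chronological holdout validates on the latest rows only, whose class mix differs.
print("stratified positives in validation:", sum(labels[i] for i in s_val), "of", len(s_val))
print("chronological positives in validation:", sum(labels[i] for i in c_val), "of", len(c_val))
```

With drifting data like this, a model that looks fine on the stratified holdout can still look overfitted on a later time period, because the two validation sets are measuring different things.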

hakeemo commented 3 years ago

It isn't specific to EBM. For my problem, I would always need a time-based validation set (typically TimeSeriesSplit in sklearn).
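For anyone unfamiliar with the pattern, this is roughly what I mean by a time-based validation set: an expanding-window split in which every validation fold comes strictly after its training rows. The helper expanding_window_splits below is a hypothetical plain-Python sketch of the idea, not sklearn's implementation, and its fold boundaries are not guaranteed to match TimeSeriesSplit in every case.

```python
def expanding_window_splits(n_rows, n_splits=3):
    """Yield (train_idx, val_idx) pairs where validation rows always follow training rows.

    Assumes rows are already sorted by time, oldest first.
    """
    if n_splits < 1 or n_rows < n_splits + 1:
        raise ValueError("need n_splits >= 1 and at least n_splits + 1 rows")
    fold = n_rows // (n_splits + 1)  # size of each validation fold
    for k in range(1, n_splits + 1):
        # The training window grows with each fold; validation is the next block in time.
        train_end = n_rows - (n_splits - k + 1) * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))

for train_idx, val_idx in expanding_window_splits(12, n_splits=3):
    print(f"train rows 0..{train_idx[-1]}, validate on rows {val_idx[0]}..{val_idx[-1]}")
```

No future row ever leaks into training, which is why a learner that only does an internal random or stratified holdout doesn't fit this workflow without some way to pass in an external validation set.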

interpret-ml commented 3 years ago

That makes sense, we'll make an attempt to get external validation sets in the next release (should be this or next week) and will let you know.

If you're going down the path of domain/shift adaptation (covariate or otherwise) to train the learner for a later time period, I'm guessing you'd want learner-handled sample weights as well. We're working on it, but there's no concrete ETA.

hakeemo commented 3 years ago

@interpret-ml Those features would definitely help! Thanks for your quick response and for this great package.

sarim-zafar commented 1 year ago

@interpret-ml it has been two years since the last update on this and I can't find any external validation set feature?