interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License
6.22k stars 726 forks

"monotonize" shifts scores significantly #445

Closed tszyan-bain closed 1 year ago

tszyan-bain commented 1 year ago

Hi,

Thank you for the nice work.

I observed a behavior I am not sure I understand when using monotonize on a feature after training the model. While the feature becomes monotonic afterwards, its score values are also significantly shifted.

I am guessing it is coming from this weighted average subtraction in the implementation. But should this subtraction also take into account the original average of y?

So it may be something like

```python
y -= np.average(y - y_original, weights=weights)
```

in which I use `y_original` to denote the unmodified scores (since `y` is being overwritten).
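As a self-contained sketch of the suggestion (made-up scores and weights, with `y_original` holding the unmodified scores), subtracting the weighted mean of the change preserves the term's weighted mean:

```python
import numpy as np

# Made-up bin scores and weights to illustrate the proposed correction.
y_original = np.array([0.5, 0.9, 0.7, 2.0])
weights = np.array([10.0, 30.0, 40.0, 20.0])

# Some monotonic modification of the scores (a running maximum here).
y = np.maximum.accumulate(y_original)

# The proposed correction: subtract the weighted mean of the change.
y -= np.average(y - y_original, weights=weights)

# The weighted mean of the scores is unchanged by the correction.
print(np.isclose(np.average(y, weights=weights),
                 np.average(y_original, weights=weights)))  # True
```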

paulbkoch commented 1 year ago

Hi @tszyan-bain -- Thanks. Glad to hear you're enjoying using the package.

Is it possible to include an image of the before and after graphs (including the histograms)? How much does the monotonization change the log loss, or RMSE?

I think of monotonization as falling into 2 scenarios. In the first scenario we're monotonizing a feature that represents something truly monotonic. Because of the nature of boosting, our graphs can have added noise which monotonization can partly help remove. In this case we'd expect our graphs to shift just a tiny bit, and for our metrics to change only a little. Potentially this operation could even improve the model since we've removed some of the noise.

In the second scenario we're trying to monotonize something that really doesn't want to be monotonic. Let's say we had a scenario where 90% of the samples were on the right side of a graph. In this case the isotonic regression that we use to apply monotonization will not want to shift the denser region very much, so it will tend to shift the sparser left side more if required.
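A toy weighted isotonic regression (made-up numbers, using scikit-learn's `IsotonicRegression` as the package does) shows this effect: the heavily weighted right side barely moves, while the sparse, non-monotonic left side is pulled toward it:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Made-up graph: a non-monotonic left side with little weight, and a
# monotonic right side holding almost all of the sample weight.
y = np.array([0.9, 0.2, 0.4, 0.5, 0.6])
w = np.array([1.0, 1.0, 50.0, 50.0, 50.0])

fitted = IsotonicRegression(increasing=True).fit_transform(
    np.arange(len(y)), y, sample_weight=w)

# The dense right side (0.5, 0.6) is untouched; the sparse left side
# (0.9 and 0.2) is pooled and moves a lot.
print(fitted)
```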

Following the monotonization of the feature, we need to re-center the graph in some way. As with other GAM implementations, our graphs are centered around the mean such that each feature contributes on average of zero to the prediction. This property allows the intercept to be the mean predicted value and allows the features to be considered independently from all the others. This re-centering operation will of course shift the overall graph.
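A minimal sketch of that re-centering convention (illustrative only, not the library's actual code): shift a term's scores so their weighted mean is zero, and fold the shift into the intercept so predictions are unchanged:

```python
import numpy as np

# Made-up term scores, bin weights, and intercept.
scores = np.array([-0.2, 0.1, 0.4, 0.9])
weights = np.array([25.0, 25.0, 25.0, 25.0])
intercept = 3.0

# Center the term around zero and absorb the shift into the intercept,
# so intercept + score is unchanged for every bin.
shift = np.average(scores, weights=weights)
scores = scores - shift
intercept += shift

print(intercept)                            # 3.3
print(np.average(scores, weights=weights))  # ~0.0
```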

There's a question about whether we should then throw away or incorporate any shift to the intercept that we get from shifting the graph back to having a mean of zero. I chose to throw it away since I felt that monotonization by default should not affect the model's base prediction. Or in other words, if the mean prediction before applying an edit was 7, then I figured a good default would be for the model to predict a mean of 7 after the edit as well. To be honest though, it wasn't 100% clear to me which option would be better. We've been discussing potentially including a parameter that allows the caller to decide how much of the change in the intercept to allow through.
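A small arithmetic sketch of that idea (the parameter name comes from this discussion; the numbers are made up, and this is not necessarily the released API). Suppose an edit shifted a centered graph's weighted mean from 0.0 down to -0.2, and the model's mean prediction was 7:

```python
# Illustrative arithmetic only: decide how much of the mean shift caused
# by an edit flows into the intercept vs. back into the graph.
def blend(original_mean, result_mean, intercept, passthrough):
    # Push (1 - passthrough) of the shift back into the graph ...
    graph_mean = result_mean + (original_mean - result_mean) * (1.0 - passthrough)
    # ... then recenter the graph, letting the remainder reach the intercept.
    return intercept + graph_mean

print(blend(0.0, -0.2, 7.0, 0.0))  # 7.0 -- mean prediction preserved
print(blend(0.0, -0.2, 7.0, 1.0))  # 6.8 -- the full shift passes through
```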

tszyan-bain commented 1 year ago

Thank you for the quick response. I agree with your intuition that the process of monotonization should not change the overall average. However, I guess the original histogram is not necessarily centered around 0?

Here is the histogram before monotonize; the score values are mostly positive: [image: newplot (13)]

After monotonize: [image: newplot (14)]

You can see that, while the overall shape of the histogram does not change, the scores have been shifted downwards.

When the (weighted) score of the original feature (before monotonization) is not subtracted, the resulting histogram is monotonic and at a similar score level as the original one: [image: newplot (15)]

tszyan-bain commented 1 year ago

I think the issue, which is not visible above, is that a huge mass of this feature sits in the NaN bin (bin zero). Since monotonize only looks at the interior bins, excluding the first and last, the mean predicted value of those bins is no longer close to 0.
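A toy example of this effect (made-up numbers): if the term scores average to zero over all bins, but the missing-value bin at index 0 carries most of the weight, the weighted mean over only the interior bins can be far from zero:

```python
import numpy as np

# Bin 0 is the missing-value bin and holds most of the weight; the scores
# are centered so the full weighted mean is zero.
scores = np.array([-0.15, 0.2, 0.3, 0.4, 0.5])
weights = np.array([700.0, 50.0, 100.0, 100.0, 50.0])

full_mean = np.average(scores, weights=weights)
interior_mean = np.average(scores[1:-1], weights=weights[1:-1])
print(full_mean)      # ~0.0
print(interior_mean)  # 0.32
```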

paulbkoch commented 1 year ago

Thanks @tszyan-bain, you're right! The existing monotonize function does not handle the missing-value bin properly when invoked on a feature that has missing values; for features without missing values it works correctly. I've fixed the issue and the updated function will go out in our next release. I've also added a new "passthrough" parameter to allow changes to the model's intercept from the monotonization process (by default it does not change the intercept). Here's an equivalent function you can use in the meantime. Please let us know if this fixes the issue for you.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def monotonize(ebm, term, increasing="auto", passthrough=0.0):
    if isinstance(term, str):
        term = ebm.term_names_.index(term)
    features = ebm.term_features_[term]
    term_scores = ebm.term_scores_.copy()
    scores = term_scores[term].copy()

    # The first and last bins are the missing-value and unseen-value bins,
    # so monotonize only the interior bins.
    y = scores[1:-1]
    x = np.arange(len(y), dtype=np.int64)
    all_weights = ebm.bin_weights_[term]
    weights = all_weights[1:-1]
    original_mean = np.average(y, weights=weights)

    ir = IsotonicRegression(increasing=increasing)
    y = ir.fit_transform(x, y, sample_weight=weights)

    # Shift the monotonized scores back toward the original interior mean.
    # passthrough controls how much of the shift may reach the intercept.
    result_mean = np.average(y, weights=weights)
    change = (original_mean - result_mean) * (1.0 - passthrough)
    y += change
    scores[1:-1] = y

    if 0.0 < passthrough:
        # Re-center the whole term and move the shift into the intercept.
        mean = np.average(scores, weights=all_weights)
        scores -= mean
        ebm.intercept_ += mean

    term_scores[term] = scores
    ebm.term_scores_ = term_scores

    # The per-bag scores and standard deviations are no longer valid.
    bagged_scores = ebm.bagged_scores_.copy()
    standard_deviations = ebm.standard_deviations_.copy()
    bagged_scores[term] = None
    standard_deviations[term] = None
    ebm.bagged_scores_ = bagged_scores
    ebm.standard_deviations_ = standard_deviations
    return ebm
```
tszyan-bain commented 1 year ago

Thank you very much @paulbkoch

paulbkoch commented 1 year ago

Hi @tszyan-bain -- I assume no news is good news, but just wanted to check in and verify that this solved the problem in a reasonable way. Also, I'm curious if you can share any information regarding how much the monotonization process affected whatever metrics you're using to evaluate the model.

tszyan-bain commented 1 year ago

hi @paulbkoch

Sorry for the silence -- in fact your assumption is right, and your proposed solution does the trick.

Regarding the metrics, in my case the change is minimal: on the train set the result is slightly worse after monotonization, but on the validation set it can sometimes be better.

It may not be that surprising, but I believe such a minimal impact is a direct result of how small the changes made during monotonization are.

paulbkoch commented 1 year ago

Thanks @tszyan-bain -- I was wondering if in fact monotonization had the possibility of improving the model on the validation set, so having a confirmed example where this is the case is very useful.

I have been mulling over whether it would be useful to add post process smoothing functions to the package to allow per-feature control/correction of overfitting. Hearing that post process editing can improve the model in some cases is really helpful towards making that decision. 👍 But this is a discussion for another issue, so closing this issue as completed now.

paulbkoch commented 1 year ago

The latest v0.4.3 release includes this fix, so the temporary function above is no longer needed.