KS using .predict_proba() and .score() of Scorecard is different

ds-mauri commented 2 years ago

Hi @guillermo-navas-palencia !

As always, congratulations on the amazing work on the library! I was running some tests with the Scorecard class and noticed that the KS statistic differs depending on whether it is computed from .predict_proba() or .score(). Is that the expected behaviour? Isn't the score supposed to be a linear transformation of the weights of the logistic model? I thought this shouldn't modify the KS statistic, but I'm not sure.

Best regards,

ds-mauri commented 2 years ago

UPDATE

Indeed, the AUROC also changes depending on whether we use .predict_proba() or .score(). To investigate further, I plotted the bads by decile of each "score" along the x-axis. For some reason the bad rate changes for each bin. On the left I used the .predict_proba() method (rescaled to 0-1000) and on the right the .score().

Checking at the customer level, we see that for some users the score transformation changes their ordering in the dataset.

Here I take 3 customers as an example. Using .predict_proba(), they were ranked in the 6th and 3rd deciles, respectively. Using .score(), both were in the 4th decile. Is this the expected behaviour of the Scorecard transformation, or should it keep the ordering? I believe this is more a question about the underlying theory than an implementation issue.

Thanks again!
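One way to check this kind of customer-level concern is to compare the rank order induced by each output directly, instead of reading it off decile plots. The sketch below is only an illustration: it uses synthetic probabilities rather than the data from this issue, and `same_ordering` and `decile` are hypothetical helpers, not part of optbinning.

```python
import numpy as np


def same_ordering(a, b):
    """True if a and b sort the samples into the same order (assumes no ties)."""
    return np.array_equal(np.argsort(a), np.argsort(b))


def decile(values):
    """Rank-based decile label in 0..9 (0 = lowest values)."""
    ranks = np.argsort(np.argsort(values))
    return ranks * 10 // len(values)


# Synthetic probabilities, NOT the data from this issue.
rng = np.random.default_rng(0)
proba = rng.uniform(0.01, 0.99, size=1000)

rescaled = proba * 1000                  # linear, increasing
log_odds = np.log(proba / (1 - proba))   # non-linear, increasing
flipped = -log_odds                      # non-linear, decreasing

# Increasing transforms (linear or not) keep the ordering.
print(same_ordering(proba, rescaled))
print(same_ordering(proba, log_odds))

# A decreasing transform mirrors the deciles: decile d becomes 9 - d.
print(np.array_equal(decile(flipped), 9 - decile(proba)))
```

If a real pair of outputs fails a check like this, the two are not related by a strictly monotonic mapping, which would point to an implementation problem rather than to the theory.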

guillermo-navas-palencia commented 2 years ago

Hi @ds-mauri!

Interesting, I will investigate during the weekend. Could you please specify the version you are using?

ds-mauri commented 2 years ago

Sure! I'm using 0.14.1. I just saw in the release notes that version 0.15.0 includes a bugfix for the .score() method. I'll check whether that is the cause.

[UPDATE] I retrained the models on version 0.15.0 and the described phenomenon keeps occurring.

guillermo-navas-palencia commented 2 years ago

Hi @ds-mauri,

The relationship between score and probability is non-linear: np.log((1. / event_rate - 1) * n_event / n_nonevent). See the following example:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from optbinning import BinningProcess
from optbinning import Scorecard
from optbinning.scorecard import plot_auc_roc, plot_cap, plot_ks

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

binning_process = BinningProcess(data.feature_names)

estimator = LogisticRegression(solver="lbfgs", class_weight="balanced")

scorecard = Scorecard(binning_process=binning_process,
                      estimator=estimator, scaling_method="min_max",
                      scaling_method_params={"min": 300, "max": 850})

scorecard.fit(X, y)

Plots:

y_pred = scorecard.predict_proba(X)[:, 1]
y_score = scorecard.score(X)

plot_ks(y, y_pred)
plot_ks(y, y_score)

Relationship between score and predicted probability

plt.scatter(y_pred, y_score)
plt.show()

Non-linear transformation

n_event = y.sum()
n_nonevent = len(y) - n_event

score = np.log((1. / y_pred - 1) * n_event / n_nonevent)

plt.scatter(score, y_score)
plt.show()

# score decreases as y_pred increases, so flip the labels (1 - y) before plotting the KS
plot_ks(1 - y, score)

guillermo-navas-palencia commented 2 years ago

The transformation from the score to PD and vice-versa can be found here: https://github.com/guillermo-navas-palencia/optbinning/blob/master/optbinning/binning/transformations.py

ds-mauri commented 2 years ago

Hi @guillermo-navas-palencia !

Thanks a lot for the amazing explanation. Indeed, I mixed things up when comparing the two methods. I said "linear" but I meant "monotonic". In fact, it isn't a linear transformation at all, thanks!

Regarding the concern that the .score method was changing the ranking of the scored instances compared with .predict_proba, that was my mistake when validating the bugfix you released in version 0.15.0. So, in version 0.15.0 the .score method is implemented correctly.

Just to make it clear:

- Since the .score method isn't a linear transformation of the PD, it changes the underlying distribution, and this changes the KS (the AUROC, on the other hand, stays the same).
- The ranking of both methods is the same. So, the instance with the worst score should also have the worst PD, and so on.
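A small sketch to sanity-check these two points on synthetic data. It applies the PD-to-WoE expression from this thread and compares rank-based metrics before and after. The helpers `ecdf_gap`, `ks_two_sided`, `ks_signed` and `auc` are hypothetical NumPy implementations written for this illustration, not optbinning functions. The signed variant is included because some implementations take the CDF difference in a fixed direction. Whether `plot_ks` does so is an assumption to verify in its source.

```python
import numpy as np


def ecdf_gap(y, s):
    """ECDF(non-events) - ECDF(events), evaluated at each sorted value of s."""
    y_sorted = np.asarray(y)[np.argsort(s)]
    cdf_event = np.cumsum(y_sorted) / y_sorted.sum()
    cdf_nonevent = np.cumsum(1 - y_sorted) / (1 - y_sorted).sum()
    return cdf_nonevent - cdf_event


def ks_two_sided(y, s):
    """Two-sample KS statistic: max absolute gap (assumes no ties in s)."""
    return np.max(np.abs(ecdf_gap(y, s)))


def ks_signed(y, s):
    """Hypothetical direction-dependent variant: max of the signed gap."""
    return np.max(ecdf_gap(y, s))


def auc(y, s):
    """AUROC via the rank-sum (Mann-Whitney) identity (assumes no ties in s)."""
    y = np.asarray(y)
    ranks = np.argsort(np.argsort(s)) + 1
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)


# Synthetic labels and probabilities, NOT the data from this issue.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
logit = rng.normal(loc=1.5 * y - 0.75, scale=1.0)
p = 1.0 / (1.0 + np.exp(-logit))

n_event = y.sum()
n_nonevent = len(y) - n_event
woe = np.log((1.0 / p - 1.0) * n_event / n_nonevent)  # decreasing in p

print("two-sided KS:", ks_two_sided(y, p), ks_two_sided(y, woe))
print("signed KS:   ", ks_signed(y, p), ks_signed(y, woe), ks_signed(1 - y, woe))
print("AUROC:       ", auc(y, p), auc(y, woe), auc(y, -woe))
```

In this sketch the two-sided KS is the same for `p` and `woe`, because it depends only on the ordering of the samples. The signed variant differs until the labels are flipped, which mirrors the `y*-1 + 1` step in the maintainer's example. The AUROC of the decreasing score is one minus the original and matches it once the sign is reversed. So a KS discrepancy between the two plots may be worth checking against the direction convention before attributing it to the shape of the distribution.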

By the way, I didn't completely follow the score transformation np.log((1. / event_rate - 1) * n_event / n_nonevent). Is this somehow equivalent to the "classical" method without the scaling parameters?

Best regards,

guillermo-navas-palencia commented 2 years ago

Hi @ds-mauri,

Regarding the score transformation, I wrote the logarithmic transformation that converts PD to WoE as a similar example. Therefore, the score is your expression above with woe replaced by np.log((1. / PD - 1) * n_event / n_nonevent).
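To make the PD-to-WoE step concrete, here is a minimal sketch of the expression above together with its algebraic inverse. The function names `pd_to_woe` and `woe_to_pd` and the event counts are assumptions made for this illustration. For the library's own versions, see the transformations module linked earlier in this thread.

```python
import numpy as np


def pd_to_woe(pd_, n_event, n_nonevent):
    """WoE from PD, using the expression quoted in this thread."""
    return np.log((1.0 / pd_ - 1.0) * n_event / n_nonevent)


def woe_to_pd(woe, n_event, n_nonevent):
    """Algebraic inverse of pd_to_woe."""
    return 1.0 / (1.0 + np.exp(woe) * n_nonevent / n_event)


# Illustrative counts (an assumption, not taken from the issue).
n_event, n_nonevent = 357, 212

pd_grid = np.linspace(0.05, 0.95, 19)
woe = pd_to_woe(pd_grid, n_event, n_nonevent)
round_trip = woe_to_pd(woe, n_event, n_nonevent)

# At the base event rate the WoE is (numerically) zero.
base_rate = n_event / (n_event + n_nonevent)
woe_at_base = pd_to_woe(base_rate, n_event, n_nonevent)

# Equal steps in PD give unequal, always negative steps in WoE:
# the mapping is monotonically decreasing but not linear.
steps = np.diff(woe)

print(np.allclose(round_trip, pd_grid))
print(woe_at_base)
print(steps.min(), steps.max())
```

Because the WoE falls as the PD rises, any score that is an increasing linear function of this WoE ranks instances in the reverse order of their PD, which is consistent with flipping the labels before calling plot_ks in the earlier example.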

ds-mauri commented 2 years ago

Oh, it's clear! Thanks again @guillermo-navas-palencia !

guillermo-navas-palencia / optbinning

KS using .predict_proba() and .score() of Scorecard is different #184