ing-bank / skorecard

scikit-learn compatible tools for building credit risk acceptance models
https://ing-bank.github.io/skorecard/
MIT License

Align WoE #72

Closed orchardbirds closed 2 years ago

orchardbirds commented 2 years ago

The bucket_table function reports a different WoE than the woe_1d function. Which one is correct?

The WoE calculation should be done in one place.

sbjelogr commented 2 years ago

It was like this at some point, but apparently it was changed recently. I would recommend using one function across the library (like woe_1d).

The most common practice is to have woe=log(%G/%B). For the purpose of the calculations it does not matter (it's just a sign difference), as long as it is consistent throughout the package.
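To make the convention concrete, here is a minimal sketch of woe = log(%G/%B) per bucket. It is not skorecard's woe_1d: the helper name `woe_by_bucket`, the `epsilon` smoothing term, and the assumption that y == 1 marks a "bad" are all illustrative choices.

```python
import numpy as np
import pandas as pd


def woe_by_bucket(buckets, y, epsilon=1e-4):
    """Weight of evidence per bucket, using woe = log(%G / %B).

    Hypothetical helper for illustration. Assumes y == 1 is a "bad"
    and y == 0 is a "good"; epsilon avoids division by zero.
    """
    df = pd.DataFrame({"bucket": np.asarray(buckets), "y": np.asarray(y)})
    counts = df.groupby("bucket")["y"].agg(bad="sum", total="count")
    counts["good"] = counts["total"] - counts["bad"]

    # Share of all goods / all bads that falls into each bucket
    pct_good = counts["good"] / counts["good"].sum()
    pct_bad = counts["bad"] / counts["bad"].sum()

    return np.log((pct_good + epsilon) / (pct_bad + epsilon))


buckets = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]

woe = woe_by_bucket(buckets, y)
# The opposite convention, log(%B / %G), is just the negation
woe_flipped = -woe
```

With this convention a bucket holding mostly goods gets a positive WoE. Swapping to log(%B/%G) only flips the sign, so a discrepancy that is not a pure sign flip points to a different problem.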

orchardbirds commented 2 years ago

Yeah, unfortunately I've found it's more than a sign difference for certain values. Will do more digging.

orchardbirds commented 2 years ago

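Found the bug, @sbjelogr.

In metrics.py we had `df = pd.concat([X, y], axis=1, ignore_index=True)` on line 32.

If X comes from train_test_split, it keeps the shuffled, non-contiguous index of the original data. pd.concat with axis=1 aligns rows on the index (ignore_index=True only resets the column labels there), so when the index of X doesn't match the index of y the df doesn't concatenate properly. We just need to add: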


X = X.copy().reset_index(drop=True)

on line 22. Fixing tests now, then will commit
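A small reproduction of the effect, as a sketch with made-up data rather than the actual code in metrics.py. It assumes y ends up with a default 0..n-1 index while X keeps the shuffled index that train_test_split would leave behind; the column name `feature` is hypothetical.

```python
import pandas as pd

# X with a shuffled, non-contiguous index, as left behind by a split
X = pd.DataFrame({"feature": [10, 20, 30, 40]}, index=[7, 5, 9, 4])
# y with a default RangeIndex 0..3 (assumed for this illustration)
y = pd.Series([0, 1, 0, 1], name="target")

# axis=1 aligns rows on the index; ignore_index=True only renames columns to 0, 1
misaligned = pd.concat([X, y], axis=1, ignore_index=True)

# The fix from the comment above: reset the index of X before concatenating
X_fixed = X.copy().reset_index(drop=True)
aligned = pd.concat([X_fixed, y], axis=1, ignore_index=True)

print(misaligned.shape, int(misaligned.isna().sum().sum()))
print(aligned.shape, int(aligned.isna().sum().sum()))
```

In the misaligned frame no row of X is paired with its target, so any per-bucket good/bad counts computed from it would be wrong in ways that go beyond a sign flip.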

timvink commented 2 years ago

Nice work!