Closed by orchardbirds 2 years ago
It was like this at some point, but apparently it was changed recently. I would recommend using one function across the library (like woe_1d).
The most common practice is to have `woe = log(%G / %B)`.
For the purpose of the calculations it does not matter (it's just a sign difference), as long as it is consistent throughout the package.
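To make the sign point concrete, here is a minimal sketch of the two conventions. The function name `woe_good_over_bad` and the bucket counts are made up for illustration; this is not the package's `woe_1d`.

```python
import math


def woe_good_over_bad(goods, bads, total_goods, total_bads):
    """WoE of one bucket using the log(%G / %B) convention (hypothetical helper)."""
    pct_good = goods / total_goods  # share of all goods that fall in this bucket
    pct_bad = bads / total_bads     # share of all bads that fall in this bucket
    return math.log(pct_good / pct_bad)


# Hypothetical bucket: 30 of 100 goods and 10 of 50 bads.
woe_gb = woe_good_over_bad(30, 10, 100, 50)  # log(0.3 / 0.2)

# The opposite convention, log(%B / %G), is the same number with the sign flipped.
woe_bg = math.log((10 / 50) / (30 / 100))
```

If two functions in the package disagree by more than the sign, the cause is somewhere other than the convention.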
Yeah unfortunately I've found it's more than a sign difference for certain values. Will do more digging
Found the bug, @sbjelogr.
In `metrics.py` we had `df = pd.concat([X, y], axis=1, ignore_index=True)` on line 32.
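If X comes from `train_test_split`, its index is shuffled and no longer runs 0..n-1. `pd.concat` with `axis=1` aligns rows on index labels, and `ignore_index=True` only renumbers the columns in that case, so the df doesn't concatenate properly. We just need to add `X = X.copy().reset_index(drop=True)` on line 22. Fixing tests now, then will commit.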
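A small reproduction of the misalignment, as a sketch. It assumes `y` carries a fresh 0..n-1 index while `X` keeps the shuffled one from the split; the column name and values are invented for the example.

```python
import pandas as pd

# X with a shuffled, non-contiguous index, as it might look after a split (hypothetical data).
X = pd.DataFrame({"feature": [10, 20, 30]}, index=[7, 3, 5])
# y with a default RangeIndex 0..2 (assumption: it was rebuilt without X's index).
y = pd.Series([0, 1, 0], name="target")

# Rows are matched on index labels; none overlap, so every row has a NaN.
misaligned = pd.concat([X, y], axis=1, ignore_index=True)

# Resetting X's index first makes the labels line up with y's.
X_fixed = X.copy().reset_index(drop=True)
aligned = pd.concat([X_fixed, y], axis=1, ignore_index=True)

print(misaligned.shape, aligned.shape)
```

With the index reset, each feature row sits next to its own target value, so the bucket counts feeding the WoE are computed on the right pairs.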
Nice work!
The `bucket_table` function is reporting a WoE different to the `woe_1d` function. Which is correct? The WoE calculation should be done in one place.