Open diegodebrito opened 2 months ago
Hi @guillermo-navas-palencia, wondering if you could check on this. I'm adding a more comprehensive example below:
import pandas as pd
from optbinning import ContinuousOptimalBinning
df = pd.DataFrame({'value': {0: 0.0,
1: 1.0,
2: 2.0,
3: 3.0,
4: 4.0,
5: 5.0,
6: 6.0,
7: 7.0,
8: 8.0,
9: 9.0},
'target': {0: 7.747250464922968,
1: 6.527567693419396,
2: 5.951775031334447,
3: 5.4739748791420855,
4: 5.635028933057227,
5: 5.177333709759795,
6: 5.242660923463983,
7: 4.681195578721209,
8: 4.921130922493046,
9: 4.698432205030768},
})
# Case 1: fit without sample weights.
optb = ContinuousOptimalBinning(dtype="numerical",
                                min_bin_size=0.3,
                                max_bin_size=1.0)
optb.fit(df['value'], df['target'])
print(optb.status)
binning_table = optb.binning_table
binning_table.build()
binning_table.plot()
# Case 2: same data, with a constant weight of 10 per row passed as sample_weight.
df['num_obs'] = [10] * 10
optb = ContinuousOptimalBinning(dtype="numerical",
                                min_bin_size=0.3,
                                max_bin_size=1.0)
optb.fit(df['value'], df['target'], sample_weight=df['num_obs'])
print(optb.status)
binning_table = optb.binning_table
binning_table.build()
binning_table.plot()
# Case 3: repeat each row 10 times instead of passing sample_weight.
df = df.loc[df.index.repeat(10)]
optb = ContinuousOptimalBinning(dtype="numerical",
                                min_bin_size=0.3,
                                max_bin_size=1.0)
optb.fit(df['value'], df['target'])
print(optb.status)
binning_table = optb.binning_table
binning_table.build()
binning_table.plot()
Thanks for your work on this great tool!
The parameters min_bin_size and max_bin_size don't seem to work well when sample_weight is passed to fit: the example produces only one bin, regardless of the values set for those parameters.
Without sample_weight in the fit call, the binning seems to work properly (you can just comment that argument out and rerun the example).
Please let me know if it's my lack of understanding or if I'm using the tool incorrectly.
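For reference, here is a minimal sketch of why the weighted fit and the row-repeated fit might be expected to agree. It uses plain Python only and makes no optbinning calls. It assumes sample_weight acts as a frequency weight and that bin size is measured as a share of total weight; both are assumptions here, not confirmed library behavior. The helper names weighted_bin_stats and repeated_bin_stats are hypothetical.

```python
import math

# Data copied from the example in this issue.
values = [float(i) for i in range(10)]
targets = [7.747250464922968, 6.527567693419396, 5.951775031334447,
           5.4739748791420855, 5.635028933057227, 5.177333709759795,
           5.242660923463983, 4.681195578721209, 4.921130922493046,
           4.698432205030768]
weights = [10] * 10  # constant weight per row, as in df['num_obs']


def weighted_bin_stats(lo, hi):
    """Weight share and weighted mean target for rows with lo <= value < hi."""
    idx = [i for i, v in enumerate(values) if lo <= v < hi]
    bin_weight = sum(weights[i] for i in idx)
    mean = sum(weights[i] * targets[i] for i in idx) / bin_weight
    return bin_weight / sum(weights), mean


def repeated_bin_stats(lo, hi):
    """Same statistics, computed on rows physically repeated by their weight."""
    rows = [(v, t) for v, t, w in zip(values, targets, weights)
            for _ in range(w)]
    selected = [t for v, t in rows if lo <= v < hi]
    return len(selected) / len(rows), sum(selected) / len(selected)


# Candidate bin [0, 3): compare the two ways of computing its statistics.
w_share, w_mean = weighted_bin_stats(0.0, 3.0)
r_share, r_mean = repeated_bin_stats(0.0, 3.0)
print(math.isclose(w_share, r_share), math.isclose(w_mean, r_mean))
```

Under those assumptions, a candidate bin has the same relative size and the same mean target either way, so a min_bin_size of 0.3 would be satisfiable in both the weighted and the repeated case.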