guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0
452 stars 100 forks

How to resolve the optbinning issue #328

Open Jagadeesha89 opened 2 months ago

Jagadeesha89 commented 2 months ago
[Screenshot: Capture1]

How can I resolve this? Previously I ran the code to develop the scorecard and it worked well. This time I tried with the same data, but it stops while fitting the binning process on X, y. Kindly help me with this.

Originally posted by @Jagadeesha89 in https://github.com/guillermo-navas-palencia/optbinning/issues/326#issuecomment-2244258260

guillermo-navas-palencia commented 2 months ago

I would need much more information to say anything about this problem. You must provide a reproducible example.

Jagadeesha89 commented 2 months ago

These are my entire code blocks. Whenever I run "binning_process.fit(X,y)", my kernel stops and a popup asks me to restart. I tried with Python 3.11 and 3.9.

[Screenshots: PART3, PARt2, Part1]

pangjac commented 2 months ago

How about pip install -U numpy? Have you upgraded any dependent packages recently?

Jagadeesha89 commented 2 months ago

I am using numpy 1.26.4. I tried with Python 3.9 and with the latest version. I am getting the same error.

Jagadeesha89 commented 2 months ago

If my data frame has 20 rows or fewer, it works. If the data frame has more than 20 rows, the Jupyter notebook kernel dies. Please check whether any condition is applied when accessing the data.

bmreiniger commented 2 months ago

My first guess would be that your machine hasn't enough resources, although with only 20 rows and 15 columns I'd be very surprised. Is the dataset publicly available somewhere? What version of optbinning and cvxpy do you have installed?
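
One way to gather the version information asked about here is a short script like the one below. It is a minimal sketch: only optbinning, cvxpy and numpy are named in this thread, and the other entries in the hypothetical `PACKAGES` list are assumptions about which dependencies might matter.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# optbinning, cvxpy and numpy come from this thread; the rest are assumed
# to be relevant dependencies and can be edited freely.
PACKAGES = ["optbinning", "cvxpy", "numpy", "scipy", "scikit-learn", "ortools", "pandas"]


def collect_versions(packages=PACKAGES):
    """Return a dict with the Python/OS details and installed package versions."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Keep missing packages in the report instead of failing.
            report[name] = "not installed"
    return report


report = collect_versions()
for name, value in report.items():
    print(f"{name}: {value}")
```

Pasting the printed lines into the issue gives others the interpreter, operating system and package versions in one place.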

Jagadeesha89 commented 2 months ago

Even if I manually create data with more than 20 rows and 2 columns, it does not work, so it is not a problem with the data set. I have installed the latest version of optbinning. Please check on your end.

Jagadeesha89 commented 2 months ago

I fed in random data with 19 values (age) and it worked. When I increased it to 23 values, an error popped up saying the kernel died. Screenshots are attached below: the first two are with 19 values, the remaining two are with 23 values.

[Screenshots: Capture1, Capture2, Capture3, Capture4]

Jagadeesha89 commented 2 months ago

Any update?

bmreiniger commented 2 months ago

Please add that example as code, so others can easily run it themselves. And any package versions or system specifications may also be helpful.

I agree (from the last issue) that this is unlikely to be an OptBinning problem, and if it's system-specific I don't know how much I could help. But at the very least, I'll run your example locally to verify whether it's your system.
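
As an illustration of what such a code example could look like, here is a minimal sketch that builds synthetic data at the two sizes reported above (19 and 23 rows). The helper names `make_toy_data` and `fit_binning`, the random seed and the value ranges are all hypothetical; only the `age` and `Status` column names and the `BinningProcess(...).fit(x, y)` call come from this thread.

```python
import numpy as np
import pandas as pd


def make_toy_data(n_rows, seed=0):
    """Build a small synthetic frame with one feature and a binary target."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 70, size=n_rows)
    status = rng.integers(0, 2, size=n_rows)
    # Make sure both classes are present regardless of the random draw.
    status[0], status[1] = 0, 1
    return pd.DataFrame({"age": age, "Status": status})


def fit_binning(df):
    """Fit a BinningProcess on the frame; optbinning is imported lazily."""
    from optbinning import BinningProcess

    x = df.drop("Status", axis=1)
    y = df["Status"]
    binning_process = BinningProcess(x.columns.tolist())
    binning_process.fit(x, y)
    return binning_process


small = make_toy_data(19)  # size reported as working
large = make_toy_data(23)  # size reported as crashing
print(small.shape, large.shape)
```

Calling `fit_binning(small)` and then `fit_binning(large)` on the affected machine would show whether the row count alone triggers the crash, without depending on any private data.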

Jagadeesha89 commented 2 months ago

Please find below the code which I ran on my system.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from optbinning import OptimalBinning
    from optbinning import Scorecard
    from optbinning import BinningProcess
    from optbinning.scorecard import plot_auc_roc, plot_cap, plot_ks
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.base import BaseEstimator, TransformerMixin

    pd.set_option('display.max_rows', 150)

    # Attached file: Smaple_data - Copy.csv
    data = pd.read_csv(r"D:\CB_intigration\ScoreCard MFI Business\Smaple_data.csv")

    # Split features and target
    x = data.drop('Status', axis=1)
    y = data['Status']

    variable_names = x.columns[:].tolist()
    print(variable_names)

    selection_criteria = {
        "iv": {"min": 0.07, "max": 0.5},
        "quality_score": {"min": 0.01}
    }

    binning_process = BinningProcess(variable_names, special_codes=[], selection_criteria=selection_criteria)

    estimator = LogisticRegression(fit_intercept=True, solver='lbfgs')

    binning_process.fit(x, y)

Here I get a popup saying the kernel appears to have died and will restart automatically.

Sample data attached for your reference.

bmreiniger commented 2 months ago

That runs fine in a Google Colab notebook.
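
On a machine where the kernel dies, running the same code outside Jupyter may give more detail than the restart popup. The sketch below uses only the standard library to run a script in a child interpreter with `faulthandler` enabled and to capture its exit code and stderr. The `run_isolated` helper is a hypothetical name, and the child script here is a harmless placeholder rather than the failing fit.

```python
import subprocess
import sys


def run_isolated(source):
    """Run Python source in a child interpreter with faulthandler enabled."""
    completed = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", source],
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout, completed.stderr


# Placeholder child script; replace it with the code that calls
# binning_process.fit(x, y) to test the real failure.
returncode, stdout, stderr = run_isolated("print('fit finished')")

print("exit code:", returncode)
print("stderr:", stderr)
```

If the child process crashes, the exit code will be nonzero (negative on POSIX when it is killed by a signal), and with `faulthandler` enabled stderr should contain the Python traceback at the point of the crash, which can be pasted into the issue.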

Jagadeesha89 commented 2 months ago

It works in Python 3.11.5, but with Python versions above 3.11.5 it throws a "kernel died" error. I tried in VS Code using a virtual environment with Python 3.11.5 and it works, but it does not work with the latest version of Python.