Closed deepandas11 closed 2 years ago
Thank you for the issue. I took a look at your code and tested it, and the error appears to come from LightGBM's argument requirements. According to the LightGBM documentation for the categorical_feature argument, the model only accepts categorical features in an int format when using Pandas.
Using the following snippet to convert all string categories to ints for preprocessing will solve your issue!
cats = ['Cabin', 'Embarked', 'Gender', 'Name', 'Parch', 'Pclass', 'SibSp', 'Ticket']
for col in cats:
    X[col] = X[col].factorize()[0]
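To make the effect of that loop concrete, here is a minimal sketch on a toy frame. The data are invented for illustration and only two of the column names are borrowed from the snippet above. One thing to keep in mind: factorize() encodes missing values as -1 by default.

```python
import pandas as pd

# Toy data, invented for illustration; not the files attached to this issue.
X = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q"],
    "Gender": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})
cats = ["Embarked", "Gender"]

for col in cats:
    # factorize() returns (codes, uniques); codes are integers assigned
    # in order of first appearance of each distinct value.
    codes, uniques = X[col].factorize()
    X[col] = codes

# The categorical columns now hold integer codes instead of strings.
print(X.dtypes)
print(X)
```

After this, the columns listed in cats are plain integer columns, which is the format the reply above says LightGBM expects.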
The PowerShap API is used correctly in your snippet!
@JarneVerhaeghe thanks for looking into this. However, I may have an alternate explanation for the lightgbm documentation. If the columns in the dataframe are of CategoricalDtype() and are nominal, one could use the name representation of the cat_features. E.g., simply running the following lines verifies that:
lgb_cl = LGBMClassifier(random_state=42, n_estimators=10, cat_features=cats)
lgb_cl.fit(X, y)
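For reference, a minimal sketch of the dtype conversion this approach relies on, assuming the columns start out as strings. The toy data are invented, and the LightGBM call itself is left out so the example only needs pandas.

```python
import pandas as pd

# Toy data, invented for illustration; not the files attached to this issue.
X = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q"],
    "Gender": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})
cats = ["Embarked", "Gender"]

for col in cats:
    # Convert string columns to an unordered (nominal) categorical dtype.
    X[col] = X[col].astype("category")

# Each column keeps its string labels but is backed by integer codes.
for col in cats:
    print(col, list(X[col].cat.categories), X[col].cat.codes.tolist())
```

The string labels stay visible in the frame, while pandas stores integer codes underneath. According to the LightGBM documentation, the default categorical_feature='auto' picks up unordered pandas categorical columns, which would be consistent with the behaviour described in this reply.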
I'm facing a weird error when using a LightGBM model as the underlying model with the selector. I could find a simple repro using the titanic dataset:
X - bug_features.csv y - bug_label.csv
Categorical features and Data types info
Code snippet used to fit the selector:
Traceback
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [19], in
```
Let me know if I am using the API incorrectly, or missing an argument. I tried passing the cat features list into the fit call as a kwarg, but that didn't help either.
Library details: