oegedijk / explainerdashboard

Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.
http://explainerdashboard.readthedocs.io
MIT License
2.29k stars 330 forks

String categorical values from Lightgbm #198

Open Guidosalimbeni opened 2 years ago

Guidosalimbeni commented 2 years ago

Hello, great tool and library! Wonder if you can point me in the right direction to solve an issue?

What would be a solution in this case? Thanks.

oegedijk commented 2 years ago

Hi @Guidosalimbeni,

If the model is able to handle categorical values, then ExplainerDashboard should handle them as well. It does at least for CatBoost, so I assume it should work for LightGBM too.

Do you have some runnable example code that shows the crash or wrong output?

oegedijk commented 2 years ago

Hi @Guidosalimbeni,

Would you be able to provide any examples of where this broke?

Dekermanjian commented 1 year ago

Hello, I am running into this issue. Here is the error message that I get: TypeError: '<' not supported between instances of 'float' and 'str'

And here is a reproducible example:

import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

from explainerdashboard import ClassifierExplainer, ExplainerDashboard

# Load the Titanic data and convert every object (string) column to the category dtype
df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df[df.select_dtypes("O").columns] = df.select_dtypes("O").astype("category")

# Keep the target plus one numeric and two categorical features
df = df[["Survived", "Age", "Sex", "Embarked"]]
y = df.pop("Survived")
X = df
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Let LightGBM pick up the category columns automatically
model = LGBMClassifier()
model.fit(X_train, y_train, feature_name='auto', categorical_feature='auto')

explainer = ClassifierExplainer(
    model, X_test, y_test,
    labels=['Not survived', 'Survived'])

db = ExplainerDashboard(explainer, title="Titanic Explainer",
                        whatif=False,
                        shap_interaction=False,
                        decision_trees=False)
db.run(port=8051)
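
One possible reading of the error message, offered as a sketch rather than a confirmed diagnosis: Python raises exactly this TypeError when a float (such as the NaN that pandas uses for a missing value) is ordered against a string. That can happen when a column holding both string labels and missing values gets sorted. The `embarked` series below is made-up illustration data, not the Titanic file, and whether explainerdashboard orders the values this way is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for a categorical column with one missing value.
embarked = pd.Series(["S", "C", None, "Q"], dtype="category")

# The missing entry surfaces as a float NaN next to the string labels.
values = embarked.tolist()
value_types = {type(v).__name__ for v in values}

# Ordering a float against a str is what raises the TypeError.
try:
    float("nan") < "S"
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Count missing values per category column to see which ones are affected.
frame = pd.DataFrame({"Embarked": embarked})
missing_per_column = frame.select_dtypes("category").isna().sum()

print(value_types)
print(error_message)
print(missing_per_column)
```

Running the same `isna().sum()` check on `X_test` from the example above would show whether any of its category columns contain missing values.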

ghost commented 1 year ago

I am having the same problem, is there a solution?
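
One workaround that may be worth trying, sketched under the assumption that a mix of missing values and string labels in a category column is what triggers the error: give missing values their own explicit category, so every entry in the column is a string. The helper name `fill_missing_categories` and the `"Missing"` placeholder are hypothetical, and this has not been tested against explainerdashboard itself.

```python
import pandas as pd


def fill_missing_categories(X, placeholder="Missing"):
    """Return a copy of X whose category columns contain no missing values."""
    X = X.copy()
    for col in X.select_dtypes("category").columns:
        if X[col].isna().any():
            # The placeholder must be a known category before it can be filled in.
            if placeholder not in X[col].cat.categories:
                X[col] = X[col].cat.add_categories(placeholder)
            X[col] = X[col].fillna(placeholder)
    return X


# Made-up illustration data: one numeric and one categorical column.
X = pd.DataFrame({
    "Age": [22.0, 38.0, None],
    "Embarked": pd.Series(["S", None, "C"], dtype="category"),
})
X_filled = fill_missing_categories(X)
print(X_filled["Embarked"].tolist())
```

If this route is taken, the same transformation would need to be applied to the training data before fitting and to the test data passed to the explainer, so that the model and the dashboard see the same categories.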

Guidosalimbeni commented 1 year ago

Yes great, I am still having the same issue.

galievaz commented 1 year ago

Hi,

I have the same issue, caused by string values in the data. I'd like to create a dashboard for a fitted model, a TabularPredictor from the AutoGluon library. Is there any solution or update related to this issue?

fjpa121197 commented 1 year ago

I think this issue is more related to the data and how LightGBM is coded.

I stumbled upon this error too, but in my case it came from LightGBM, not explainerdashboard.

Try the following:

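# Replace the characters [, ] and < in the column names
df.columns = df.columns.str.translate("".maketrans({"[": "{", "]": "}", "<": "^"}))
# List any columns that still contain one of those characters (should be empty)
df.columns[df.columns.str.contains(r"[\[\]<]")]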

This replaces the characters in the column names that, in my case, caused the error: TypeError: '<' not supported between instances of 'float' and 'str'

Do let me know if that solves your issue.
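
To make the effect of the two lines above concrete, here is a small sketch on a made-up frame; the column names are hypothetical. Note that the translation only touches column names, so string values stored inside the columns are left unchanged.

```python
import pandas as pd

# Hypothetical frame with brackets and "<" in two of the column names.
df = pd.DataFrame({
    "age[years]": [22, 38],
    "fare<10": [True, False],
    "Sex": ["male", "female"],
})

# Map "[" to "{", "]" to "}" and "<" to "^" in every column name.
table = str.maketrans({"[": "{", "]": "}", "<": "^"})
df.columns = df.columns.str.translate(table)

# Columns that still contain one of the replaced characters (none expected).
remaining = df.columns[df.columns.str.contains(r"[\[\]<]")]

print(list(df.columns))
print(list(remaining))
```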