interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License
6.28k stars, 729 forks

Explore the data with continuous output and category input #540

Open Vu1992 opened 6 months ago

Vu1992 commented 6 months ago

Hi,

Thank you for your great work. I have one question about the "Explore the data" feature. Is it possible to use the following code to explain a continuous output with a categorical input?

marginal = Marginal(names).explain_data(X_train, y_train, name='Train Data')
show(marginal)

When I run the code above, it fails with: Type error: Unable to do the formular for 'str'

paulbkoch commented 5 months ago

Hi @Vu1992 -- It should handle continuous output and category input. I don't see that error message in our repo or on the internet. Can you include a stack trace? Also, is the data public?

Vu1992 commented 5 months ago

Hi @paulbkoch,

Thank you for your reply. Unfortunately the data is private, but I can show you what I'm trying to do. I have a DataFrame df that holds my data as a table, and I do the following:

A = df[['BRANCH']]
B = df[['Gross_Incurred']]
names = ['BRANCH']

So A and B hold the values shown in the images below:

[image] [image]

Then I use your code for the data explorer:

marginal = Marginal(names).explain_data(A, B, name='Train Data')
show(marginal)

Python then comes back with: TypeError: unsupported operand type(s) for -: 'str' and 'str'

paulbkoch commented 5 months ago

I tried to replicate this with the following code:

import numpy as np
import pandas as pd
from interpret.data import Marginal
from interpret import show
names=['BRANCH']
A = pd.DataFrame()
A["BRANCH"] = pd.Series(np.array(['VC', 'VC', 'MS', 'VH'], dtype=np.str_))
B = pd.DataFrame()
B["Gross_Incurred"] = pd.Series(np.array([18000000.0, 36200000000.0, 0.0, -50000000.0], dtype=float))
marginal = Marginal(names).explain_data(A, B, name='Train Data')
show(marginal)

My example works though. Any idea what could be different?
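
One possible difference, offered only as a guess: the reported message is what Python raises whenever one string is subtracted from another, so a column that is expected to be numeric may be stored as text in the private data. The sketch below uses only the standard library and is not taken from the interpret codebase. `value_range` and `raw_amounts` are hypothetical names made up for illustration. With pandas, the equivalent check would be to look at `df.dtypes` and convert with `pd.to_numeric`.

```python
# Hypothetical helper, not part of interpret: the spread of a column.
def value_range(values):
    return max(values) - min(values)


# Assumed scenario: the amounts were loaded as text instead of numbers.
raw_amounts = ["18000000.0", "0.0", "-50000000.0"]

try:
    value_range(raw_amounts)
    message = None
except TypeError as exc:
    # Subtracting two strings is what triggers the TypeError.
    message = str(exc)

print(message)

# Converting to float first lets the same arithmetic succeed.
clean_amounts = [float(v) for v in raw_amounts]
print(value_range(clean_amounts))
```

If the real data behaves like `raw_amounts`, converting the column to a numeric dtype before calling `explain_data` would be the first thing to try.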

Vu1992 commented 5 months ago

Thank you for your help. I don't know what went wrong last time, but I tried again and now it works. However, the graph does not change when I switch the type to Categorical, even with your replication code. When I add a continuous variable, it shows like this:

[image]

But when I want to see the categorical variable, nothing changes:

[image]