interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License

Categorical features in PartialDependence()? #217

Open mbritton-muso opened 3 years ago

mbritton-muso commented 3 years ago

I am trying to use PartialDependence to analyze a data set that includes categorical features (for example, this breast cancer data). When I call PartialDependence(), even with the feature_types parameter specified, I get an error like ValueError: could not convert string to float: '20-29' (see stack trace below). It looks like PartialDependence accepts my feature_types parameter but does not transform the data at all. Is that right? For reference, I am generating PDPs for a list of PyCaret models (which have their own automatic feature type detection), so I'm trying to avoid re-encoding the categorical columns myself.

/usr/local/lib/python3.7/dist-packages/interpret/blackbox/partialdependence.py in __init__(self, predict_fn, data, sampler, feature_names, feature_types, num_points, std_coef)
     43             data, None, feature_names, feature_types
     44         )
---> 45         self.predict_fn = unify_predict_fn(predict_fn, self.data)
     46         self.num_points = num_points
     47         self.std_coef = std_coef

/usr/local/lib/python3.7/dist-packages/interpret/utils/all.py in unify_predict_fn(predict_fn, X)
    210 
    211 def unify_predict_fn(predict_fn, X):
--> 212     predictions = predict_fn(X[:1])
    213     if predictions.ndim == 2:
    214         new_predict_fn = lambda x: predict_fn(x)[:, 1]  # noqa: E731

/usr/local/lib/python3.7/dist-packages/sklearn/neighbors/_classification.py in predict_proba(self, X)
    215             by lexicographic order.
    216         """
--> 217         X = check_array(X, accept_sparse='csr')
    218 
    219         neigh_dist, neigh_ind = self.kneighbors(X)

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    596                     array = array.astype(dtype, casting="unsafe", copy=False)
    597                 else:
--> 598                     array = np.asarray(array, order=order, dtype=dtype)
    599             except ComplexWarning:
    600                 raise ValueError("Complex data not supported\n"

/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

ValueError: could not convert string to float: 'no'

paulbkoch commented 1 year ago

This might be related to issue https://github.com/interpretml/interpret/issues/322, even though the error reported there is different. The error in that issue suggests using a pandas DataFrame, so that could be one thing to try here. In any case, this is something we need to fix.
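
Until this is fixed, one possible workaround is to give the explainer a purely numeric copy of the data and decode it back to the original strings inside the prediction function, so the model's own preprocessing still sees the raw categories. The sketch below is plain Python and only illustrates the pattern. The helper names (build_codebooks, encode, wrap_predict_fn) and the toy model are hypothetical and are not part of interpret, and the thread does not confirm that PartialDependence accepts data prepared this way.

```python
# Hypothetical sketch: these helpers are illustrative and not part of interpret.

def build_codebooks(rows, categorical_cols):
    """Map each categorical column's string levels to integer codes."""
    codebooks = {}
    for col in categorical_cols:
        levels = sorted({row[col] for row in rows})
        codebooks[col] = {level: code for code, level in enumerate(levels)}
    return codebooks


def encode(rows, codebooks):
    """Return a purely numeric copy of rows (a list of lists of floats)."""
    return [
        [float(codebooks[c][v]) if c in codebooks else float(v)
         for c, v in enumerate(row)]
        for row in rows
    ]


def wrap_predict_fn(predict_fn, codebooks):
    """Wrap predict_fn so it takes encoded rows and decodes them first."""
    inverse = {
        c: {code: level for level, code in book.items()}
        for c, book in codebooks.items()
    }

    def wrapped(encoded_rows):
        decoded = [
            [inverse[c][int(round(v))] if c in inverse else v
             for c, v in enumerate(row)]
            for row in encoded_rows
        ]
        return predict_fn(decoded)

    return wrapped


# Toy stand-in for a model that expects raw strings (assumption for the demo).
def toy_predict(raw_rows):
    return [0.9 if row[1] == "yes" else 0.1 for row in raw_rows]


rows = [
    ["20-29", "no", 3.0],
    ["30-39", "yes", 1.0],
    ["20-29", "yes", 2.0],
]

codebooks = build_codebooks(rows, categorical_cols=[0, 1])
X_numeric = encode(rows, codebooks)                    # safe to cast to float
numeric_predict = wrap_predict_fn(toy_predict, codebooks)

# The wrapped function gives the same predictions as the raw model.
print(numeric_predict(X_numeric) == toy_predict(rows))
```

With a real model, X_numeric and numeric_predict would take the place of the raw data and the model's predict_proba. Any grid values the explainer generates for an encoded column need to be valid integer codes, otherwise the decode step raises a KeyError.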