sb-ai-lab / LightAutoML

Fast and customizable framework for automatic ML model creation (AutoML)
https://developers.sber.ru/portal/products/lightautoml
Apache License 2.0

Permutation importance calculations of multilevel models #138

Open BELONOVSKII opened 8 months ago

🐛 Bug

Problem

The functions calc_one_feat_imp and calc_feats_permutation_imps in lightautoml/automl/presets/utils.py fail with a KeyError on multilevel models.

To Reproduce

Fit a TabularAutoML with a multiclass Task, then call get_feature_scores('accurate', df).

Traceback

KeyError                                  Traceback (most recent call last)
Cell In[63], line 1
----> 1 accurate_fi = automl.get_feature_scores('accurate', test_data, silent=True)
      2 accurate_fi.set_index('Feature')['Importance'].plot.bar(figsize = (30, 10), grid = True)

File ~/LightAutoML/lightautoml/automl/presets/tabular_presets.py:837, in TabularAutoML.get_feature_scores(self, calc_method, data, features_names, silent)
    835 data, _ = read_data(data, features_names, self.cpu_limit, read_csv_params)
    836 used_feats = self.collect_used_feats()
--> 837 fi = calc_feats_permutation_imps(
    838     self,
    839     used_feats,
    840     data,
    841     self.reader.target,
    842     self.task.get_dataset_metric(),
    843     silent=silent,
    844 )
    845 return fi

File ~/LightAutoML/lightautoml/automl/presets/utils.py:38, in calc_feats_permutation_imps(model, used_feats, data, target, metric, silent)
     35 feat_imp = []
     36 for it, f in enumerate(used_feats):
     37     feat_imp.append(
---> 38         calc_one_feat_imp(
     39             (it + 1, n_used_feats),
     40             f,
     41             model,
     42             data,
     43             norm_score,
     44             target,
     45             metric,
     46             silent,
     47         )
     48     )
     49 feat_imp = pd.DataFrame(feat_imp, columns=["Feature", "Importance"])
     50 feat_imp = feat_imp.sort_values("Importance", ascending=False).reset_index(drop=True)

File ~/LightAutoML/lightautoml/automl/presets/utils.py:14, in calc_one_feat_imp(iters, feat, model, data, norm_score, target, metric, silent)
     13 def calc_one_feat_imp(iters, feat, model, data, norm_score, target, metric, silent):
---> 14     initial_col = data[feat].copy()
     15     data[feat] = np.random.permutation(data[feat].values)
     17     preds = model.predict(data)

File ~/LAMA_venv3_8/lib/python3.8/site-packages/pandas/core/frame.py:3807, in DataFrame.__getitem__(self, key)
   3805 if self.columns.nlevels > 1:
   3806     return self._getitem_multilevel(key)
-> 3807 indexer = self.columns.get_loc(key)
   3808 if is_integer(indexer):
   3809     indexer = [indexer]

File ~/LAMA_venv3_8/lib/python3.8/site-packages/pandas/core/indexes/base.py:3804, in Index.get_loc(self, key, method, tolerance)
   3802     return self._engine.get_loc(casted_key)
   3803 except KeyError as err:
-> 3804     raise KeyError(key) from err
   3805 except TypeError:
   3806     # If we have a listlike key, _check_indexing_error will raise
   3807     # InvalidIndexError. Otherwise we fall through and re-raise
   3808     # the TypeError.
   3809     self._check_indexing_error(key)

KeyError: 'Lvl_0_Pipe_0_Mod_0_LinearL2_prediction_0'
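
The missing key looks like the name of a first-level model's prediction column rather than a column of the input data. That suggests (this is an inference from the traceback, not a confirmed diagnosis) that for a multilevel model the used-features list contains intermediate features that cannot be looked up in `data`. Below is a minimal, dependency-free sketch of a permutation-importance loop that skips such names instead of raising. It is not LightAutoML code: `permutation_importances`, `predict`, `accuracy`, and the toy data are hypothetical, and a plain dict of columns stands in for the DataFrame.

```python
import random


def permutation_importances(predict, score, data, target, used_feats, seed=0):
    """Permutation importance over the features that actually exist in `data`.

    data: dict mapping column name -> list of values (stand-in for a DataFrame).
    Returns (importances sorted in descending order, names that were skipped).
    """
    rng = random.Random(seed)
    base_score = score(target, predict(data))
    importances, skipped = [], []
    for feat in used_feats:
        if feat not in data:
            # Assumed case: a prediction column produced by a lower-level model.
            skipped.append(feat)
            continue
        original = data[feat]
        shuffled = original[:]
        rng.shuffle(shuffled)
        data[feat] = shuffled
        try:
            # Importance = drop in score after shuffling this one column.
            importances.append((feat, base_score - score(target, predict(data))))
        finally:
            data[feat] = original  # always restore the column
    importances.sort(key=lambda item: item[1], reverse=True)
    return importances, skipped


# Hypothetical toy model: depends on x1 only.
def predict(data):
    return [1 if x > 0 else 0 for x in data["x1"]]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


data = {"x1": [-3, -2, -1, 1, 2, 3, -4, 4], "x2": [0, 1, 0, 1, 0, 1, 0, 1]}
target = [0, 0, 0, 1, 1, 1, 0, 1]
used_feats = ["x1", "x2", "Lvl_0_Pipe_0_Mod_0_LinearL2_prediction_0"]

importances, skipped = permutation_importances(
    predict, accuracy, data, target, used_feats
)
print(importances)
print("skipped:", skipped)
```

Skipping only hides the error. A complete fix would probably need to map the intermediate names back to the raw input features they were derived from, which this sketch does not attempt.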