pbiecek / breakDown

Model Agnostics breakDown plots
https://pbiecek.github.io/breakDown/

problem with running the break plots on sparse data #31

Open · dokato opened this issue 4 years ago

dokato commented 4 years ago

Hi, I recently tried to run breakDown plots on a random forest model (ranger, fitted through the caret package) trained on sparse data (a TF-IDF matrix). The model does not perform particularly well, but still...

When using the DALEX package, I created the explainer without any problems, but this call:

variable_attribution(rf_explainer,
                     new_observation = x_df_test[ind_to_check, ],
                     type = "break_down")

returned the following error:

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : 
  undefined columns selected

Then I switched to the breakDown package. First, I called it like this:

broken(rf_mod, x_df_test[ind_to_check, ])

It fails with:

Error in "data.frame" %in% class(data) : 
  argument "data" is missing, with no default

Thus, I changed my call to:

broken(rf_mod, x_df_test[ind_to_check, ], data = x_df_test)

and this time got:

Error in yhats[[which.max(yhats_diff)]] : 
  attempt to select less than one element in get1index

The whole code is here: https://github.com/CaRdiffR/tidy_thursdays/blob/master/april_30_2020/predict_gross_clf.R

Strangely, exactly the same pipeline works fine for a regression problem.
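One possible explanation, which is a hypothesis and not confirmed in this thread: for a caret classification model, predict() returns a factor of class labels by default, whereas a regression model returns numbers. The sketch below does not call breakDown; it is a simplified imitation of a greedy step, assuming the package averages the predictions for each candidate variable and picks the largest change with which.max(). The helper pick_most_important() and the toy prediction lists are hypothetical. It shows how factor predictions could produce exactly this error: mean() of a factor returns NA, which.max() over an all-NA vector returns integer(0), and indexing a list with [[integer(0)]] fails.

```r
# Hypothetical helper, NOT breakDown's source: imitates one greedy step that
# averages predictions per candidate variable and keeps the largest change.
pick_most_important <- function(yhats, baseline) {
  # Absolute change in mean prediction for each candidate variable
  yhats_diff <- sapply(yhats, function(yhat) abs(mean(yhat) - baseline))
  # which.max() skips NAs; if every entry is NA it returns integer(0)
  yhats[[which.max(yhats_diff)]]
}

# Regression-style predictions are numeric, so the step succeeds
numeric_yhats <- list(x1 = c(1.0, 2.0), x2 = c(5.0, 6.0))
picked <- pick_most_important(numeric_yhats, baseline = 0)
print(picked)

# Classification-style predictions (caret's default predict() output) are factors
factor_yhats <- list(
  x1 = factor(c("yes", "no")),
  x2 = factor(c("no", "no"))
)

# mean() of a factor warns and returns NA, so which.max() sees only NAs
# and the list index [[integer(0)]] raises an error
res <- tryCatch(
  suppressWarnings(pick_most_important(factor_yhats, baseline = 0)),
  error = function(e) conditionMessage(e)
)
print(res)
```

If this is the cause, the second call should report an error about selecting less than one element, while the numeric case picks x2. Asking the model for numeric outputs (for example class probabilities) would avoid the NA averages, although that route may run into the separate caret error seen with DALEX above.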

I am using R 4.0.0 and the latest versions of the DALEX and breakDown packages.

This might be related to #29.