Open curlette opened 7 years ago
@curlette Even though you specified `k=7` in the MML code defining the number of categories for the random forest, the "training data" in `scorecard_training_data` only contains values `[0,1,2,3,4]`, which means that BayesDB does not have a mapping for the categories `[5, 6, 7]` and thus raises a lookup error when the random forest predicts one of those classes.
It follows that this bug is an instance of #437: what the system really needs is, e.g., a user-provided codebook for all the possible nominal values that the model may ever encounter (i.e. for newly incorporated rows).
Perhaps this should return NaN instead, as in the other cases?
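To make the failure mode concrete, here is a rough sketch in plain Python. It is not BayesDB or cgpm code: `build_inverse_codebook` and `decode` are hypothetical helpers made up for illustration, and the class `5` is just one example of a category missing from the training data. It contrasts a codebook inferred from the observed values with one that covers every declared category, and shows what a NaN fallback could look like.

```python
import math


def build_inverse_codebook(values):
    """Map integer codes back to category values (hypothetical helper)."""
    return dict(enumerate(sorted(set(values))))


def decode(code, inverse_codebook, strict=True):
    """Translate a predicted code into a category value.

    strict=True mimics the lookup error described above;
    strict=False returns NaN for unknown codes instead.
    """
    if code in inverse_codebook:
        return inverse_codebook[code]
    if strict:
        raise KeyError("no category mapping for code %r" % (code,))
    return float("nan")


# Codebook inferred from the training data alone: only five categories seen.
inferred = build_inverse_codebook([0, 1, 2, 3, 4, 2, 0])

# Codebook supplied up front, covering all k=7 declared categories
# (assumed here to be numbered 0..6).
declared = build_inverse_codebook(range(7))

predicted = 5  # a class the forest may emit but the training data never showed

try:
    decode(predicted, inferred)
    raised = False
except KeyError:
    raised = True

lenient = decode(predicted, inferred, strict=False)
print(raised, math.isnan(lenient), decode(predicted, declared))
```

With the inferred codebook the strict lookup fails and the lenient one yields NaN, while the declared codebook resolves the same prediction without special handling.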
A notebook with a minimal working example that reproduces this bug can be found here: http://probcomp-3.csail.mit.edu:9999/notebooks/cgpm_invalid_category_bug_reproduced.ipynb