h2oai / mli-resources

H2O.ai Machine Learning Interpretability Resources

How does H2O GBM deal with missing values? #7

Closed DSXiangLi closed 6 years ago

DSXiangLi commented 6 years ago

Thanks, everyone, for this amazing resource on model interpretation. I have a question about LOCO, regarding how H2O GBM deals with missing values.

The markdown says that H2O GBM deals with missing values by following the majority path in the tree. However, my reading of the H2O documentation is that H2O GBM treats a missing value as a new category and chooses its direction by minimizing the loss function, and that at test time a missing value follows the path that was optimized during training.
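To make the training-time behavior concrete, here is a minimal sketch of choosing a direction for missing values at a single split. It is not H2O's code: the function names `sse` and `best_na_direction`, the squared-error loss, and the fixed threshold are all assumptions made for illustration.

```python
import math


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_na_direction(xs, ys, threshold):
    """Hypothetical helper: at a fixed threshold, try sending the rows with
    a missing x to the left child and to the right child, and keep the
    direction that gives the lower total squared error."""
    left = [y for x, y in zip(xs, ys) if not math.isnan(x) and x < threshold]
    right = [y for x, y in zip(xs, ys) if not math.isnan(x) and x >= threshold]
    missing = [y for x, y in zip(xs, ys) if math.isnan(x)]

    loss_if_left = sse(left + missing) + sse(right)
    loss_if_right = sse(left) + sse(right + missing)
    return "left" if loss_if_left <= loss_if_right else "right"


# Toy data: the rows with a missing x have targets close to the right-hand group.
xs = [1.0, 2.0, 8.0, 9.0, float("nan"), float("nan")]
ys = [10.0, 11.0, 50.0, 52.0, 49.0, 51.0]
na_direction = best_na_direction(xs, ys, threshold=5.0)
```

In this toy data the missing rows look like the right-hand group, so the sketch sends them right. A real implementation would search over thresholds and features as well, which this sketch leaves out.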

Am I misunderstanding something here?

Thanks, Sandy

jphall663 commented 6 years ago

Hi - good question.

The documentation is referring to training. The text in the notebook is referring to scoring. During scoring, values that were unseen during training are sent down the majority path of each tree. Since missing values were imputed before training the models used in LOCO, the trees never saw a missing value, so missing values are "values that were unseen during training" and are sent down the majority path.
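A rough sketch of what "sent down the majority path" means at scoring time, under the assumption that each node records how many training rows reached it. The `Node` class, the `n_train` field, and the `score` function are hypothetical names for illustration, not part of the H2O API.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Leaf nodes carry a prediction; internal nodes carry a split.
    prediction: Optional[float] = None
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    n_train: int = 0  # assumed: number of training rows that reached this node


def score(node, row):
    """Walk one tree. A missing value goes to the child that received
    more training rows (the majority path)."""
    while node.prediction is None:
        value = row.get(node.feature)
        if value is None or math.isnan(value):
            # Unseen in training: follow the majority path.
            node = node.left if node.left.n_train >= node.right.n_train else node.right
        elif value < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Toy tree: 30 training rows went left, 70 went right.
tree = Node(
    feature="x",
    threshold=5.0,
    left=Node(prediction=1.0, n_train=30),
    right=Node(prediction=2.0, n_train=70),
    n_train=100,
)

seen_low = score(tree, {"x": 3.0})            # ordinary value, goes left
missing = score(tree, {"x": float("nan")})    # missing value, follows the majority (right)
```

Here the missing value ends up in the right leaf only because more training rows went that way, not because the split learned anything about missingness.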

@DSXiangLi Does that answer your question?

I have been meaning to add more comments to these notebooks, but in the meantime, you may enjoy reading these slightly newer versions of the notebooks with more comments: https://github.com/jphall663/interpretable_machine_learning_with_python

DSXiangLi commented 6 years ago

Thank you! That makes perfect sense!