Currently, leaderboard extensions like {{scoring_time_per_row}} rely either on the leaderboard frame, if one is present at the time the model is added to the leaderboard, or are computed on demand using the training frame if no leaderboard frame was provided.
This can cause errors if the training frame is no longer available when the user requests the extended leaderboard.
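For context, a minimal self-contained Java sketch of the current fallback behaviour described above; {{Frame}}, {{Model}} and the timing helper are illustrative placeholders, not the actual Leaderboard implementation:

{code:java}
// Illustrative sketch only: Frame, Model and the helpers are stand-ins, not H2O classes.
class CurrentScoringTimeSketch {
  interface Frame {}
  interface Model { Frame trainingFrame(); }  // may return null if the user removed the frame

  static double scoringTimePerRow(Model model, Frame lbFrame) {
    // Current logic: prefer the leaderboard frame captured when the model was added,
    // otherwise fall back to the training frame, computed on demand.
    Frame target = (lbFrame != null) ? lbFrame : model.trainingFrame();
    if (target == null)
      // Failure mode motivating this ticket: the extended leaderboard is requested
      // after the training frame was removed, and no leaderboard frame was ever provided.
      throw new IllegalStateException("training frame no longer available");
    return scoreAndMeasurePerRowMs(model, target);
  }

  static double scoreAndMeasurePerRowMs(Model model, Frame frame) {
    long start = System.nanoTime();
    // a real implementation would run model predictions on `frame` here
    return (System.nanoTime() - start) / 1e6;
  }
}
{code}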
h3. Proposal
* If the user provided a leaderboard frame -> take the scoring times from there (current logic).
* If the user didn't provide any LB frame but we have CV models -> take the scoring time from the {{scoreMetrics}} computed during CV.
** We can expose {{_scoring_duration_per_row}} on the {{ModelMetric}} instance, and average the CV scoring durations when the model is added to the LB.
** Also update the documentation to explicitly mention that in this case the scoring duration is not measured on the final model.
* If there is no LB frame and no CV (edge case in AutoML), allow the user to pass an extra frame to {{getLeaderboard}}: this frame could be used not only to compute the scoring duration, but, for consistency, also to provide metrics for that frame (see the sketch of the full resolution order below).
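A rough, self-contained Java sketch of the proposed resolution order. All names here ({{cvMetrics}}, {{scoringDurationPerRowMs}}, {{scoreAndMeasurePerRowMs}}, the {{extraFrame}} parameter) are assumptions made for illustration, not the actual H2O API; the sketch only shows the fallback chain described in the list above.

{code:java}
import java.util.List;

class ProposedScoringTimeSketch {
  interface Frame {}
  interface ModelMetrics { double scoringDurationPerRowMs(); } // proposed _scoring_duration_per_row
  interface Model {
    List<ModelMetrics> cvMetrics();            // per-fold metrics from cross-validation
    double scoreAndMeasurePerRowMs(Frame f);   // hypothetical: score the frame and time it
  }

  /** Proposed lookup order for scoring_time_per_row. */
  static double scoringTimePerRow(Model model, Frame lbFrame, Frame extraFrame) {
    if (lbFrame != null)                       // 1. leaderboard frame provided (current logic)
      return model.scoreAndMeasurePerRowMs(lbFrame);
    if (!model.cvMetrics().isEmpty())          // 2. no LB frame, but CV models exist:
      return model.cvMetrics().stream()        //    average the CV scoring durations;
                  .mapToDouble(ModelMetrics::scoringDurationPerRowMs)
                  .average().getAsDouble();    //    note: not measured on the final model
    if (extraFrame != null)                    // 3. no LB, no CV: frame passed to getLeaderboard
      return model.scoreAndMeasurePerRowMs(extraFrame);
    throw new IllegalStateException("no frame or CV metrics available for scoring_time_per_row");
  }
}
{code}

One consequence of step 2, already noted above, is that the reported duration comes from the CV models rather than the final model, which is why the documentation update is part of this proposal.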