Open exalate-issue-sync[bot] opened 1 year ago
Sebastien Poirier commented: Now that https://h2oai.atlassian.net/browse/PUBDEV-7481 is resolved, we should be able to enable multiple alphas in the default metalearner by un-commenting this:
{code:java} //parms._alpha = new double[] {0.0, 0.25, 0.5, 0.75, 1.0};{code}
I have some concerns regarding the training duration for the SEs with this, though: we already know that SEs take a long time on some datasets, and adding more alphas will slow them down further, so we need to decide whether/how to include SEs in the global runtime constraint.
Erin LeDell commented: [~accountid:5b153fb1b0d76456f36daced] Though it would produce better models, I don’t know that we need to do an alpha search by default (but we should try to find a better single alpha to use instead of 0.5, which I don’t think is strong enough). Once we enable the different presets/modes in AutoML, an alpha search could be used in the ‘compete’ mode. Or, if it’s really working well and doesn’t take too much time, we can make alpha search a default in regular mode too.
Regarding the SE runtime topic: one option is to find a way to estimate SE time (so we can include it in the global runtime constraint), either a single estimate used across all datasets (e.g. 7% of global runtime reserved for SEs), or a formula that is dataset/resource dependent.
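The single-estimate option above could be sketched as follows. This is only an illustration of the idea: the class/method names and the choice to split the reserved time evenly across ensembles are assumptions, not H2O code, and the 7% figure is just the example fraction from the comment.

```java
// Sketch: reserve a fixed fraction of the global AutoML runtime for Stacked Ensembles.
// All names and the even per-SE split are hypothetical, for illustration only.
public class SeBudget {
    static final double SE_FRACTION = 0.07; // e.g. 7% of global runtime reserved for SEs

    /** Seconds reserved for all SEs, given the global runtime budget in seconds. */
    static double seBudgetSecs(double globalRuntimeSecs) {
        return SE_FRACTION * globalRuntimeSecs;
    }

    /** Per-SE budget if the reserved time is split evenly across several SEs. */
    static double perSeBudgetSecs(double globalRuntimeSecs, int nEnsembles) {
        if (nEnsembles <= 0) return 0.0;
        return seBudgetSecs(globalRuntimeSecs) / nEnsembles;
    }
}
```

A dataset/resource-dependent formula would replace the constant `SE_FRACTION` with something computed from, e.g., row count and the number of base models.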
JIRA Issue Details
Jira Issue: PUBDEV-7991
Assignee: UNASSIGNED
Reporter: Erin LeDell
State: Open
Fix Version: Backlog
Attachments: N/A
Development PRs: N/A
We will need to test/benchmark this; however, I think we would benefit from adding more L1 regularization to the default Stacked Ensemble GLM metalearner, because the “bad” base models are not getting zeroed out enough in some cases.
Right now we use the default in our GLM, which is alpha = 0.5. I think we should try alpha = 1 (full Lasso) and also a few more values closer to 1. This also produces a more efficient ensemble (with fewer active base models). We can also consider making alpha dynamic, based on the number of base learners (more learners → higher alpha).
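One hypothetical shape for the “more learners → higher alpha” idea: interpolate from the current default (0.5) toward full Lasso (1.0) as the base-learner count grows. The bounds and the saturation point are made-up illustration values, not a proposal from this thread.

```java
// Sketch of a dynamic alpha heuristic. The 0.5..1.0 range and the
// saturation point of 50 base learners are illustrative assumptions.
public class DynamicAlpha {
    /** Map the number of base learners to an alpha in [0.5, 1.0]:
     *  few learners keep the current default 0.5; many push toward
     *  full Lasso (alpha = 1) so weak base models get zeroed out. */
    static double alphaFor(int nBaseLearners) {
        final double lo = 0.5, hi = 1.0;
        final int saturateAt = 50; // at or beyond this, use full Lasso
        double t = Math.min(1.0, nBaseLearners / (double) saturateAt);
        return lo + t * (hi - lo);
    }
}
```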
Alternatively, we can make the default metalearner do a grid search over alpha (or we can just do that grid search only in AutoML Stacked Ensembles…)
alpha:
Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. The default value of alpha is 0 when SOLVER = 'L-BFGS', and 0.5 otherwise.