Closed by hannah-tillman 4 years ago
@angela0xdata Can you remind me where the text for the stacking tutorial page is pulled from? It looks like the README.md?
@hannah-tillman It looks like the code and some of the text in the README.md file are what need to be changed, not the R script.
If this is going to be referenced widely, then it might make sense to just remove all the old h2oEnsemble stuff from this folder -- we should find a place for it, maybe in the ./h2o-3/h2o-r/ensemble folder in the h2o-3 repo for now? (I should really move that stuff out of there and into its own repo since it's very old at this point -- there's a JIRA for this).
@ledell oh, you're right! The file is here: https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/ensembles-stacking/README.md
It looks like you deleted a lot from the README. Unless I'm wrong, the only real changes that needed to be made are:
- make this work with ensemble <- h2o.stackedEnsemble instead of fit <- h2o.ensemble
- remove the text about this being implemented as a separate package
- change the path for loading the test and train datasets to point to the smalldata folder instead of Erin's folder
See the Stacked Ensemble chapter in the user guide for more info.
What I understood from messaging @ledell was that I was to remove mention of the old package (minus some of the background information), rewrite the text so it no longer covers the wrapper functions and other things specific to the old implementation (which made up quite a bit of the old example), and switch the code example over to the one in the User Guide. Switching the example required a lot of rewriting at the bottom, especially in the explanations. I did try to add back some of the explanations from the previous example. If there's more I should add back in, please let me know!
@hannah-tillman ahh, I guess I either wasn't aware or I have just forgotten about that. I'll let @ledell weigh in 😬
@hannah-tillman Can you update the Stacked Ensemble user guide code to use the code snippet above? There are a few edits... Thank you!
One thing I changed (which you might want to change again) is the path for the two datasets. I noticed you had another location for the data -- the results are for the 10k training set, and the one you had is for the 5k training set. If you can locate a 10k version of this data in that S3 bucket, you can change it back (or add the 10k training set to the S3 bucket if you want).
@ledell oh, I see. The 10k one isn't in smalldata or in bigdata/laptop (that I can find)
Switched the example over to the Stacked Ensemble example from the user guide. Since I'm currently unable to build R documentation locally, I tried to follow the guidelines of the previous example as closely as I could. Please let me know if you want me to add anything back in or change what's in there.