h2oai / h2o-tutorials

Tutorials and training material for the H2O Machine Learning Platform
http://h2o.ai

PUBDEV-4343: Documentation: Update the stacking online tutorial #125

Closed hannah-tillman closed 4 years ago

hannah-tillman commented 4 years ago

Switched the example over to the Stacked Ensemble example from the user guide. Since I'm currently unable to build R documentation locally, I tried to follow the guidelines of the previous example as closely as I could. Please let me know if you want me to add anything back in or change what's in there.

ledell commented 4 years ago

@angela0xdata Can you remind me where the text for the stacking tutorial page is pulled from? It looks like the README.md?
@hannah-tillman It looks like the code and some text in the README.md file are what need to be changed, not the R script.

If this is going to be referenced widely, then it might make sense to just remove all the old h2oEnsemble stuff from this folder -- we should find a place for it, maybe in the ./h2o-3/h2o-r/ensemble folder in the h2o-3 repo for now? (I should really move that stuff out of there and into its own repo since it's very old at this point -- there's a JIRA for this).

ABartzGit commented 4 years ago

@ledell oh, you're right! The file is here: https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/ensembles-stacking/README.md

hannah-tillman commented 4 years ago

It looks like you deleted a lot from the readme. I think (unless I'm wrong), the only real changes that needed to be done are:

  • make this work with ensemble <- h2o.stackedEnsemble instead of fit <- h2o.ensemble
  • remove the text about this being implemented as a separate package
  • change the path for loading the test and train datasets to point to the smalldata folder instead of Erin's folder

See the Stacked Ensemble chapter in the user guide for more info.
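
The three changes listed above can be sketched in R roughly as follows. This is a minimal sketch of the general pattern, not the final tutorial code: the dataset URLs, the `response` column name, and the base-model settings (tree counts, fold count, seed) are assumptions chosen for illustration and should be checked against the user guide.

```r
library(h2o)
h2o.init()

# Assumed dataset locations in the public smalldata bucket (5k training set).
train <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_test_5k.csv")

# Assumed response column name; every other column is used as a predictor.
y <- "response"
x <- setdiff(names(train), y)

# Binary classification requires a factor response.
train[, y] <- as.factor(train[, y])
test[, y] <- as.factor(test[, y])

# Base models must share the same folds and keep their cross-validated
# predictions so the metalearner can be trained on them.
nfolds <- 5

my_gbm <- h2o.gbm(x = x, y = y, training_frame = train,
                  distribution = "bernoulli", ntrees = 10,
                  nfolds = nfolds, fold_assignment = "Modulo",
                  keep_cross_validation_predictions = TRUE, seed = 1)

my_rf <- h2o.randomForest(x = x, y = y, training_frame = train,
                          ntrees = 50,
                          nfolds = nfolds, fold_assignment = "Modulo",
                          keep_cross_validation_predictions = TRUE, seed = 1)

# The native function replaces the old separate-package call (fit <- h2o.ensemble).
ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train,
                                base_models = list(my_gbm, my_rf))

# Evaluate the ensemble on the held-out test set.
perf <- h2o.performance(ensemble, newdata = test)
print(h2o.auc(perf))
```

Any results quoted in the tutorial text would need to be regenerated for whichever training set size the paths end up pointing to.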

What I understood from messaging @ledell was that I was to remove mention of the old package (minus some of the background information), rewrite the text so it no longer covers the wrapper functions and other details specific to the old implementation (which made up quite a bit of the old example), and switch the code example over to the one in the User Guide. Switching the example required a lot of rewriting at the bottom (especially of the explanations). I did try to add back in some of the explanations from the previous example. If there's more I should add back in, please let me know!

ABartzGit commented 4 years ago

@hannah-tillman ahh, I guess I either wasn't aware or I have just forgotten about that. I'll let @ledell weigh in 😬

ledell commented 4 years ago

@hannah-tillman Can you update the Stacked Ensemble user guide code to use the code snippet above? There are a few edits... Thank you!

ledell commented 4 years ago

One thing I changed (which you might want to change again) is the path for the two datasets. I noticed you had another location for the data -- The results are for the 10k training set and the one you had is for the 5k training set. If you can locate a 10k version of this data in that S3 bucket you can change it back (or add the 10k training set to the S3 bucket if you want).

ABartzGit commented 4 years ago

> I noticed you had another location for the data -- The results are for the 10k training set and the one you had is for the 5k training set.

@ledell oh, I see. The 10k one isn't in smalldata or in bigdata/laptop (that I can find)