h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

create "intro to using H2O from R and Python with data munging" doc #13542

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We have the reference doc for the H2O R binding, but we regularly get questions from new users asking about which parts of R are supported, in particular regarding data munging. A 15-20 page intro doc would be really useful. Perhaps this should be a new booklet in the small yellow book series.

It should give an overview of:

  1. how the big data is kept in the cluster and manipulated from R via references,

  2. how to move data back and forth between R and the H2O cluster,

  3. what operations are implemented in the H2O back end,

  4. example scripts which include simple data munging (frame manipulation via R expressions and ddply), perhaps based on the CityBike example (sans weather join) and Alex's examples.
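As a rough illustration of points 1 to 3 on the Python side, here is a minimal sketch. It assumes the `h2o` Python package and a local Java runtime are available. The sample data and the function name `munge_on_cluster` are made up for illustration and are not taken from the CityBike example or from Alex's scripts.

```python
# Hypothetical sample data (not from the CityBike example): trips per station.
SAMPLE = {
    "station": ["A", "A", "B", "B", "C"],
    "trips": [10, 14, 3, 5, 8],
}


def munge_on_cluster(columns):
    """Upload columns to H2O, munge them there, and pull back a small result."""
    import h2o  # imported lazily so this module loads without h2o installed

    h2o.init()  # connect to a local H2O cluster, starting one if needed

    # The data now lives in the cluster; `frame` is only a client-side handle.
    frame = h2o.H2OFrame(columns)

    # Column arithmetic and row filtering are evaluated in the H2O back end.
    frame["trips_x2"] = frame["trips"] * 2
    busy = frame[frame["trips"] > 4, :]

    # Group-by aggregation, roughly analogous to ddply on the R side.
    per_station = busy.group_by("station").mean("trips").get_frame()

    # Only the small aggregated result is copied back into local memory.
    return per_station.as_data_frame(use_pandas=False)
```

Calling `munge_on_cluster(SAMPLE)` against a running cluster should return the aggregated rows as plain Python lists. The intro doc would presumably pair a snippet like this with its R equivalent.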

The primary folks to work with are Sebastian, Matt, Alex, Spencer and Anqi.

exalate-issue-sync[bot] commented 1 year ago

J commented: Time estimate is dependent on the availability of [~accountid:5af04fccfad8eb2e0df738bb], [~accountid:557058:393936ef-8683-427b-babb-14ffad4bb6d7], [~accountid:5be8bb11b8b0222955a7f02b], [~accountid:557058:76e612a8-b669-4d2c-a22a-d4ccc3e1bf2b], and/or [~accountid:5be8bb10f9c8c708ecdb3afd] to assist.

exalate-issue-sync[bot] commented 1 year ago

Raymond Peck commented: Should have side-by-side examples for R and Python.

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-562
Assignee: Joby Joy
Reporter: Raymond Peck
State: In Progress
Fix Version: N/A
Attachments: N/A
Development PRs: N/A