Open exalate-issue-sync[bot] opened 1 year ago
J commented: Time estimate is dependent on the availability of [~accountid:5af04fccfad8eb2e0df738bb], [~accountid:557058:393936ef-8683-427b-babb-14ffad4bb6d7], [~accountid:5be8bb11b8b0222955a7f02b], [~accountid:557058:76e612a8-b669-4d2c-a22a-d4ccc3e1bf2b], and/or [~accountid:5be8bb10f9c8c708ecdb3afd] to assist.
Raymond Peck commented: Should have side-by-side examples for R and Python.
JIRA Issue Migration Info
Jira Issue: PUBDEV-562
Assignee: Joby Joy
Reporter: Raymond Peck
State: In Progress
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
We have the reference doc for the H2O R binding, but new users regularly ask which parts of R are supported, particularly for data munging. A 15-20 page intro doc would be really useful, perhaps as a new booklet in the small yellow book series.
It should give an overview of:
how the big data is kept in the cluster and manipulated from R via references,
how to move data back and forth between the H2O cluster and R,
what operations are implemented in the H2O back end,
example scripts which include simple data munging (frame manipulation via R expressions and ddply), perhaps based on the CityBike example (sans weather join) and Alex's examples.
The primary folks to work with are Sebastian, Matt, Alex, Spencer and Anqi.
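
To make the "manipulated via references" point concrete for the doc, here is a minimal toy sketch of the idea in Python. It does not use the H2O API: `ToyCluster`, `FrameRef`, `filter_rows`, `group_mean`, and `collect` are hypothetical names invented for illustration, and the trip data is made up. It only assumes the general model described above, where the client holds a key, the back end holds the data, each operation yields a new key, and data moves to the client only on an explicit request.

```python
import itertools
from typing import Dict, List

Columns = Dict[str, List[float]]


class ToyCluster:
    """Hypothetical stand-in for the back end: it owns all the data."""

    def __init__(self) -> None:
        self._store: Dict[str, Columns] = {}
        self._ids = itertools.count(1)

    def put(self, columns: Columns) -> "FrameRef":
        # Upload: copy the data into the store and hand back only a reference.
        key = f"frame_{next(self._ids)}"
        self._store[key] = {name: list(vals) for name, vals in columns.items()}
        return FrameRef(self, key)

    def get(self, key: str) -> Columns:
        # Download: copy the data out of the store to the client.
        return {name: list(vals) for name, vals in self._store[key].items()}


class FrameRef:
    """Client-side handle: holds a key, never the data itself."""

    def __init__(self, cluster: ToyCluster, key: str) -> None:
        self.cluster = cluster
        self.key = key

    def filter_rows(self, column: str, threshold: float) -> "FrameRef":
        # Runs "in the cluster": keep rows where column > threshold.
        data = self.cluster._store[self.key]
        keep = [i for i, v in enumerate(data[column]) if v > threshold]
        return self.cluster.put(
            {name: [vals[i] for i in keep] for name, vals in data.items()}
        )

    def group_mean(self, by: str, value: str) -> "FrameRef":
        # ddply-style summary: mean of `value` for each distinct `by`.
        data = self.cluster._store[self.key]
        groups: Dict[float, List[float]] = {}
        for g, v in zip(data[by], data[value]):
            groups.setdefault(g, []).append(v)
        keys = sorted(groups)
        means = [sum(groups[k]) / len(groups[k]) for k in keys]
        return self.cluster.put({by: keys, f"mean_{value}": means})

    def collect(self) -> Columns:
        # Explicitly pull the (hopefully small) result back to the client.
        return self.cluster.get(self.key)


cluster = ToyCluster()
trips = cluster.put(
    {"station": [1, 1, 2, 2, 2],
     "duration": [300.0, 900.0, 120.0, 600.0, 480.0]}
)
long_trips = trips.filter_rows("duration", 200.0)   # new reference, data stays remote
summary = long_trips.group_mean("station", "duration")
local = summary.collect()                           # only the summary is transferred
print(trips.key, long_trips.key, summary.key)
print(local)
```

The point for the booklet is the shape of the workflow rather than these particular methods: every intermediate result is a new reference, and only the final small summary is copied back to the client.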