h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Reproduce and benchmark Todd W. Schneider's / Mark Litwintschik's billion taxi rides test #10043

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

This looks like a really good big-data munging/query benchmark, and it's getting a lot of attention on Twitter:

https://cloud.google.com/blog/big-data/2016/05/bigquery-and-dataproc-shine-in-independent-big-data-platform-comparison

https://www.google.ru/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#newwindow=1&q=Mark+Litwintschik+taxi+dataset+h2o

exalate-issue-sync[bot] commented 1 year ago

Matt Dowle commented: Agreed. Related: http://blog.revolutionanalytics.com/2016/06/taxi2.html

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-3120
Assignee: Matt Dowle
Reporter: Raymond Peck
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A