h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

CSV Parser POC #12327

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

A new CSV parser POC is ready in a separate branch: https://github.com/h2oai/h2o-3/tree/pavel_csv_poc

This parser was inspired by https://github.com/osiegmar/FastCSV.

Parts of the original code are used, though in modified form. FastCSV is not used as a dependency. We must credit FastCSV as the inspiration, because the original code is licensed under the Apache License 2.0.

New parser memory consumption: !Snímek z 2018-04-08 19-35-57.png|thumbnail!

Original parser memory consumption: !Snímek z 2018-04-08 19-34-29.png|thumbnail!

Pros:

Can parse higgs_10, poets, TED, toxic_comments and other standard Kaggle datasets that our current parser cannot.

Cons:

Still a very early alpha version.

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-5461
Assignee: New H2O Bugs
Reporter: Pavel Pscheidl
State: Open
Fix Version: N/A
Attachments: Available (Count: 2)
Development PRs: N/A

Attachments From Jira

Attachment Name: Snímek z 2018-04-08 19-34-29.png Attached By: Pavel Pscheidl File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5461/Snímek z 2018-04-08 19-34-29.png

Attachment Name: Snímek z 2018-04-08 19-35-57.png Attached By: Pavel Pscheidl File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5461/Snímek z 2018-04-08 19-35-57.png
