h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

String munging: matches #9494

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Restore the functionality of matches for the R client. In addition, extend the behavior so that it also works on String columns.

The Java implementation is: class ASTMatch in h2o-core/src/main/java/water/rapids/ASTUniOp.java

The R client code is at: h2o.match <- function(x, table, nomatch = 0, incomparables = NULL) in h2o-r/h2o-package/R/frame.R

The original R unit test is h2o-r/tests/testdir_munging/slice/runit_NOPASS_match.R

The Python client implementation is def match(self, table, nomatch=0) in h2o-py/h2o/frame.py

I believe a Python test is still needed. For other string-related methods in Java, see h2o-core/src/main/java/water/ASTStrOp.java, and see h2o-core/src/main/java/water/fvec/CStrChunk.java for accelerated versions of those methods when the string column is pure ASCII.
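
As a starting point for the missing Python test, here is a minimal pure-Python sketch of the semantics I would expect from match, modeled on base R's match(x, table) with nomatch = 0 as in the client signatures above. The name match_reference is hypothetical and this is not the ASTMatch implementation; whether the H2O backend returns 1-based first-match positions in exactly this way is an assumption that should be checked against the Java code before using this as a test oracle.

```python
def match_reference(values, table, nomatch=0):
    """Hypothetical reference for match semantics (assumption: mirrors base R).

    For each element of `values`, return the 1-based position of its first
    occurrence in `table`, or `nomatch` if it does not occur at all.
    """
    first_pos = {}
    for position, item in enumerate(table, start=1):
        # setdefault keeps the earliest position when `table` has duplicates
        first_pos.setdefault(item, position)
    return [first_pos.get(value, nomatch) for value in values]


if __name__ == "__main__":
    # Works the same way for string and numeric values
    print(match_reference(["a", "b", "z", "a"], ["b", "a", "b"]))
    print(match_reference([3, 7], [7, 3], nomatch=-1))
```

A test could build a small frame with a String column, call the client's match on it, and compare the result column against this reference. The frame setup is left out here because it depends on how the restored backend behaves.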

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-2553
Assignee: New H2O Bugs
Reporter: Brandon Hill
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A