h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Create a CBIND function that allows users to combine arbitrary columns from two frames into a new frame. #15265

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

I was trying to reproduce the steps in h2o-3/h2o-r/tests/testdir_algos/deeplearning/runit_deeplearning_autoencoder_large.R.

I downloaded the data from bigdata/laptop/mnist/train.csv.gz and split the frame into two parts. Each part has 785 columns; the last column is the response column. Using the first part, I trained a deep learning autoencoder that reduces the other 784 columns to 20 deep-feature columns.

Next, I would like to use the model to generate the 20 deep-feature columns for the second part of the frame, then combine them with that part's response column into a new frame on which I can run various classification algorithms.

However, I could not find a way to combine columns from the two frames. What I need is the equivalent of the CBIND function in other languages. Prithvi has agreed to implement this new feature for us.
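To make the request concrete, here is a minimal sketch in plain Python of the column-binding behaviour described above. It is not H2O code and not how H2O implements anything: the function name `cbind_columns`, the dict-of-lists frame representation, and the toy column names and values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a frame: column name -> list of values.
Frame = Dict[str, List[object]]


def cbind_columns(left: Frame, left_cols: Sequence[str],
                  right: Frame, right_cols: Sequence[str]) -> Frame:
    """Combine the selected columns of two frames into a new frame."""
    result: Frame = {}
    for frame, cols in ((left, left_cols), (right, right_cols)):
        for name in cols:
            if name in result:
                raise ValueError(f"duplicate column name: {name}")
            result[name] = list(frame[name])  # copy so the inputs stay untouched
    # Column binding only makes sense when every column has the same row count.
    if len({len(col) for col in result.values()}) > 1:
        raise ValueError("all columns must have the same number of rows")
    return result


# Toy data (assumed): two deep-feature columns and the original frame
# holding one predictor column and the response column.
deep_features = {"feat1": [0.1, 0.4, 0.9], "feat2": [0.7, 0.2, 0.3]}
original = {"pixel0": [0, 0, 0], "label": [5, 0, 4]}

combined = cbind_columns(deep_features, ["feat1", "feat2"], original, ["label"])
print(list(combined))
```

In the use case above, the left frame would hold the 20 deep-feature columns and the right frame would contribute only the response column.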

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: We already have h2o.cbind (in R), so hopefully it's as easy as exposing that functionality in Flow.

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-2358
Assignee: Prithvi Prabhu
Reporter: Wendy
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A