h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Is it possible to use H2O on databricks without pysparkling? #16401

Open matt7salomon opened 6 days ago

matt7salomon commented 6 days ago

I am trying to use H2O on my Databricks cluster, mainly to use the 64 GB CUDA GPU that I have. I don't want to convert my datasets to Spark datasets, though, as everything else is in pandas. Is it possible to run H2O on Databricks and use the GPU without pysparkling? If so, what IP address do I use in h2o.init(ip = )? I did try this, and the H2O datasets appear to be filled with all nulls.
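For reference, the setup being asked about can be sketched as follows. This is only a minimal sketch, assuming the h2o Python package, pandas, and a Java runtime are available on the node where the notebook runs. The helper names (init_kwargs, pandas_to_h2o, columns_gaining_nulls) are hypothetical and not part of H2O. With no ip given, h2o.init() tries localhost and starts a local instance if none is running. The sketch does not cover GPU use.

```python
def init_kwargs(ip=None, port=54321):
    """Build keyword arguments for h2o.init() (hypothetical helper).

    Leaving ip unset lets h2o.init() fall back to localhost.
    """
    kwargs = {"port": port}
    if ip is not None:
        kwargs["ip"] = ip
    return kwargs


def columns_gaining_nulls(before, after):
    """Return the columns whose null count grew after conversion.

    before and after map column name -> number of missing values.
    """
    return [col for col in before if after.get(col, 0) > before[col]]


def pandas_to_h2o(df, ip=None, port=54321):
    """Convert a pandas DataFrame to an H2OFrame and report suspect columns.

    Assumption: h2o is installed and can start or reach an H2O instance.
    """
    import h2o  # imported lazily so the helpers above work without h2o

    h2o.init(**init_kwargs(ip, port))
    frame = h2o.H2OFrame(df)

    # Compare missing-value counts before and after the conversion.
    before = df.isna().sum().to_dict()
    after = dict(zip(frame.names, frame.nacnt()))
    return frame, columns_gaining_nulls(before, after)
```

Calling pandas_to_h2o(df) on a pandas DataFrame returns the H2OFrame together with the names of any columns that picked up extra nulls, which is one way to narrow down the all-nulls symptom described above.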

krasinski commented 3 days ago

Hello @matt7salomon, we didn't really try this. Would there be any benefit to using H2O on Databricks without Spark? Are you able to use the compute without Databricks? That's usually possible in cloud environments and probably also more cost-effective.

matt7salomon commented 3 days ago

Yes. I found out how to install it on the cluster and it works. I just need to open a terminal, download the h2o.jar, and move it to the directory itself looking in