h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Enable H2O-3 to load data from Snowflake using JDBC #16327

Closed wendycwong closed 2 weeks ago

wendycwong commented 1 month ago

@racoss suggested that H2O-3 should be able to import data from Snowflake via JDBC.

We have a paying customer who is looking to do this.

Here is a reference on this from Rafael:

https://docs.h2o.ai/h2o/latest-stable/h2o-docs/getting-data-into-h2o.html?highlight=jdbc#jdbc-databases

wendycwong commented 1 month ago

The reference above already includes an example of using JDBC to connect to various endpoints.

@racoss suggests that we try it out with Snowflake.

If that works, we will talk to the AI Engine Manager team about starting H2O-3 with an option that points to a JDBC driver.
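As a rough illustration of what "starting H2O-3 with an option that points to a JDBC driver" could amount to, here is a minimal Python sketch that assembles a `java -cp` launch command with the driver jar on the classpath. The helper name `h2o_launch_command` and the driver path are hypothetical; only `water.H2OApp` and the `-cp` approach come from this thread.

```python
import os

def h2o_launch_command(h2o_jar, jdbc_driver_jar):
    """Build a java command that puts both jars on the classpath.

    Hypothetical helper for illustration; not part of H2O-3.
    """
    # os.pathsep is ":" on Linux/macOS and ";" on Windows.
    classpath = os.pathsep.join([h2o_jar, jdbc_driver_jar])
    return ["java", "-cp", classpath, "water.H2OApp"]

# Placeholder driver path; substitute the location of the downloaded jar.
cmd = h2o_launch_command("build/h2o.jar", "/path/to/snowflake-jdbc.jar")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.Popen` if the launch needs to be scripted.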

wendycwong commented 1 month ago

According to @racoss, the Snowflake JDBC driver can be downloaded here: https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-download

Related Snowflake docs:

https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-configure
https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-using#java-sample-program

I was able to download the JDBC driver from there. Thanks, @racoss

wendycwong commented 1 month ago

I started H2O-3 using the following command:

java -cp build/h2o.jar:/Users/wendycwong/Downloads/snowflake-jdbc-3.9.2.jar water.H2OApp

Then I ran the following Python code:

import h2o

# Connect to the H2O-3 instance started above (h2o.connect() defaults to localhost:54321).
h2o.connect()
connection_string = "jdbc:snowflake://h20_ai_partner.snowflakecomputing.com/?warehouse=DEMO_WH&db=GLM_GAUSSIAN&schema=PUBLIC&application=H2OWater&role=AccountAdmin"
name = "michelle"
pc = "BlahBlah"
# import_sql_table arguments: connection_url, table, username, password, columns, optimize
train = h2o.import_sql_table(connection_string, table="GLM_GAUSSIAN_20COLS_10000ROWS", username=name, password=pc)
print(train)

and it works!

@tomasfryda and @krasinski told me my original connection_string was wrong. rasika.govinnage figured out the correct connection string to use. It is:

jdbc:snowflake://h20_ai_partner.snowflakecomputing.com/?warehouse=DEMO_WH&db=GLM_GAUSSIAN&schema=PUBLIC&application=H2OWater&role=AccountAdmin

Thank you guys!
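Since the connection string was the part that tripped things up, here is a small sketch of assembling it from its components rather than typing it by hand. The helper `snowflake_jdbc_url` is hypothetical (it is not an H2O-3 or Snowflake API), and it assumes the `<account>.snowflakecomputing.com` host form used in this thread.

```python
from urllib.parse import urlencode

def snowflake_jdbc_url(account, **params):
    """Assemble a Snowflake JDBC URL in the form used in this thread.

    Hypothetical helper for illustration only.
    """
    # urlencode joins the parameters with "&" and escapes special characters.
    query = urlencode(params)
    return f"jdbc:snowflake://{account}.snowflakecomputing.com/?{query}"

connection_string = snowflake_jdbc_url(
    "h20_ai_partner",
    warehouse="DEMO_WH",
    db="GLM_GAUSSIAN",
    schema="PUBLIC",
    application="H2OWater",
    role="AccountAdmin",
)
print(connection_string)
```

Keyword arguments keep their order, so this reproduces the string above and can be passed to h2o.import_sql_table as the connection URL.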

wendycwong commented 1 month ago

image

wendycwong commented 2 weeks ago

@bilcus has incorporated Joe-g's suggestion, and it seems to work. I was able to go to https://cloud-dev.h2o.ai/aiengines, start an H2O-3 engine, choose importsqlTable, and set the correct fields:

image