h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0
6.91k stars 2k forks

Setup GCP to perform H2O-3 experiments #15678

Open wendycwong opened 1 year ago

wendycwong commented 1 year ago

We need to know how to run H2O-3 on GCP. This issue involves the following steps:

  1. Set up an environment on GCP that will allow us to run H2O-3;
  2. Upload the following datasets into GCP data storage:
     a. https://s3.amazonaws.com/h2o-public-test-data/HAICTest/GLM_10EnumC_10NumC_multinomial_6Classes_32GB.csv
     b. https://s3.amazonaws.com/h2o-public-test-data/HAICTest/GLM_10EnumC_10NumC_multinomial_6Classes_65GB.csv
     c. https://s3.amazonaws.com/h2o-public-test-data/HAICTest/GLM_10EnumC_10NumC_multinomial_6Classes_98GB.csv
     d. https://s3.amazonaws.com/h2o-public-test-data/HAICTest/GLM_10EnumC_10NumC_multinomial_6Classes_130GB.csv

The experiments that need to be performed are captured in this issue: https://github.com/h2oai/private-h2o-3/issues/2

In particular, we want to collect memory and CPU metrics so that we can guide future users on how to correctly size their clusters in terms of memory and CPU.
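Until measured numbers exist, a rough starting point for the sizing question can be sketched with the commonly cited H2O rule of thumb of giving the cluster about four times the dataset size in memory. Everything here is an assumption to be replaced by the experiment results: the factor of 4, the 64 GB per-node memory, the hypothetical helper `estimate_cluster`, and the reading of the file-name sizes as on-disk sizes in GB.

```python
import math

# Dataset sizes in GB, taken from the file names listed above (assumed on-disk sizes).
DATASET_SIZES_GB = [32, 65, 98, 130]


def estimate_cluster(data_gb, memory_factor=4.0, node_memory_gb=64):
    """Estimate total cluster memory and node count for a dataset.

    memory_factor and node_memory_gb are illustrative assumptions,
    not measured values.
    """
    total_memory_gb = data_gb * memory_factor
    # Round up so the cluster never has less memory than the estimate.
    nodes = math.ceil(total_memory_gb / node_memory_gb)
    return total_memory_gb, nodes


if __name__ == "__main__":
    for size in DATASET_SIZES_GB:
        total, nodes = estimate_cluster(size)
        print(f"{size} GB data: ~{total:.0f} GB cluster memory, {nodes} node(s) of 64 GB")
```

The experiments should show how far actual memory and CPU use for the GLM runs deviates from this estimate.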

wendycwong commented 1 year ago

Just got this info from @hasithjp; I am copying it here:

These are some options to copy datasets from S3:

  1. https://cloud.google.com/blog/topics/developers-practitioners/transfer-data-aws-gcp-using-storage-transfer-service
  2. https://stackoverflow.com/questions/21437769/copying-directly-from-s3-to-google-cloud-storage
  3. Or you can directly use AWS CLI commands on your laptop to pull data from S3.
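Besides an S3 bucket source, Storage Transfer Service also accepts a URL list: a TSV file whose first line is `TsvHttpData-1.0`, followed by one URL per line. Below is a minimal sketch of generating such a list for the four datasets. It assumes the S3 objects remain publicly readable over HTTPS. The helper name `build_url_list` is hypothetical, the optional size and MD5 columns are left out, and the URLs are sorted because the service expects them in lexicographic order as far as I know.

```python
# Public dataset URLs from the issue description.
BASE = "https://s3.amazonaws.com/h2o-public-test-data/HAICTest/"
DATASET_URLS = [
    BASE + "GLM_10EnumC_10NumC_multinomial_6Classes_32GB.csv",
    BASE + "GLM_10EnumC_10NumC_multinomial_6Classes_65GB.csv",
    BASE + "GLM_10EnumC_10NumC_multinomial_6Classes_98GB.csv",
    BASE + "GLM_10EnumC_10NumC_multinomial_6Classes_130GB.csv",
]


def build_url_list(urls):
    """Return the text of a URL-list TSV for Storage Transfer Service.

    Only the URL column is written; size and MD5 columns are optional
    and omitted here.
    """
    lines = ["TsvHttpData-1.0"]  # required format header
    lines.extend(sorted(urls))   # plain string sort equals byte order for ASCII URLs
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_url_list(DATASET_URLS), end="")
```

The resulting file would itself need to be hosted at a reachable URL and referenced when creating the transfer job; that step depends on the project setup and is not shown.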

We can make a new project on GCP from our end for you to work on

new project: https://console.cloud.google.com/home/dashboard?project=eng-wendy-h2o-3-gcp-test

Please follow this guide when working in GCP: https://h2oai.atlassian.net/wiki/spaces/DEVOPS/pages/3622600716/H2O.AI+GCP+Account+Handling+User+Guide (H2O.AI GCP Account Handling User Guide)

https://h2oai.slack.com/archives/CT1BQCRV5/p1690469507461799

Hasith Perera :rotating_light: @here GCP Rules and Cost Saving Automations

Hello Team,

Please note that we are going to impose the following rules and cost saving automations on the GCP account. Please check the following documents for more explanations:

  1. H2O.AI GCP Account Handling User Guide
  2. GCP Resources Labelling
  3. GCP Cost Savings/Cleanup Automations

Important Rules

  1. Make sure to request new dedicated Projects from the DevOps team based on the project/owner/purpose since it will help us to analyze the cost instead of using a common project for all
  2. Add the mandatory resource labels following the GCP Resources Labelling guide. Please add the tags carefully since label enforcement is not supported by GCP atm as in AWS. Hence you won’t get blocked
  3. Enabled regions will be limited as listed in the guide

Automations

  1. Stop standalone VM instances after hours based on the given schedule label
  2. Terminate stopped VM instances older than 90 days
  3. Auto stop standalone VM instances with scheduling=self-managed tag after 24
  4. Scale down Instances Groups/GKE node pools to 0 during the weekend
  5. Delete unused disks older than 90 days

Note: Resources filtering will be done based on the resource labelling for automations. Hence please make sure to put the correct labels. Check GCP Cost Savings/Cleanup Automations for more details.

Please reach out to us if you have concerns regarding these as we are planning to roll out these changes by the end of next week. We will be adding more rules/automations as required in the future as well and we will keep you informed on them. Thanks for the support!

wendycwong commented 1 year ago

The h2oai-installer-gcp-h2o3test.zip is too big to be attached here.

wendycwong commented 1 year ago

Problem with BigQuery?