
 RAPIDS Cloud Machine Learning Services Integration

Please see https://docs.rapids.ai/deployment/stable/examples/ for up-to-date examples.

RAPIDS is a suite of open-source libraries that bring GPU acceleration to data science pipelines. Users building cloud-based machine learning experiments can take advantage of this acceleration throughout their workloads to build models faster, cheaper, and more easily on the cloud platform of their choice.

This repository provides example notebooks and "getting started" code samples to help you integrate RAPIDS with the hyperparameter optimization services from Azure ML, AWS SageMaker, Google Cloud, and Databricks. The directory for each cloud contains a step-by-step guide to launch an example hyperparameter optimization job. Each example job uses RAPIDS cuDF to load and preprocess data and uses cuML or XGBoost for GPU-accelerated model training. RAPIDS also integrates easily with MLflow to track and orchestrate experiments from any of these frameworks.
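
The exact code differs per notebook, but the core training pattern is similar throughout. Below is a minimal sketch of that pattern, assuming a local CSV with a `target` label column; the file path, column names, and model settings are placeholders rather than code from the notebooks.

```python
import cudf
from cuml.ensemble import RandomForestClassifier

# Load a CSV directly into GPU memory with cuDF (placeholder path and columns).
df = cudf.read_csv("train.csv")
X = df.drop(columns=["target"]).astype("float32")
y = df["target"].astype("int32")

# Train a GPU-accelerated model with cuML; the HPO service supplies the
# hyperparameter values to try on each run.
model = RandomForestClassifier(n_estimators=100, max_depth=8)
model.fit(X, y)
predictions = model.predict(X)
```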

For large datasets, you can find example notebooks using Dask to load data and train models on multiple GPUs in the same instance or in a multi-node multi-GPU cluster.
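
A rough sketch of that multi-GPU pattern, again with placeholder paths and column names (the real notebooks add cloud-specific cluster setup):

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf
import xgboost as xgb

# One Dask worker per GPU on this instance; for a multi-node cluster,
# point the Client at a remote scheduler address instead.
cluster = LocalCUDACluster()
client = Client(cluster)

# Read the data into GPU partitions spread across the workers.
ddf = dask_cudf.read_csv("train-*.csv")
X = ddf.drop(columns=["target"])
y = ddf["target"]

# Train XGBoost with the GPU histogram algorithm across all workers
# (newer XGBoost versions use tree_method="hist" with device="cuda").
dtrain = xgb.dask.DaskDMatrix(client, X, y)
output = xgb.dask.train(
    client,
    {"tree_method": "gpu_hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=100,
)
booster = output["booster"]
```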

Notebooks marked ✅ are fully functional as of RAPIDS Release 22.08; notebooks marked ❌ require an update or replacement.

| Cloud / Framework | HPO Example | Multi-node multi-GPU Example |
| --- | --- | --- |
| Microsoft Azure | Azure ML HPO ❌ | Multi-node multi-GPU cuML on Azure ❌ |
| Amazon Web Services (AWS) | AWS SageMaker HPO ✅ | Scaling up hyperparameter optimization with Kubernetes and XGBoost GPU algorithm ✅ |
| Google Cloud Platform (GCP) | Google AI Platform HPO ❌<br>Scaling up hyperparameter optimization with Kubernetes and XGBoost GPU algorithm ✅ | Multi-node multi-GPU XGBoost and cuML on Google Kubernetes Engine (GKE) ✅ |
| Dask | Dask-ML HPO ✅ | Multi-node multi-GPU XGBoost and cuML ✅ |
| Databricks | Hyperopt and MLflow on Databricks ✅ | |
| MLflow | Hyperopt and MLflow on GKE ✅ | |
| Optuna | Dask-Optuna HPO ✅<br>Optuna on Azure ML ❌ | |
| Ray Tune | Ray Tune HPO ❌ | |

Quick Start Using RAPIDS Cloud ML Container

The Cloud ML Docker Repository provides a ready-to-run Docker container with RAPIDS and the libraries/SDKs needed for the AWS SageMaker, Azure ML, and Google AI Platform HPO examples.

Pull Docker Image:

docker pull rapidsai/rapidsai-cloud-ml:22.10-cuda11.5-base-ubuntu20.04-py3.9
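
To try the pulled image locally on a GPU machine, a minimal run command (whether you publish Jupyter's port or override the entrypoint depends on how you plan to use the container):

docker run --gpus all --rm -it rapidsai/rapidsai-cloud-ml:22.10-cuda11.5-base-ubuntu20.04-py3.9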

Build Docker Image:

From the root cloud-ml-examples directory:

docker build --tag rapidsai-cloud-ml:latest --file ./common/docker/Dockerfile.training.unified ./

Bring Your Own Cloud (Dask and Ray)

In addition to the public cloud HPO options above, the repository also includes "BYOC" sample notebooks that can be run on the public cloud or private infrastructure of your choice; these leverage Ray Tune or Dask-ML for distributed hyperparameter optimization, as in the sketch below.
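
A rough sketch of the Dask-ML flavor of that pattern; synthetic data and a local GPU cluster stand in for your own dataset and infrastructure, and the Ray Tune notebooks follow an analogous structure using Ray's APIs.

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask_ml.model_selection import RandomizedSearchCV
from sklearn.datasets import make_classification
from cuml.ensemble import RandomForestClassifier

# The same code runs on a single multi-GPU machine (LocalCUDACluster) or a
# remote Dask cluster on any cloud or on-prem hardware; only the Client changes.
cluster = LocalCUDACluster()
client = Client(cluster)

# Synthetic data keeps the sketch self-contained; swap in your own dataset.
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
X = X.astype("float32")
y = y.astype("int32")

# Dask-ML fans candidate configurations out across the Dask workers,
# and each candidate trains on a GPU via cuML.
param_space = {"max_depth": [8, 12, 16], "n_estimators": [100, 200, 400]}
search = RandomizedSearchCV(RandomForestClassifier(), param_space, n_iter=6, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```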

Check out the RAPIDS HPO webpage for video tutorials and blog posts.
