huggingface / Google-Cloud-Containers

Hugging Face Deep Learning Containers for Google Cloud
Apache License 2.0

Add `examples/gke/tgi-multi-gpu-deployment/` for multi-GPU TGI #58

Status: Open (opened by alvarobartt 2 months ago)

alvarobartt commented 2 months ago

Description

This PR adds an example showing how to serve google/gemma-7b-it with Text Generation Inference (TGI) on Google Kubernetes Engine (GKE) using 4 x L4 GPUs. This setup is closer to a production environment than a single-GPU deployment. The PR also explains some of the concepts needed for a production-ready deployment.
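A multi-GPU TGI deployment on GKE comes down to a few settings. You schedule the pod onto an L4 node pool, request 4 GPUs, and tell TGI to shard the model across them. The sketch below generates such a Kubernetes Deployment manifest as JSON, which `kubectl apply -f` accepts. It is a minimal illustration, not the manifest from this PR, and it rests on several assumptions. The container image URI is a placeholder, because the PR does not name one. The Secret `hf-secret` and its key `hf_token` are hypothetical names. It also assumes the TGI launcher reads `MODEL_ID`, `NUM_SHARD`, and `PORT` from the environment. The `/dev/shm` memory volume reflects TGI's general guidance that sharded (multi-GPU) serving needs shared memory for NCCL.

```python
import json

# Hypothetical placeholder: the PR does not state which Hugging Face DLC image
# URI to use for TGI on Google Cloud, so substitute the one you intend to deploy.
TGI_IMAGE = "<TGI_CONTAINER_URI>"


def tgi_deployment(model_id: str, num_gpus: int, port: int = 8080) -> dict:
    """Build a Deployment manifest that shards one model across num_gpus GPUs."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "tgi-deployment"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": "tgi-server"}},
            "template": {
                "metadata": {"labels": {"app": "tgi-server"}},
                "spec": {
                    # Schedule onto a GKE node pool backed by NVIDIA L4 accelerators.
                    "nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-l4"},
                    "containers": [
                        {
                            "name": "tgi-container",
                            "image": TGI_IMAGE,
                            "env": [
                                {"name": "MODEL_ID", "value": model_id},
                                # One tensor-parallel shard per requested GPU.
                                {"name": "NUM_SHARD", "value": str(num_gpus)},
                                {"name": "PORT", "value": str(port)},
                                # Gemma is gated; the token comes from a hypothetical
                                # Secret named "hf-secret" with key "hf_token".
                                {
                                    "name": "HUGGING_FACE_HUB_TOKEN",
                                    "valueFrom": {
                                        "secretKeyRef": {"name": "hf-secret", "key": "hf_token"}
                                    },
                                },
                            ],
                            "ports": [{"containerPort": port}],
                            # GPU count must match NUM_SHARD so every shard gets a device.
                            "resources": {"limits": {"nvidia.com/gpu": num_gpus}},
                            "volumeMounts": [{"name": "dshm", "mountPath": "/dev/shm"}],
                        }
                    ],
                    # In-memory volume mounted at /dev/shm for inter-GPU (NCCL) communication.
                    "volumes": [{"name": "dshm", "emptyDir": {"medium": "Memory"}}],
                },
            },
        },
    }


manifest = tgi_deployment("google/gemma-7b-it", num_gpus=4)
print(json.dumps(manifest, indent=2))
```

You could redirect the output to a file (e.g. `deployment.json`) and apply it with `kubectl apply -f deployment.json`. The design keeps `NUM_SHARD` and the `nvidia.com/gpu` limit derived from one `num_gpus` argument, so the two cannot drift apart. A mismatch between them is an easy way to end up with an unschedulable or misconfigured pod.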

alvarobartt commented 2 days ago

[!NOTE] The TPU containers are still a work in progress and have not yet been released on Google Cloud, so the examples are on hold until they are. The current example will most likely need to be revisited before merging.