huggingface / Google-Cloud-Containers

Hugging Face Deep Learning Containers (DLCs) for Google Cloud
https://hf.co/docs/google-cloud
Apache License 2.0
131 stars 18 forks source link

Add `examples/gke/tgi-llama-405b-deployment` #103

Closed alvarobartt closed 1 month ago

alvarobartt commented 1 month ago

Description

This PR adds the Llama 3.1 405B Instruct FP8 serving example on GKE as previously done for Vertex AI as per https://github.com/huggingface/Google-Cloud-Containers/pull/87.

pagezyhf commented 1 month ago

LGTM! Thank you