Closed alvarobartt closed 1 month ago
This PR adds a Llama 3.1 405B Instruct FP8 serving example on GKE, mirroring the Vertex AI example from https://github.com/huggingface/Google-Cloud-Containers/pull/87.
LGTM! Thank you