alvarobartt opened 2 months ago
> [!NOTE]
> Since the TPU containers are still a work in progress and have not yet been released on Google Cloud, the examples are on hold until they are released. This means the current example will most likely need to be revisited before merging.
Description
This PR adds an example of how to serve `google/gemma-7b-it` on Google Kubernetes Engine (GKE) using 4 x L4 GPUs, which more closely emulates a production environment. It also explains some of the concepts needed for a production-ready deployment.
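For readers unfamiliar with GPU scheduling on GKE, the sketch below shows roughly what such a deployment asks Kubernetes for. It is not the manifest from this PR. It is a minimal illustration that builds a `Deployment` as a plain Python dict and prints it as JSON. It pins the Pod to L4 nodes through GKE's `cloud.google.com/gke-accelerator` node label and requests 4 GPUs through the `nvidia.com/gpu` resource. Several details are assumptions: the container image is a hypothetical placeholder, the `app` label and resource names are illustrative, and the `MODEL_ID` / `NUM_SHARD` variables assume a Text Generation Inference (TGI) style server that shards the model across the requested GPUs.

```python
import json

# Hypothetical placeholder: the serving container image is not specified here.
SERVING_IMAGE = "<SERVING_CONTAINER_IMAGE>"


def build_deployment(model_id: str, num_gpus: int, image: str = SERVING_IMAGE) -> dict:
    """Build a Deployment manifest that requests `num_gpus` L4 GPUs on GKE (sketch)."""
    labels = {"app": "gemma-7b-it"}  # hypothetical label shared by selector and Pod
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "gemma-7b-it"},  # hypothetical resource name
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # GKE labels GPU nodes with their accelerator type; this
                    # schedules the Pod only onto nodes with NVIDIA L4 GPUs.
                    "nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-l4"},
                    "containers": [
                        {
                            "name": "server",
                            "image": image,
                            # Assumption: a TGI-style server that reads the model ID
                            # and the number of shards (one per GPU) from env vars.
                            "env": [
                                {"name": "MODEL_ID", "value": model_id},
                                {"name": "NUM_SHARD", "value": str(num_gpus)},
                            ],
                            # GPUs are requested via the NVIDIA device-plugin resource.
                            "resources": {"limits": {"nvidia.com/gpu": num_gpus}},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_deployment("google/gemma-7b-it", 4), indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and applied to a cluster that has an L4 node pool. Generating the manifest from code is just one option; a hand-written YAML file expresses the same fields.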