Description
This PR adds an example for Google Cloud Run, which recently gained support for NVIDIA L4 GPUs (see [announcement]()); note that GPU support is still in preview.
The example shows how to set up Cloud Run and deploy Meta Llama 3.1 8B AWQ (INT4) from the Hugging Face Hub using the latest Hugging Face DLC for Text Generation Inference (TGI).
Closes #75