huggingface / Google-Cloud-Containers

Hugging Face Deep Learning Containers (DLCs) for Google Cloud
https://hf.co/docs/google-cloud
Apache License 2.0

Add `examples/gke/tgi-multi-lora-deployment` #102

Closed alvarobartt closed 1 month ago

alvarobartt commented 1 month ago

Description

This PR adds an example showing how to deploy Text Generation Inference (TGI) via the Hugging Face DLC to serve Gemma 2 with multiple LoRA adapters on a single NVIDIA L4 instance.
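As a rough illustration of what multi-LoRA inference looks like from the client side, the sketch below builds a request for TGI's `/generate` route and selects an adapter per request through the `adapter_id` parameter, as described in the multi-LoRA serving blog post. The base URL and the adapter ID shown in the usage comment are placeholders, not the actual values used in this example; the adapter must be one of those loaded by the TGI deployment.

```python
import json
import urllib.request


def build_generate_request(prompt, adapter_id=None, max_new_tokens=128):
    """Build the JSON body for TGI's /generate route.

    When adapter_id is None the request is served by the base model;
    otherwise TGI routes it through the named LoRA adapter.
    """
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    return {"inputs": prompt, "parameters": parameters}


def generate(base_url, payload, timeout=60):
    """POST the payload to a running TGI server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["generated_text"]


# Usage (placeholders: a port-forwarded service and a hypothetical adapter ID):
#   payload = build_generate_request("Hello!", adapter_id="my-org/my-adapter")
#   print(generate("http://localhost:8080", payload))
```

Because the adapter is chosen per request, the same deployment can serve the base model and each fine-tuned variant without restarting the container.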

The three adapters were fine-tuned in collaboration with @Jofthomas and can be found under the https://hf.co/google-cloud-partnership org on the Hub (still private; the datasets can be moved there too):

cc @philschmid for a potential Cloud Tuesday post, @Jofthomas for his presentation at the upcoming Gemma Developer Day in Tokyo, and @pagezyhf for visibility on the example itself

And kudos to @Narsil for his support in reviewing and merging https://github.com/huggingface/text-generation-inference/pull/2567, and to @datavistics et al. for their post at https://huggingface.co/blog/multi-lora-serving

Additionally

This PR also includes the `scripts/internal/update_example_tables.py` script, which is used internally to generate the example tables that appear in several files across this repository. Running it automatically (rather than by hand) is left for another PR.

In the meantime this makes the tables easier to maintain: after adding a new example, one can just run `python scripts/internal/update_example_tables.py` to update them.
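For readers unfamiliar with this kind of helper, here is a minimal sketch of how a table generator could work. It is not the contents of `scripts/internal/update_example_tables.py`; the function names, the assumed `examples/<service>/<example>/` layout, and the two-column table format are all assumptions made for illustration.

```python
from pathlib import Path


def collect_examples(root):
    """Return sorted (service, example) pairs, assuming root/<service>/<example>/ directories."""
    rows = []
    for service_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for example_dir in sorted(p for p in service_dir.iterdir() if p.is_dir()):
            rows.append((service_dir.name, example_dir.name))
    return rows


def render_table(rows):
    """Render (service, example) pairs as a Markdown table with relative links."""
    lines = ["| Service | Example |", "| --- | --- |"]
    for service, example in rows:
        lines.append(f"| {service} | [{example}](./examples/{service}/{example}) |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical invocation from the repository root.
    print(render_table(collect_examples("examples")))
```

A real script would additionally need to write the rendered table back into each target file, for instance between marker comments, so that repeated runs stay idempotent.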

HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.