Description
This PR adds an example of how to deploy TGI via the Hugging Face DLC for Gemma2, using multiple LoRA adapters for inference on a single NVIDIA L4 instance.
The three adapters have been fine-tuned in collaboration with @Jofthomas and can be found under the https://hf.co/google-cloud-partnership org on the Hub (still private; the datasets can be moved there too):
cc @philschmid for a potential Cloud Tuesday post, @Jofthomas for his presentation at the upcoming Gemma Developer Day in Tokyo, and @pagezyhf for visibility on the example itself
And kudos to @Narsil for his support in reviewing and merging https://github.com/huggingface/text-generation-inference/pull/2567, and to @datavistics et al. for their post at https://huggingface.co/blog/multi-lora-serving
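For readers new to multi-LoRA serving, here is a minimal sketch of how a client might target one adapter per request once such a deployment is running. It follows the request shape described in the multi-LoRA serving post linked above (an `adapter_id` field inside `parameters` on TGI's `/generate` route). The endpoint URL, the adapter ID, and the `build_generate_request` helper are hypothetical placeholders, not names taken from this PR.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, adapter_id=None, max_new_tokens=64):
    """Build (but do not send) a POST request for TGI's /generate route."""
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        # Selects which of the loaded LoRA adapters should handle this request.
        parameters["adapter_id"] = adapter_id
    payload = {"inputs": prompt, "parameters": parameters}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and adapter ID, for illustration only.
request = build_generate_request(
    "http://localhost:8080",
    "Translate to French: Hello",
    adapter_id="my-org/my-lora-adapter",
)

# To send it against a running deployment, something like:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["generated_text"])
```

Keeping the adapter choice in the request body is what lets a single L4-backed endpoint serve all three adapters: the base model is loaded once and only the adapter selection varies per call.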
Additionally
This PR also includes the `scripts/internal/update_example_tables.py` script, which is used internally to generate the tables with the examples across the different files within this repository; running it will be automated in another PR. In the meantime, this makes things easier to maintain: when adding a new example, one can just run `python scripts/internal/update_example_tables.py` to update those tables.
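To illustrate the general idea of generating an examples table from a list of entries, here is a rough sketch. It is not the actual `update_example_tables.py` script: the `render_examples_table` function, the column names, and the example paths and titles are all assumptions made for illustration.

```python
def render_examples_table(examples):
    """Render a list of {"path", "title"} dicts as a Markdown table."""
    lines = ["| Example | Title |", "| --- | --- |"]
    # Sort by path so regenerating the table gives a stable ordering.
    for example in sorted(examples, key=lambda e: e["path"]):
        link = f"[{example['path']}](./{example['path']})"
        lines.append(f"| {link} | {example['title']} |")
    return "\n".join(lines)


# Hypothetical entries; the real repository layout may differ.
examples = [
    {"path": "examples/b-example", "title": "Second example"},
    {"path": "examples/a-example", "title": "First example"},
]
table = render_examples_table(examples)
print(table)
```

A script along these lines would typically write the rendered table back into each file that lists the examples, so that adding a new example only requires re-running it.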