huggingface / optimum-tpu

Google TPU optimizations for transformers models

Jetstream by default #118

Closed: tengomucho closed this pull request 2 days ago

tengomucho commented 1 week ago

What does this PR do?

This makes all the changes needed for the Jetstream PyTorch engine to become the default backend for TGI on TPUs. This backend is reliable and performant, and it gives the best throughput on TGI.
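To illustrate what "default backend" means here, below is a minimal sketch of a backend-selection helper. The environment variable name (`JETSTREAM_PT_DISABLE`) and the function name are assumptions for illustration only, not necessarily what this PR implements; the point is that Jetstream PyTorch is picked unless the user explicitly opts back into the previous torch_xla path.

```python
import os


def select_tgi_backend() -> str:
    """Pick the text-generation backend for TGI on TPU.

    Hedged sketch: the variable name and return values are illustrative.
    Jetstream PyTorch is the default; setting the (hypothetical) opt-out
    variable falls back to the older torch_xla code path.
    """
    if os.environ.get("JETSTREAM_PT_DISABLE"):
        return "torch_xla"
    return "jetstream_pt"


if __name__ == "__main__":
    print(f"Selected backend: {select_tgi_backend()}")
```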

HuggingFaceDocBuilderDev commented 1 week ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

tengomucho commented 4 days ago

> I think I got lost in your changes: can you summarize how tests are now supposed to work?

@dacorvo as discussed offline, the idea is to change the default backend of TPU TGI from torch_xla to Jetstream. I just updated the tests so they use clearer markers to check that the backend they run on is the one that was actually selected.
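As a rough picture of how such backend markers can work, here is a hedged conftest-style sketch. The marker names (`jetstream`, `torch_xla`) and the environment variable are assumptions, not necessarily the ones used in this PR; the sketch just skips tests that are tied to a backend other than the selected one.

```python
# conftest.py (sketch): skip tests whose backend marker does not match the
# currently selected backend. Marker and variable names are illustrative.
import os

import pytest


def pytest_collection_modifyitems(config, items):
    # Assume Jetstream is the default unless explicitly disabled (hypothetical flag).
    backend = "torch_xla" if os.environ.get("JETSTREAM_PT_DISABLE") else "jetstream"
    skip_other = pytest.mark.skip(reason=f"requires the {backend!r} backend")
    for item in items:
        markers = {m.name for m in item.iter_markers()}
        # Only skip tests explicitly tied to the other backend.
        if {"jetstream", "torch_xla"} & markers and backend not in markers:
            item.add_marker(skip_other)
```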