huggingface/optimum-tpu — Google TPU optimizations for transformers models
Apache License 2.0 · 75 stars · 19 forks
Small TGI enhancements #97
Closed — tengomucho closed 1 month ago

tengomucho commented 1 month ago
What does this PR do?

- Handle the `max_input_tokens` server argument, which can reduce the cache size in Jetstream.
- Add a `SKIP_WARMUP` parameter to the legacy PyTorch/XLA TGI to simplify debugging.
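As a rough illustration of how these two changes might be used when launching the server: the snippet below is a hypothetical sketch, not taken from the PR. The model id, and the assumption that `max_input_tokens` is passed as a `text-generation-launcher` flag while `SKIP_WARMUP` is read from the environment, are assumptions for illustration.

```shell
# Hypothetical usage sketch (names other than max_input_tokens and
# SKIP_WARMUP are assumptions, not from the PR).

# Cap the input length so the Jetstream backend can allocate a smaller cache:
text-generation-launcher --model-id google/gemma-2b --max-input-tokens 1024

# Skip the warmup phase on the legacy PyTorch/XLA backend while debugging:
SKIP_WARMUP=1 text-generation-launcher --model-id google/gemma-2b
```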