huggingface/optimum-tpu — Google TPU optimizations for transformers models
Apache License 2.0 · 75 stars · 19 forks
Small TGI enhancements #97
Closed — tengomucho closed 1 month ago

tengomucho commented 1 month ago
What does this PR do?

- Handle the `max_input_tokens` server argument, which can reduce the cache size in Jetstream.
- Add a `SKIP_WARMUP` parameter to the legacy PyTorch/XLA TGI to simplify debugging.
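As a rough illustration of how these two changes might be used when launching the server: the snippet below is a hypothetical sketch, not taken from the PR. The model id, and the assumption that `max_input_tokens` is passed as a `text-generation-launcher` flag while `SKIP_WARMUP` is read from the environment, are assumptions for illustration.

```shell
# Hypothetical usage sketch (names other than max_input_tokens and
# SKIP_WARMUP are assumptions, not from the PR).

# Cap the input length so the Jetstream backend can allocate a smaller cache:
text-generation-launcher --model-id google/gemma-2b --max-input-tokens 1024

# Skip the warmup phase on the legacy PyTorch/XLA backend while debugging:
SKIP_WARMUP=1 text-generation-launcher --model-id google/gemma-2b
```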