huggingface / optimum-tpu

Google TPU optimizations for transformers models
Apache License 2.0

Text-generation-inference (TPU) container fixes #65

Open Michellehbn opened 1 week ago

Michellehbn commented 1 week ago

As part of supporting TPU in Inference Endpoints, and for a better user experience:

cc @tengomucho @mfuntowicz

tengomucho commented 6 days ago

This can be separated in several smaller tasks. I'll list them here to follow up progress.

I have now fixed the health issue. The problem was an incorrect CachedBatch serialization. Progress is in the branch debug-tgi-ie.
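A serialization bug like this is typically caught by a round-trip check: serialize the batch, deserialize it, and verify every field survives. The sketch below uses a hypothetical dataclass stand-in for the real CachedBatch message (which in TGI is a protobuf type), so the field names and JSON transport here are assumptions for illustration only.

```python
# Hypothetical stand-in for TGI's CachedBatch message, used only to
# illustrate a serialization round-trip check; not optimum-tpu code.
from dataclasses import dataclass, asdict, field
import json

@dataclass
class CachedBatch:
    id: int
    request_ids: list = field(default_factory=list)
    size: int = 0
    max_tokens: int = 0

def roundtrip(batch: CachedBatch) -> CachedBatch:
    """Serialize to JSON and back; all fields must survive unchanged."""
    return CachedBatch(**json.loads(json.dumps(asdict(batch))))
```

A test like `assert roundtrip(batch) == batch` over representative batches is usually enough to pin down which field is mangled.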

tengomucho commented 3 days ago

Daily update: warmup and truncation now work on the branch. I am currently working on increasing the input length, trying to do that by bucketing prefill inputs.
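Bucketing prefill inputs generally means padding each sequence up to one of a small set of fixed lengths, so the XLA compiler only ever sees a handful of static shapes instead of recompiling for every input length. A minimal sketch, assuming power-of-two buckets (the bucket scheme and function names are illustrative, not the actual optimum-tpu implementation):

```python
# Minimal sketch of input-length bucketing for prefill, assuming
# power-of-two bucket sizes. Padding to a bucket keeps the set of
# compiled XLA shapes small, at the cost of some wasted computation.

def next_bucket(length: int, min_bucket: int = 8, max_bucket: int = 1024) -> int:
    """Smallest power-of-two bucket >= length, clamped to max_bucket."""
    bucket = min_bucket
    while bucket < length and bucket < max_bucket:
        bucket *= 2
    return bucket

def pad_for_prefill(token_ids: list, pad_id: int = 0) -> list:
    """Left-pad token ids up to the bucket size, so the last real token
    stays at the end of the sequence where next-token logits are read."""
    bucket = next_bucket(len(token_ids))
    return [pad_id] * (bucket - len(token_ids)) + token_ids
```

With buckets like 8, 16, 32, ... a warmup pass can compile each bucket once; every later request then reuses one of those compiled shapes.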

tengomucho commented 1 day ago

I have almost fixed everything: truncation works as it should, and bucketing and warmup work too. However, I also introduced a bug, because I padded incorrectly when bucketing prefills. I will fix that tomorrow.
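A common shape of this padding bug is forgetting that padded positions need a matching attention mask (and that the pad side matters for where the next-token logits are read). A hedged sketch of the safe pattern, with illustrative names rather than the actual optimum-tpu code:

```python
# Illustrative sketch of padding a prefill input to its bucket size
# together with a matching attention mask: 0 for padding, 1 for real
# tokens. Left-padding keeps the last real token at the sequence end.

def pad_with_mask(token_ids: list, bucket: int, pad_id: int = 0):
    """Left-pad token ids to `bucket` and build the attention mask."""
    n_pad = bucket - len(token_ids)
    ids = [pad_id] * n_pad + token_ids
    mask = [0] * n_pad + [1] * len(token_ids)
    return ids, mask
```

If the mask is built for the unpadded length, or the ids are padded on the wrong side relative to the mask, the model attends to pad tokens or reads logits from the wrong position, which matches the kind of bug described above.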