Open Michellehbn opened 1 week ago
This can be split into several smaller tasks. I'll list them here to track progress.
I have now fixed the health issue. The problem was incorrect CachedBatch serialization. Progress is in the branch debug-tgi-ie.
Daily update: warmup now works on the branch, and truncation works too. I am currently working on increasing the input length by bucketing prefill inputs.
I have almost fixed everything: truncation now behaves as it should, and bucketing and warmup are in place. But I also introduced a bug, because I padded incorrectly when bucketing prefills. I will fix that tomorrow.
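For readers following along, here is a minimal sketch of what truncating, bucketing, and padding a prefill input can look like. It is not the code from the branch: the function names `bucket_length` and `prepare_prefill`, the power-of-two buckets, the minimum bucket of 16, truncation that keeps the most recent tokens, and left padding are all assumptions made for illustration. The general idea is that padding inputs to a small set of fixed lengths keeps the number of distinct input shapes low, which matters on TPU because each new shape triggers a recompilation.

```python
from typing import List, Sequence, Tuple


def bucket_length(length: int, max_length: int, min_bucket: int = 16) -> int:
    """Round `length` up to the next power-of-two bucket, capped at `max_length`.

    Hypothetical helper: the bucket sizes are an assumption, not the branch's.
    """
    bucket = min_bucket
    while bucket < length:
        bucket *= 2
    return min(bucket, max_length)


def prepare_prefill(
    token_ids: Sequence[int],
    max_length: int,
    pad_token_id: int = 0,
) -> Tuple[List[int], List[int]]:
    """Truncate, bucket, and pad one prefill input.

    Returns (input_ids, attention_mask), both of the bucketed length.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")

    # Truncate: keep the last `max_length` tokens (assumed truncation side).
    truncated = list(token_ids)[-max_length:]

    # Bucket: pick the fixed length this input will be padded to.
    target = bucket_length(len(truncated), max_length)
    pad = target - len(truncated)

    # Pad on the left (assumed padding side) and mask out the pad positions,
    # so the real tokens stay adjacent to the position where decoding starts.
    input_ids = [pad_token_id] * pad + truncated
    attention_mask = [0] * pad + [1] * len(truncated)
    return input_ids, attention_mask


if __name__ == "__main__":
    ids, mask = prepare_prefill([11, 12, 13], max_length=128)
    print(len(ids), sum(mask))
```

The attention mask has to be padded on the same side and by the same amount as the input ids; a mismatch between the two is one easy way to end up with a wrong-padding bug of the kind mentioned above.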
As part of the support of TPU in Inference Endpoints and for a better user experience:
cc @tengomucho @mfuntowicz