Closed. Edwinhr716 closed this issue 4 months ago
Hi @Edwinhr716, AFAIK this should have been fixed as of https://github.com/huggingface/optimum-tpu/pull/66, so any of:
should work if you rebuild the containers, thanks for flagging! Also see the original issue where this was listed at https://github.com/huggingface/optimum-tpu/issues/65#issuecomment-2196871340, and kudos to @tengomucho for solving those 👏🏻
Hi @Edwinhr716, @alvarobartt is right, we have put some effort into improving TGI robustness with optimum-tpu. The latest release should be the most solid one; let us know if you still see the issue.
Upgrading it worked! Thanks for the help, I'll close the issue.
I'm planning on using the `/health` endpoint for liveness and readiness probes for my Kubernetes deployments, but I've been running into issues.
This is the deployment that I'm testing:
However, I get this error related to the `/health` endpoint:
I also tested whether I could reach the endpoint using curl. When I do a `/generate` request first, it returns successfully:
However, if I don't do a `/generate` request beforehand, the `/health` request never returns.
Looking at the router code, it looks like this path is not working properly on Optimum TPU: https://github.com/huggingface/text-generation-inference/blob/main/router/src/infer/health.rs#L27
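For anyone who wants to reproduce the two cases without curl, here is a minimal Python sketch. It is only an illustration, not something from the original report: the function names `probe_health` and `send_generate` are made up for this example, the base URL is whatever address your TGI container listens on, and the `/generate` payload assumes the usual TGI request shape (`inputs` plus `parameters`). The client-side timeout is what turns a `/health` request that never returns into an explicit `"timeout"` result.

```python
import json
import socket
import urllib.error
import urllib.request


def probe_health(base_url: str, timeout_s: float = 5.0) -> str:
    """GET {base_url}/health and classify the outcome.

    Returns "healthy", "unhealthy", "timeout", or "unreachable".
    (Hypothetical helper; the labels are this sketch's own.)
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return "healthy" if resp.status == 200 else "unhealthy"
    except urllib.error.HTTPError:
        # The server answered, but with an error status (4xx/5xx).
        return "unhealthy"
    except (TimeoutError, socket.timeout):
        # Connected, but no response arrived within timeout_s.
        return "timeout"
    except urllib.error.URLError as err:
        # Connection-phase failures are wrapped in URLError.
        if isinstance(err.reason, (TimeoutError, socket.timeout)):
            return "timeout"
        return "unreachable"
    except OSError:
        # Any other socket-level failure, e.g. a reset connection.
        return "unreachable"


def send_generate(base_url: str, prompt: str = "Hello", timeout_s: float = 60.0) -> dict:
    """POST a small request to {base_url}/generate and return the parsed JSON.

    The payload shape is an assumption based on the usual TGI API.
    """
    payload = json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": 8}}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read())
```

To compare the two cases, call `probe_health("http://localhost:8080")` against a freshly started container (the address is an assumption), then call `send_generate` with the same base URL and probe again. On an affected version the first probe would be expected to report `"timeout"` and the second `"healthy"`.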