Closed aviv12825 closed 1 month ago
Hi @aviv12825,
I see the errors returned involve "connection refused". Have you confirmed from the pod logs that the server itself started successfully and is actually listening on these endpoints?
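For reference, one way to check this (using the pod name from the events in this issue; substitute your own, and note that `curl` may or may not be present in the image) is to inspect the container logs and probe the health endpoint the kubelet is checking:

```shell
# Check whether tritonserver actually started inside the pod
# (pod name taken from the events posted in this issue; substitute your own)
kubectl logs example-triton-inference-server-9c5d9f79-74rt4

# Probe the same readiness endpoint the kubelet is checking,
# from inside the container (assumes curl is available in the image)
kubectl exec example-triton-inference-server-9c5d9f79-74rt4 -- \
  curl -sf http://localhost:8000/v2/health/ready
```

If the logs show the server crashing or still loading models when the probes fire, that would explain the "connection refused" errors.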
Closing due to lack of activity. Please re-open this issue if you would like to follow up.
In server/deploy/oci, running `helm install example .` to deploy the Inference Server fails: the pod never reaches Running because the liveness and readiness probes fail.
The `kubectl describe` output is below. I tried adding `initialDelaySeconds: 180` to `templates/deployment.yaml`, which didn't help. Can someone please advise?
```
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  4m11s                default-scheduler  Successfully assigned default/example-triton-inference-server-9c5d9f79-74rt4 to 10.0.10.95
  Warning  Unhealthy  41s (x3 over 61s)    kubelet            Liveness probe failed: Get "http://10.0.10.177:8000/v2/health/live": dial tcp 10.0.10.177:8000: connect: connection refused
  Normal   Killing    41s                  kubelet            Container triton-inference-server failed liveness probe, will be restarted
  Normal   Pulled     11s (x2 over 4m10s)  kubelet            Container image "nvcr.io/nvidia/tritonserver:24.03-py3" already present on machine
  Warning  Unhealthy  11s (x13 over 66s)   kubelet            Readiness probe failed: Get "http://10.0.10.177:8000/v2/health/ready": dial tcp 10.0.10.177:8000: connect: connection refused
  Normal   Created    10s (x2 over 4m10s)  kubelet            Created container triton-inference-server
  Normal   Started    10s (x2 over 4m10s)  kubelet            Started container triton-inference-server
```
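For anyone hitting the same probe failures: the probe settings live in the chart's `templates/deployment.yaml`. Below is a minimal sketch of what relaxed probes could look like for a server that takes a long time to load models; the fields are standard Kubernetes probe spec fields and the values are illustrative, not the chart's defaults. A `startupProbe` is often a better fit than a large `initialDelaySeconds`, since it holds off the other probes until the server first comes up.

```yaml
# Illustrative probe settings for templates/deployment.yaml
# (standard Kubernetes probe fields; values are examples, not chart defaults)
livenessProbe:
  httpGet:
    path: /v2/health/live
    port: 8000
  initialDelaySeconds: 180
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  initialDelaySeconds: 180
  periodSeconds: 10
# A startupProbe suppresses liveness/readiness checks until it succeeds,
# allowing up to 30 * 10s = 300s for tritonserver to start listening.
startupProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  failureThreshold: 30
  periodSeconds: 10
```

Note that the events above show the container being killed only ~60s after start, so if `initialDelaySeconds: 180` was really in effect it should not have been probed that early; it is worth verifying with `kubectl get pod <name> -o yaml` that the edited values actually made it into the running pod spec.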