Sometimes (always, as far as I can tell in my env) we will see that chatqna-llm can't connect to chatqna-tgi, which prevents it from becoming ready. Everything otherwise looks fine in both the llm and tgi services (kubectl get pods).

When we check the logs of chatqna-llm-uservice, everything looks fine:

kubectl logs chatqna-llm-uservice-589477686b-lhtdb:
/usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
[2024-06-24 19:50:36,324] [ INFO] - CORS is enabled.
[2024-06-24 19:50:36,325] [ INFO] - Setting up HTTP server
[2024-06-24 19:50:36,325] [ INFO] - Uvicorn server setup on port 9000
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
[2024-06-24 19:50:36,338] [ INFO] - HTTP server setup successful
but not this:
kubectl describe pod chatqna-llm-uservice-589477686b-lhtdb:
Warning  Unhealthy  11m (x2 over 11m)     kubelet  Startup probe failed: curl: (7) Failed to connect to chatqna-tgi port 80 after 4 ms: Couldn't connect to server
Warning  Unhealthy  3m3s (x62 over 8m8s)  kubelet  Startup probe failed: command "curl http://chatqna-tgi" timed out
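The Service side can be sanity-checked the usual way; this is a hypothetical set of commands using the names from the events above (nslookup availability depends on the image):

# Is the Service defined, and does it have TGI endpoints behind it?
kubectl get svc chatqna-tgi
kubectl get endpoints chatqna-tgi
# Does the name resolve from inside the llm pod?
kubectl exec -it chatqna-llm-uservice-589477686b-lhtdb -- nslookup chatqna-tgi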
Furthermore, in this state we can't actually talk to ChatQnA at all: anything sent to :8888/v1/chatqna simply returns Internal Server Error.
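For example, a request like the following just comes back with the 500. This is a sketch: HOST_IP is a placeholder for wherever port 8888 is exposed, and the payload follows the usual ChatQnA request shape.

# Hypothetical reproduction; HOST_IP and the message are placeholders.
curl http://${HOST_IP}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is OPEA?"}'
# -> Internal Server Error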
The way I've found to fix it is simply to go into the llm service and manually curl the chatqna-tgi service, which seems to somehow unblock the networking path to TGI:
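(The exact command wasn't captured here; a hypothetical reconstruction, using the pod name from the events above:)

# Hypothetical: manually hit TGI from inside the llm uservice pod.
kubectl exec -it chatqna-llm-uservice-589477686b-lhtdb -- curl http://chatqna-tgi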
And now everything works, including talking to ChatQnA (kubectl get pods).
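For completeness, the same request from before (same placeholders) now succeeds:

# Same hypothetical request as earlier; it now returns a generated answer
# instead of Internal Server Error.
curl http://${HOST_IP}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is OPEA?"}'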