Closed arun-gupta closed 2 months ago
Could you also provide the output of kubectl describe for the stuck pod? For example: kubectl describe pod chatqna-tei-565488dd9-p4cj7
Without more logs, one common cause that comes to mind is forgetting to set or modify the volume path. You need to make sure the directory /mnt/opea-models exists on the node where the ChatQnA workload is running, so that the cached model can be saved there. Otherwise, modify the chatqna.yaml file so that model-volume points to a directory that does exist on the node.
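A minimal sketch of the directory check described above, assuming you have shell access to the node that runs the ChatQnA workload. The function name ensure_model_dir is hypothetical; the default path /mnt/opea-models is the one named in this thread.

```shell
# Hypothetical helper: make sure the host directory for cached models exists.
# The default path comes from this thread; pass another path to override it.
ensure_model_dir() {
  local dir="${1:-/mnt/opea-models}"
  if [ -d "$dir" ]; then
    echo "exists: $dir"
  else
    # Creating a directory under /mnt usually needs root privileges.
    mkdir -p "$dir" && echo "created: $dir"
  fi
}
```

Running ensure_model_dir on the node (with sufficient privileges) before applying chatqna.yaml would rule out the missing-directory case.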
I shut down the cluster and will recreate it for you.
Creating a directory /mnt/opea-models specific to a node does not seem like a k8s-native approach. In a multi-node cluster this would be tricky. Can this be done using a PVC instead?
Yes, PVC is already supported by the helm-charts deployment: https://github.com/opea-project/GenAIInfra/tree/main/helm-charts#using-persistent-volume
The manifests deployment is not as flexible, and we want to provide manifests that need as few configuration changes as possible (assuming a PVC would require additional setup). Maybe the best approach for the manifests is to not set model-volume at all and have the model downloaded at container startup. (We can remove the model-volume dependency if you think this way is better.)
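To make the "additional setup" for a PVC concrete, here is a rough sketch that prints a PersistentVolumeClaim manifest. The function name, the default claim name, and the 50Gi size are all assumptions for illustration and are not taken from the OPEA manifests or Helm charts.

```shell
# Hypothetical helper: print a PersistentVolumeClaim manifest to stdout.
# The claim name and storage size are placeholders, not OPEA defaults.
model_pvc_manifest() {
  local name="${1:-model-volume}"
  local size="${2:-50Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
}
```

The output could be piped to kubectl apply -f -, after which the model-volume entry in chatqna.yaml would reference the claim through persistentVolumeClaim.claimName instead of a host path. ReadWriteOnce is used here only as the most widely supported mode; sharing one claim across pods on several nodes would need a storage class that supports ReadWriteMany.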
Anything that requires customization outside of the Helm charts will add to developer friction and should be minimized.
Either way, the /mnt/opea-models step is not documented. I'd recommend removing it, but that will add to the container startup time.
This /mnt/opea-models path issue has been fixed by #745. Now, by default, tgi/tei use a temp volume to download and save models.
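For readers unfamiliar with temp volumes, the sketch below prints what such a volume entry might look like in a pod spec. It assumes the temp volume is a Kubernetes emptyDir, which this thread does not confirm; the helper name is hypothetical, and model-volume is the volume name mentioned earlier in the discussion.

```shell
# Hypothetical helper: print a pod-spec volume fragment backed by emptyDir.
# Assumption: the "temp volume" is an emptyDir; the thread does not say so.
temp_model_volume() {
  cat <<'EOF'
volumes:
  - name: model-volume
    emptyDir: {}
EOF
}
```

An emptyDir lives only as long as the pod, so models would be downloaded again whenever the pod is recreated, which matches the startup-time trade-off raised above.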
Priority
Undecided
OS type
Ubuntu
Hardware type
Xeon-SPR
Installation method
Deploy method
Running nodes
Single Node
What's the version?
latest tag per the Helm chart at https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/manifests/xeon/chatqna.yaml.
Description
Deploying ChatQnA on Kubernetes following the instructions at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/kubernetes/manifests. The following pods are in ContainerStarting phase and not getting fully started:
@mkbhanda
Reproduce steps
Here are the exact steps: https://gist.github.com/arun-gupta/fd3793baadc9feb4c3883c80b9481161
Raw log