opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
https://opea.dev
Apache License 2.0
230 stars, 148 forks

Suspicious hostIPC usage #329

Closed: eero-t closed this 2 months ago

eero-t commented 3 months ago

Related to #258, why are the services using the hostIPC option [1]:

GenAIExamples$ git grep hostIPC
ChatQnA/kubernetes/manifests/chaqna-xeon-backend-server.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/embedding.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/llm.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/reranking.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/retriever.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/tgi_gaudi_service.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/tgi_service.yaml:      hostIPC: true

Even though they all use just a single replica and have no affinity rules that would ensure pods needing hostIPC interaction get scheduled to the same node:

GenAIExamples$ git grep -i affinity

GenAIExamples$ git grep replicas
ChatQnA/kubernetes/manifests/chaqna-xeon-backend-server.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/embedding.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/llm.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/redis-vector-db.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/reranking.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/retriever.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tei_embedding_gaudi_service.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tei_embedding_service.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tei_reranking_service.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tgi_gaudi_service.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tgi_service.yaml:  replicas: 1
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/xeon/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/xeon/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/xeon/codegen.yaml:  replicas: 1

?

[1] which has security implications: https://kubernetes.io/docs/concepts/security/pod-security-standards/
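The `git grep` check above can also be run outside a git checkout, for example as a CI gate. The following is a minimal sketch, not something taken from the repository: the function names `list_hostipc` and `check_hostipc` are hypothetical, and it assumes the manifests are plain YAML files with `hostIPC: true` on a line of its own.

```shell
#!/bin/sh
# list_hostipc DIR: print, sorted, every *.yaml / *.yml file under DIR
# that contains a "hostIPC: true" line. (Hypothetical helper name.)
list_hostipc() {
    find "${1:-.}" -type f \( -name '*.yaml' -o -name '*.yml' \) | sort |
    while IFS= read -r f; do
        # -q: only the exit status matters; -E: extended regular expression
        if grep -qE '^[[:space:]]*hostIPC:[[:space:]]*true' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# check_hostipc DIR: return 0 only when no manifest under DIR enables
# hostIPC; otherwise list the offending files on stderr and return 1.
check_hostipc() {
    offenders=$(list_hostipc "$1")
    if [ -n "$offenders" ]; then
        printf 'hostIPC enabled in:\n%s\n' "$offenders" >&2
        return 1
    fi
    return 0
}
```

Running `check_hostipc ChatQnA/kubernetes/manifests` would then fail for as long as any of the manifests listed above still sets `hostIPC: true`.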

eero-t commented 3 months ago

In a quick test after disabling hostIPC: sed -i 's/hostIPC: true/hostIPC: false/' *.yaml

ChatQnA Xeon variant performance seemed to be unaffected.
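Because `hostIPC` defaults to false in the Pod spec, an alternative to flipping the value is to delete the line altogether. The sketch below is only an illustration: the helper name `drop_hostipc` is hypothetical, and it assumes each setting sits on its own line as in the grep output above. It writes through a temporary file rather than using `sed -i`, since the in-place flag behaves differently in GNU and BSD sed.

```shell
#!/bin/sh
# drop_hostipc FILE: delete every "hostIPC: true" line from FILE.
# (Hypothetical helper name; not part of the repository.)
drop_hostipc() {
    tmpfile=$(mktemp) || return 1
    # Filter into a temporary file first, then copy the result back so
    # that FILE keeps its original permissions.
    sed '/^[[:space:]]*hostIPC:[[:space:]]*true[[:space:]]*$/d' "$1" > "$tmpfile" &&
        cat "$tmpfile" > "$1"
    status=$?
    rm -f "$tmpfile"
    return "$status"
}
```

Applied to a manifest directory, this could look like `for f in *.yaml; do drop_hostipc "$f"; done`, followed by `git diff` to review the result before committing.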

eero-t commented 2 months ago

@yinghu5 Could you add some security tag to this?

It's not a security bug in itself, but it has security implications (it does not conform to the k8s Pod Security Standards baseline policy; see the link above).

eero-t commented 2 months ago

@yinghu5 Why did you assign this to me? I'm not a member of any of the OPEA projects.

yinghu5 commented 2 months ago

Thank you for letting me know. I thought you were one of the developers with good insight into the topic :)

lianhao commented 2 months ago

I don't think this bug is valid any more. We've already deleted the hostIPC settings in this v0.8 release.

eero-t commented 2 months ago

I don't think this bug is valid any more. We've already deleted the hostIPC settings in this v0.8 release.

@lianhao while hostIPC usage is gone from the manifests in "GenAIExamples", it's still used in manifest files in "GenAIInfra":

GenAIInfra$ git grep -i hostipc
manifests/ChatQnA/chaqna-xeon-backend-server.yaml:      hostIPC: true
manifests/ChatQnA/embedding.yaml:      hostIPC: true
manifests/ChatQnA/llm.yaml:      hostIPC: true
manifests/ChatQnA/reranking.yaml:      hostIPC: true
manifests/ChatQnA/retriever.yaml:      hostIPC: true
manifests/ChatQnA/tgi_gaudi_service.yaml:      hostIPC: true
manifests/ChatQnA/tgi_service.yaml:      hostIPC: true

Please remove those too.

lianhao commented 2 months ago

I'll file an issue in GenAIInfra to delete those.