opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
https://opea.dev
Apache License 2.0

Empty/missing Kubernetes securityContexts #258

Closed eero-t closed 1 month ago

eero-t commented 3 months ago

I would expect to see pod container securityContexts like this:

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop: [ "ALL" ]

And a runAsUser setting for something other than the default root [1].
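As a minimal sketch of where these settings sit in a manifest: the pod name, container name, image, and UID 1000 below are hypothetical placeholders, not values taken from this repository.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                    # hypothetical name
spec:
  securityContext:                     # pod-level: applies to every container
    runAsNonRoot: true
    runAsUser: 1000                    # assumed non-root UID; the image must work with it
  containers:
    - name: example                    # hypothetical container
      image: example.com/app:latest    # placeholder image
      securityContext:                 # container-level hardening, as listed above
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        seccompProfile:
          type: RuntimeDefault
        capabilities:
          drop: ["ALL"]
```

Note that `runAsUser` and `runAsNonRoot` can be set at either the pod or the container level, while `allowPrivilegeEscalation`, `readOnlyRootFilesystem`, and `capabilities` exist only at the container level.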

However, securityContexts in this project are either not set, or empty:

$ git grep securityContext
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: null
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: null
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: null
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: null

For more info, see:

[1] At least for Xeon. Device access (e.g. for Gaudi) may require the root user if the container runtime is not properly configured: https://kubernetes.io/blog/2021/11/09/non-root-containers-and-devices/

yongfengdu commented 3 months ago

I'll look at this issue. Could you provide more information about what issue the empty securityContexts cause? I've verified this runs fine with Gaudi-device-plugin without any special privilege settings; I'll learn more from your link.

eero-t commented 3 months ago

Could you provide more information about what issue the empty securityContexts cause?

Such pods cannot be run in clusters with stricter pod security policies (see the "pod-security-standards" link).

In general, all unnecessary container privileges should be dropped to reduce the likelihood of a subverted container also being able to take over its host. "Defense in depth" etc.
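To illustrate the kind of cluster policy meant here, this is a minimal sketch of a namespace that enforces the "restricted" Pod Security Standard through the built-in Pod Security admission labels. The namespace name is hypothetical.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: opea-demo                                      # hypothetical namespace
  labels:
    # Reject pods that violate the "restricted" profile
    pod-security.kubernetes.io/enforce: restricted
    # Also emit a client-side warning describing each violation
    pod-security.kubernetes.io/warn: restricted
```

In such a namespace, a pod whose containers have an empty securityContext is rejected, because the restricted profile requires `allowPrivilegeEscalation: false`, `runAsNonRoot: true`, dropping `ALL` capabilities, and a `RuntimeDefault` or `Localhost` seccomp profile to be set explicitly. `readOnlyRootFilesystem` is not part of that profile; it is extra hardening.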

lianhao commented 3 months ago

The manifest files are generated by the Helm chart from the GenAIInfra repo. We'll figure out the minimum privileges the workload containers need to run successfully and create a PR there first, for both the Xeon and Gaudi cases, then populate them here later.

lianhao commented 3 months ago

Setting runAsUser to a non-root user causes the tgi pod (image: ghcr.io/huggingface/text-generation-inference:1.4) to crash. The log shows something like: RuntimeError: cannot cache function 'create_fsm_info': no locator available for file '/opt/conda/lib/python3.10/site-packages/outlines/fsm/regex.py'

Will investigate this further.
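This error comes from Numba failing to find a writable location for its function cache. A commonly used workaround, shown here as an assumption rather than as the fix adopted for this issue, is to point `NUMBA_CACHE_DIR` at a writable directory. The container name, volume name, and cache path below are hypothetical.

```yaml
# Fragment of a pod spec (spec.containers / spec.volumes)
containers:
  - name: tgi                              # hypothetical container name
    image: ghcr.io/huggingface/text-generation-inference:1.4
    env:
      - name: NUMBA_CACHE_DIR              # Numba writes its compiled-function cache here
        value: /tmp/numba_cache            # assumed path; any writable directory works
    volumeMounts:
      - name: tmp
        mountPath: /tmp                    # writable scratch space for the non-root user
volumes:
  - name: tmp
    emptyDir: {}
```

Other caches the image writes to (for example model download directories) may need the same treatment when running as non-root.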

lianhao commented 3 months ago

Pod security settings have been added to the Helm chart (see PR opea-project/GenAIInfra#133). The manifests here are generated by the Helm chart from the GenAIInfra repo. We're currently discussing how to generate and use this kind of manifest with GMC, and we expect quite major changes to the k8s manifests, so we'll defer the manifest update here until that is resolved. Please see issue opea-project/GenAIInfra#129 for detailed tracking.

eero-t commented 3 months ago

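Thanks, the merged PR looks good, but there are a few things that could be improved:

  • /mnt is not a good host mount point. Directories mounted from the host should be very specific (e.g. /mnt/opea-models), not top-level host directories (in the worst case, /mnt could include remote home directory mount points or other security-sensitive data).
  • Anything that does not write to disk should use the read-only root filesystem setting.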

lianhao commented 2 months ago

Thanks, the merged PR looks good, but there are a few things that could be improved:

  • /mnt is not a good host mount point. Directories mounted from the host should be very specific (e.g. /mnt/opea-models), not top-level host directories (in the worst case, /mnt could include remote home directory mount points or other security-sensitive data).
  • Anything that does not write to disk should use the read-only root filesystem setting.

PR opea-project/GenAIInfra#153 should have resolved this
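For readers following along, the two suggestions could look roughly like the sketch below. This is an illustration under assumed names, not a copy of what the PR changed: the container, image, volume names, and the /data mount path are hypothetical, and only /mnt/opea-models comes from the comment above.

```yaml
# Fragment of a pod template
spec:
  containers:
    - name: example                      # hypothetical container
      image: example.com/app:latest      # placeholder image
      securityContext:
        readOnlyRootFilesystem: true     # root filesystem cannot be modified
      volumeMounts:
        - name: model-volume
          mountPath: /data               # assumed in-container path
        - name: tmp
          mountPath: /tmp                # writable scratch space despite read-only root
  volumes:
    - name: model-volume
      hostPath:
        path: /mnt/opea-models           # specific directory instead of all of /mnt
        type: Directory                  # fail if the directory does not already exist
    - name: tmp
      emptyDir: {}
```

Containers that do need to write somewhere can keep the read-only root filesystem as long as every writable path is backed by an explicit volume such as the `emptyDir` above.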

eero-t commented 2 months ago

PR opea-project/GenAIInfra#153 should have resolved this

Yes, looks good! Any idea when those changes will also reach this (GenAIExamples) repository?

lianhao commented 2 months ago

The newly added manifests for CodeGen/ChatQnA/DocSum/CodeTrans now have security context settings.

lianhao commented 1 month ago

Closed, as CodeGen/ChatQnA/DocSum/CodeTrans now have security context settings.