siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Mayastor fails to install #8001

Closed by rochecompaan 12 months ago

rochecompaan commented 12 months ago

Bug report

Mayastor fails to install, presumably because it violates the PodSecurity policy.

Description

  1. I applied the machine patches per the documentation at https://www.talos.dev/v1.5/kubernetes-guides/configuration/storage/#prep-nodes and restarted the kubelet on each node. I verified that the label was applied to each node and that hugepages were set to 2048, as per this comment.
  2. I installed mayastor with:
    helm install mayastor mayastor/mayastor -n mayastor --create-namespace --version 2.4.0

Logs

Response after helm install

W1129 10:12:28.575086 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": restricted volume types (volumes "run", "containers", "pods" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "promtail" must set securityContext.runAsNonRoot=true), runAsUser=0 (pod must not set runAsUser=0), seccompProfile (pod or container "promtail" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.585137 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true), privileged (container "agent-ha-node" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers "agent-cluster-grpc-probe", "agent-ha-node" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "agent-cluster-grpc-probe", "agent-ha-node" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "device", "sys", "run-udev", "plugin-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "agent-cluster-grpc-probe", "agent-ha-node" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "agent-cluster-grpc-probe", "agent-ha-node" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.585159 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true), privileged (container "csi-node" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers "csi-node", "csi-driver-registrar" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "csi-node", "csi-driver-registrar" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "device", "sys", "run-udev", "registration-dir", "plugin-dir", "kubelet-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "csi-node", "csi-driver-registrar" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "csi-node", "csi-driver-registrar" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.586331 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true), privileged (container "io-engine" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers "agent-core-grpc-probe", "etcd-probe", "metrics-exporter-pool", "io-engine" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "agent-core-grpc-probe", "etcd-probe", "metrics-exporter-pool", "io-engine" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "device", "udev", "configlocation" use restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "agent-core-grpc-probe", "etcd-probe", "metrics-exporter-pool", "io-engine" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "agent-core-grpc-probe", "etcd-probe", "metrics-exporter-pool", "io-engine" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.796463 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "agent-core-grpc-probe", "etcd-probe", "api-rest" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "agent-core-grpc-probe", "etcd-probe", "api-rest" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "agent-core-grpc-probe", "etcd-probe", "api-rest" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "agent-core-grpc-probe", "etcd-probe", "api-rest" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.796475 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "agent-core-grpc-probe", "etcd-probe", "operator-diskpool" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "agent-core-grpc-probe", "etcd-probe", "operator-diskpool" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "agent-core-grpc-probe", "etcd-probe", "operator-diskpool" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "agent-core-grpc-probe", "etcd-probe", "operator-diskpool" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.796497 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "obs-callhome", "obs-callhome-stats" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "obs-callhome", "obs-callhome-stats" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "obs-callhome", "obs-callhome-stats" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "obs-callhome", "obs-callhome-stats" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.798525 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "mayastor-localpv-provisioner" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "mayastor-localpv-provisioner" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "mayastor-localpv-provisioner" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "mayastor-localpv-provisioner" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.802180 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true), allowPrivilegeEscalation != false (containers "api-rest-probe", "csi-provisioner", "csi-attacher", "csi-snapshotter", "csi-snapshot-controller", "csi-controller" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "api-rest-probe", "csi-provisioner", "csi-attacher", "csi-snapshotter", "csi-snapshot-controller", "csi-controller" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "api-rest-probe", "csi-provisioner", "csi-attacher", "csi-snapshotter", "csi-snapshot-controller", "csi-controller" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "api-rest-probe", "csi-provisioner", "csi-attacher", "csi-snapshotter", "csi-snapshot-controller", "csi-controller" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:28.810152 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "etcd-probe", "agent-core", "agent-ha-cluster" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "etcd-probe", "agent-core", "agent-ha-cluster" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "etcd-probe", "agent-core", "agent-ha-cluster" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "etcd-probe", "agent-core", "agent-ha-cluster" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:29.019329 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "volume-permissions", "loki" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "volume-permissions", "loki" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod must not set securityContext.runAsNonRoot=false), runAsUser=0 (container "volume-permissions" must not set runAsUser=0), seccompProfile (pod or containers "volume-permissions", "loki" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:29.019459 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "nats", "reloader", "metrics" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "nats", "reloader", "metrics" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "nats", "reloader", "metrics" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "nats", "reloader", "metrics" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W1129 10:12:29.019891 2438627 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "volume-permissions" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "volume-permissions", "etcd" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "volume-permissions" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "volume-permissions" must not set runAsUser=0), seccompProfile (pod or containers "volume-permissions", "etcd" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
NAME: mayastor
LAST DEPLOYED: Wed Nov 29 10:12:11 2023
NAMESPACE: mayastor
STATUS: deployed
REVISION: 1

Pods status after installation:

❮ kubectl -n mayastor get pods
NAME                                            READY   STATUS     RESTARTS   AGE
mayastor-agent-core-85499cf6db-jxpd9            0/2     Init:0/1   0          3m8s
mayastor-api-rest-646c479b4b-tn29v              0/1     Init:0/2   0          3m8s
mayastor-etcd-0                                 0/1     Pending    0          3m8s
mayastor-etcd-1                                 0/1     Pending    0          3m8s
mayastor-etcd-2                                 0/1     Pending    0          3m8s
mayastor-localpv-provisioner-85c8774849-82rlv   1/1     Running    0          3m8s
mayastor-loki-0                                 0/1     Pending    0          3m8s
mayastor-nats-0                                 3/3     Running    0          3m8s
mayastor-nats-1                                 3/3     Running    0          3m8s
mayastor-nats-2                                 3/3     Running    0          3m8s
mayastor-obs-callhome-59b44bbff6-pzt6s          2/2     Running    0          3m8s
mayastor-operator-diskpool-57cbdc854c-kkn2h     0/1     Init:0/2   0          3m8s

Environment


- Kubernetes version: v1.28.2
- Platform: Hetzner Cloud cx21 VM

rochecompaan commented 12 months ago

It looks like the Mayastor operator is trying to provision a disk pool using the OpenEBS dynamic-localpv-provisioner, as per this comment. I imagine all the node prep required for OpenEBS volumes is also required before you can install Mayastor in a Talos cluster.

smira commented 12 months ago

Talos by default follows Kubernetes best security practices, so it enables Pod Security. The Mayastor deployment doesn't mark itself with a proper policy, so this is not a bug in Talos, but rather might be seen as a bug in the Helm chart.

Long story short, you can label the namespace with the proper label as Mayastor is definitely privileged:

$ kubectl label ns mayastor pod-security.kubernetes.io/enforce=privileged
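
For anyone who prefers to keep this declarative, the label can also be set by creating the namespace from a manifest before running helm install. The following is a minimal sketch, not something taken from the Mayastor chart or the Talos docs: only the enforce label comes from the command above, and the audit and warn labels are an added assumption meant to quiet the "would violate PodSecurity" warnings shown in the logs.

```yaml
# Hypothetical manifest: creates the mayastor namespace with Pod Security
# labels so that privileged pods are admitted.
apiVersion: v1
kind: Namespace
metadata:
  name: mayastor
  labels:
    # Same label as the kubectl command above.
    pod-security.kubernetes.io/enforce: privileged
    # Assumption: relax audit and warn as well, so the warnings stop appearing.
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```

Applied with kubectl apply -f before the helm install step, this should let the privileged Mayastor pods through admission on the first install attempt rather than after relabeling.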

rochecompaan commented 12 months ago

This is not a Talos bug per se, but the Talos documentation does not have sufficient node prep instructions to get the current release of Mayastor working. As per the comment I linked to, there is a hard dependency on the OpenEBS localpv provisioner, and this requires additional node prep. It would be helpful to add this to the docs.

smira commented 12 months ago

@rochecompaan please submit a PR with fixes, documentation always lags behind!

rochecompaan commented 12 months ago

@smira I'm more than happy to. I'll confirm the exact requirements locally and update the docs accordingly.

Mohitsharma44 commented 11 months ago

@rochecompaan, did you get a chance to test out Mayastor's Dynamic Local Persistent Volume (LocalPV) provisioner?

rochecompaan commented 11 months ago

I ran into more issues when I continued testing and ran out of time. I mean to return to it once I have time again. Once I managed to install the OpenEBS dependency, I realized that OpenEBS has a more than adequate local storage solution, so I wasn't pressed to make Mayastor work anymore. Unfortunately, I can't remember what issues I ran into; they might not even be Mayastor-specific.

Mohitsharma44 commented 11 months ago

Fair enough. I had some time this evening, so I tried to make Mayastor work. The following is what I did (in case it's helpful to others):

phiilu commented 11 months ago

Thanks for the steps @Mohitsharma44

However, it seems that Mayastor does not support arm64, which is a bummer.

I tried to install the 2.5.0 helm chart release, but the etcd version they depend on is outdated and uses an older version of bitnami-shell which is not built for arm64:

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  16m                  default-scheduler  Successfully assigned mayastor/mayastor-etcd-0 to n2-storage
  Normal   Pulled     15m (x5 over 16m)    kubelet            Container image "docker.io/bitnami/bitnami-shell:11-debian-11-r63" already present on machine
  Normal   Created    15m (x5 over 16m)    kubelet            Created container volume-permissions
  Normal   Started    15m (x5 over 16m)    kubelet            Started container volume-permissions
  Warning  BackOff    103s (x70 over 16m)  kubelet            Back-off restarting failed container volume-permissions in pod mayastor-etcd-0_mayastor(52798499-a0d7-42ef-acdd-5519341ed07f)
 ➜  ~ kubectl logs pods/mayastor-etcd-0 -c volume-permissions
exec /bin/bash: exec format error