intel / pmem-csi

Persistent Memory Container Storage Interface Driver
Apache License 2.0

Error while deploying Pmem-csi on SUSE Caas (Kubernetes 1.16) #710

Closed shantanupagare closed 4 years ago

shantanupagare commented 4 years ago

Hi,

We are trying to deploy PMEM-CSI on SUSE CaaS4 (Kubernetes 1.16). When deploying from the provided source, we get the error below.

Warning FailedCreate 41s (x15 over 2m3s) daemonset-controller Error creating: pods "pmem-csi-node-" is forbidden: unable to validate against any pod security policy: [
spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used
spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed
spec.initContainers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed
spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed
]

pohly commented 4 years ago

Which version of PMEM-CSI are you deploying and how?

securityContext.hostNetwork is something that shouldn't be needed. We use it only for testing.

But securityContext.privileged is needed by the CSI driver. Otherwise it cannot manage the underlying hardware. You'll have to find out how an admin can install privileged containers under SUSE CaaS4.

shantanupagare commented 4 years ago

Hi,

We are now using the master branch. The hostNetwork error is gone, but we are still getting the error below.

Warning FailedCreate 37s (x16 over 3m21s) daemonset-controller Error creating: pods "pmem-csi-node-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]

shantanupagare commented 4 years ago

We are now able to get those pods up and running, after creating a new pod security policy, ClusterRole and ClusterRoleBinding so that privileged containers can be started.
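For reference, a pod security policy that admits privileged containers could look roughly like the sketch below on Kubernetes 1.16, where PodSecurityPolicy is served from the policy/v1beta1 API. The objects actually created here were not posted in this thread: the name pmem-csi-privileged is hypothetical, and the exact set of permissions the driver's pods need is an assumption.

```yaml
# Sketch only: a PodSecurityPolicy that allows privileged containers.
# The name is hypothetical; adjust the allowed volume types and other
# fields to what the pods actually request.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: pmem-csi-privileged
spec:
  privileged: true                 # admits securityContext.privileged: true
  allowPrivilegeEscalation: true
  volumes:
  - '*'                            # assumption: no restriction on volume types
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```

A policy like this only takes effect for a pod once the "use" verb on it has been granted through RBAC to the user or service account that creates or runs the pod.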

Now we are getting the errors below in the containers:

I0822 23:18:48.160300 1 grpc.go:153] VerifyPeerCertificate: CN=pmem-registry
I0822 23:18:48.161590 1 tracing.go:19] GRPC call: /csi.v1.Controller/CreateVolume
E0822 23:18:48.170286 1 tracing.go:23] GRPC error: rpc error: code = Internal desc = Node CreateVolume: device creation failed: vgs failure: exit status 5
E0822 23:18:48.238292 1 tracing.go:23] GRPC error: rpc error: code = Internal desc = Node CreateVolume: device creation failed: vgs failure: exit status 5

pohly commented 4 years ago

@avalluri can you help here with figuring out how or why LVM fails?

Should there be error output that explains the failure?

avalluri commented 4 years ago

@shantanupagare Could you provide the full logs of the 'pmem-driver' container in the node pod? They should include the driver initialization logs.

seanporterwork commented 4 years ago

@avalluri Did you mean to tag someone else (@shantanupagare perhaps?) I don't know anything about this issue...

avalluri commented 4 years ago

Oops, tagged the wrong person. My apologies @seanporterwork.

shantanupagare commented 4 years ago

Hi

pmem-csi-controller pod logs:

I0830 13:48:08.767052 1 main.go:66] Version: v0.5.0-rc1-732-gacb46b31
I0830 13:48:08.779543 1 server.go:51] Listening for connections on address: /csi/csi-controller.sock
I0830 13:48:08.779619 1 server.go:51] Listening for connections on address: [::]:10000
I0830 13:48:08.779714 1 pmem-csi-driver.go:251] Prometheus endpoint started at https://[::]:10010/metrics
I0830 13:48:09.162939 1 tracing.go:19] GRPC call: /csi.v1.Identity/Probe
I0830 13:48:09.163586 1 tracing.go:19] GRPC call: /csi.v1.Identity/GetPluginInfo
I0830 13:48:09.164158 1 tracing.go:19] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0830 13:48:09.164857 1 tracing.go:19] GRPC call: /csi.v1.Controller/ControllerGetCapabilities

The pmem-csi-controller pod is up and running.

The problem is with the DaemonSet init container, where we get the error below:

Error creating: pods "pmem-csi-node-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]

To work around this, I created a new pod security policy that allows all root privileges. All the pods then came up without issue, but afterwards the Cilium pods in my Kubernetes cluster were unable to communicate with each other, so I had to revert to the original pod security policy.

pohly commented 4 years ago

I'm sorry, but we can't help with configuring the security policy in SUSE CaaS4. Perhaps ask SUSE?

shantanupagare commented 4 years ago

Hi, can you tell me where we can specify a policy that I already have, so that it is used while deploying? That way the deployment would use the custom policy and everything should work.

Let's say my policy name is "suse.caasp.psp.privileged"
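In general, an existing pod security policy is not referenced from the pod or DaemonSet spec. It is made available by granting the "use" verb on it through RBAC. A minimal sketch, assuming the node pods run under a service account named pmem-csi-node in the default namespace (both are assumptions, as is the role name pmem-csi-psp-privileged; only the policy name comes from this thread):

```yaml
# Sketch only: allow one service account to use an existing PSP.
# The role name, service account name and namespace are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pmem-csi-psp-privileged
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['suse.caasp.psp.privileged']   # policy name from this thread
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pmem-csi-psp-privileged
  namespace: default               # assumption: namespace of the PMEM-CSI pods
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pmem-csi-psp-privileged
subjects:
- kind: ServiceAccount
  name: pmem-csi-node              # assumption: service account of the node pods
  namespace: default
```

Using a namespaced RoleBinding for a single service account, rather than a cluster-wide binding, is meant to leave the policy that applies to other pods unchanged.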

pohly commented 4 years ago

I'm not familiar with how people use PSP, but having to change the policy only while rolling out PMEM-CSI feels wrong to me.

shantanupagare commented 4 years ago

A policy for privileged pods is already available in the Kubernetes cluster, so we can use that same policy to deploy PMEM-CSI.

pohly commented 4 years ago

@shantanupagare do you agree that we can close this issue?