openebs / mayastor

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes that is provisioned from an optimized NVME SPDK backend data storage stack.
Apache License 2.0
750 stars, 109 forks

CSI resizer OOM killed #1774

Open marcinpro1 opened 1 day ago

marcinpro1 commented 1 day ago

The CSI resizer keeps getting OOM killed, no matter how much memory I give the container.

There is no useful information in the logs.

Maybe linked with this? https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1248

I would like to know whether this container is necessary if I won't use disk resize. Also, is there anywhere I should look in my cluster for more details about this issue?

tiagolobocastro commented 1 day ago

Would you be able to raise the log level to trace? This can be done by updating the helm vars: --set csi.node.logLevel=trace. Then could you reproduce the issue and share a support bundle here? See https://openebs.io/docs/user-guides/replicated-storage-user-guide/replicated-pv-mayastor/advanced-operations/supportability
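The same setting can also be kept in a values file instead of being passed with --set. A minimal sketch, assuming the key path quoted above matches your chart (if Mayastor is installed as a subchart of an umbrella chart, the key would need to be nested under the subchart's name); the file name values-trace.yaml is hypothetical:

```yaml
# values-trace.yaml (hypothetical file name)
# Equivalent to: --set csi.node.logLevel=trace
csi:
  node:
    logLevel: trace
```

It could then be applied with something like helm upgrade <release> <chart> -n <namespace> --reuse-values -f values-trace.yaml, where the release, chart, and namespace are whatever your install uses.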

marcinpro1 commented 23 hours ago

Here you go: openebs_dump.tar.gz (some things were redacted).

tiagolobocastro commented 22 hours ago

> I would like to know whether this container is necessary if I won't use disk resize.

Not sure. I'd say probably not, but since we expose the resize feature, I don't know whether removing it would cause an issue. You could try removing the csi-resize sidecar and see if that works.

If so, I would be open to adding a helm var to enable/disable the resizer. However, going by the bug you mentioned, it seems that if online resizing is supported (which it is) we should be setting --handle-volume-inuse-error=false; otherwise the resizer watches pods, which consumes memory. This should fix the issue for everyone without anyone having to modify anything.
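For illustration, a minimal sketch of where that flag would go on the sidecar. This is not the actual Mayastor manifest: the container name, image tag, and socket path are placeholders, and only --handle-volume-inuse-error=false is the change being discussed.

```yaml
# Hypothetical excerpt from the CSI controller pod spec.
containers:
  - name: csi-resizer                                       # assumed container name
    image: registry.k8s.io/sig-storage/csi-resizer:<tag>    # placeholder tag
    args:
      - "--v=2"
      - "--csi-address=/path/to/csi.sock"                   # placeholder socket path
      # Skip volume-in-use error handling, so the resizer does not watch pods.
      - "--handle-volume-inuse-error=false"
```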

Would you be willing to raise a PR for that?