kubernetes / cloud-provider-openstack


[cinder-csi-plugin] panic: runtime error: invalid memory address or nil pointer dereference #1684

Closed arunabhabanerjee closed 2 years ago

arunabhabanerjee commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

What happened: cinder-csi-plugin container is crashing with the below error: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x98cb69]

What you expected to happen: cinder-csi-plugin container must be up and running

How to reproduce it: Deploy in OpenShift platform 4.6.48

Anything else we need to know?: The entire log:

oc logs csi-cinder-controllerplugin-76c97b776-mfrbx -c cinder-csi-plugin
I1106 05:04:04.644022 1 driver.go:73] Driver: cinder.csi.openstack.org
I1106 05:04:04.644076 1 driver.go:74] Driver version: 1.3.2@latest
I1106 05:04:04.644082 1 driver.go:75] CSI Spec version: 1.3.0
I1106 05:04:04.644089 1 driver.go:104] Enabling controller service capability: LIST_VOLUMES
I1106 05:04:04.644094 1 driver.go:104] Enabling controller service capability: CREATE_DELETE_VOLUME
I1106 05:04:04.644098 1 driver.go:104] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I1106 05:04:04.644125 1 driver.go:104] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1106 05:04:04.644130 1 driver.go:104] Enabling controller service capability: LIST_SNAPSHOTS
I1106 05:04:04.644134 1 driver.go:104] Enabling controller service capability: EXPAND_VOLUME
I1106 05:04:04.644138 1 driver.go:104] Enabling controller service capability: CLONE_VOLUME
I1106 05:04:04.644142 1 driver.go:104] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I1106 05:04:04.644147 1 driver.go:116] Enabling volume access mode: SINGLE_NODE_WRITER
I1106 05:04:04.644152 1 driver.go:126] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I1106 05:04:04.644160 1 driver.go:126] Enabling node service capability: EXPAND_VOLUME
I1106 05:04:04.644165 1 driver.go:126] Enabling node service capability: GET_VOLUME_STATS
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x98cb69]

goroutine 1 [running]:
k8s.io/cloud-provider-openstack/pkg/client.ReadClouds(0xc0002781e0, 0x15, 0xc000134960)
    /home/zuul/src/k8s.io/cloud-provider-openstack/pkg/client/client.go:215 +0xa9
k8s.io/cloud-provider-openstack/pkg/csi/cinder/openstack.GetConfigFromFiles(0xc000311390, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/zuul/src/k8s.io/cloud-provider-openstack/pkg/csi/cinder/openstack/openstack.go:119 +0x25e
k8s.io/cloud-provider-openstack/pkg/csi/cinder/openstack.CreateOpenStackProvider(0x0, 0x0, 0x0, 0x0)
    /home/zuul/src/k8s.io/cloud-provider-openstack/pkg/csi/cinder/openstack/openstack.go:143 +0x7c
k8s.io/cloud-provider-openstack/pkg/csi/cinder/openstack.GetOpenStackProvider(0xc000311390, 0x1, 0x1, 0xa)
    /home/zuul/src/k8s.io/cloud-provider-openstack/pkg/csi/cinder/openstack/openstack.go:195 +0x3d
main.handle()
    /home/zuul/src/k8s.io/cloud-provider-openstack/cmd/cinder-csi-plugin/main.go:106 +0x9f
main.main.func2(0xc000287080, 0xc00032ac60, 0x0, 0x3)
    /home/zuul/src/k8s.io/cloud-provider-openstack/cmd/cinder-csi-plugin/main.go:71 +0x25
github.com/spf13/cobra.(*Command).execute(0xc000287080, 0xc000188050, 0x3, 0x3, 0xc000287080, 0xc000188050)
    /home/zuul/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:854 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc000287080, 0x10d1da0, 0xbeb15a, 0xa)
    /home/zuul/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:958 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
    /home/zuul/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:895
main.main()
    /home/zuul/src/k8s.io/cloud-provider-openstack/cmd/cinder-csi-plugin/main.go:93 +0x352

Environment:

jichenjc commented 2 years ago

Technically, OCP-related issues should not be opened here.

But I believe you are hitting the problem at https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/client/client.go#L215, which means that cloud or cloud.AuthInfo is likely nil, and dereferencing it leads to the panic.

https://github.com/gophercloud/utils/blob/master/openstack/clientconfig/requests.go#L194 checks this and returns an error, but I am very curious why we have this check:

if err != nil && err.Error() != "unable to load clouds.yaml: no clouds.yaml file found" {
        return err
}

This means a nil value can be returned from https://github.com/gophercloud/utils/blob/master/openstack/clientconfig/requests.go#L194 and we are likely to use it.
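To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not reproduce the actual cloud-provider-openstack or gophercloud code, and it is not necessarily the change made in any fix: `Cloud`, `AuthInfo`, `loadCloud`, and `readAuthURL` are hypothetical stand-ins, and only the error string is taken from the check quoted above. It shows how tolerating the "no clouds.yaml file found" error can leave a nil pointer behind, and how a nil guard turns the panic into an ordinary error.

```go
package main

import (
	"errors"
	"fmt"
)

// AuthInfo and Cloud are hypothetical stand-ins for the real config types.
type AuthInfo struct{ AuthURL string }
type Cloud struct{ AuthInfo *AuthInfo }

// errNoCloudsYAML carries the same message as the error tolerated by the quoted check.
var errNoCloudsYAML = errors.New("unable to load clouds.yaml: no clouds.yaml file found")

// loadCloud is a hypothetical loader: it returns (nil, error) when no
// clouds.yaml is found, so callers that ignore the error hold a nil *Cloud.
func loadCloud(found bool) (*Cloud, error) {
	if !found {
		return nil, errNoCloudsYAML
	}
	return &Cloud{AuthInfo: &AuthInfo{AuthURL: "https://keystone.example/v3"}}, nil
}

// readAuthURL tolerates the missing-file error, as the quoted check does,
// but guards against nil before dereferencing.
func readAuthURL(found bool) (string, error) {
	cloud, err := loadCloud(found)
	if err != nil && err.Error() != errNoCloudsYAML.Error() {
		return "", err
	}
	// Without this guard, cloud.AuthInfo.AuthURL would panic with a nil
	// pointer dereference whenever the tolerated error left cloud nil.
	if cloud == nil || cloud.AuthInfo == nil {
		return "", errors.New("no cloud configuration loaded")
	}
	return cloud.AuthInfo.AuthURL, nil
}

func main() {
	for _, found := range []bool{true, false} {
		url, err := readAuthURL(found)
		fmt.Printf("found=%v url=%q err=%v\n", found, url, err)
	}
}
```

With the guard in place, the missing-file case surfaces as a returned error that the caller can log, instead of crashing the container at startup.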

ramineni commented 2 years ago

@arunabhabanerjee Could you test whether this change solves your issue: https://github.com/kubernetes/cloud-provider-openstack/pull/1685

ramineni commented 2 years ago

@arunabhabanerjee Closing the issue since there has been no response. Feel free to reopen it if the problem persists after applying the latest change.

ramineni commented 2 years ago

/close

k8s-ci-robot commented 2 years ago

@ramineni: Closing this issue.

In response to [this](https://github.com/kubernetes/cloud-provider-openstack/issues/1684#issuecomment-979008487):

> /close

arunabhabanerjee commented 2 years ago

We encountered this issue on the OpenStack platform when we were not using a security group. I am really sorry for my delayed response. With the security group, it works fine.

jichenjc commented 2 years ago

Curious why the security group affects the functionality here. I imagine it might be related to some other environment issue, but with such limited information, nothing can be done.