Storage cluster operator is not coming up | OKD 4.13 on OpenStack

swogat commented 11 months ago

I am deploying OKD 4.13 with OpenShiftSDN on OpenStack. The storage cluster operator is not coming up:

Error:

    Status:
      Conditions:
        Last Transition Time:  2023-11-21T21:25:09Z
        Message:               OpenStackCinderCSIDriverOperatorCRDegraded: ConfigSyncDegraded: couldn't collect info about cloud availability zones: failed to create a compute client: Get "https://.com:13000/": net/http: TLS handshake timeout
        Reason:                OpenStackCinderCSIDriverOperatorCR_ConfigSync_SyncError
        Status:                True
        Type:                  Degraded
        Last Transition Time:  2023-11-21T21:21:10Z

`oc get pods --all-namespaces` gives the following output:

    NAMESPACE                                   NAME                                                      READY   STATUS             RESTARTS        AGE
    openshift-cloud-network-config-controller   cloud-network-config-controller-847fff59fd-jb6rv          0/1     CrashLoopBackOff   124 (85s ago)   15h
    openshift-cluster-csi-drivers               manila-csi-driver-operator-64c466b7d9-rc79g               1/1     Running            0               15h
    openshift-cluster-csi-drivers               openstack-cinder-csi-driver-controller-5b8b6bdc65-trrj6   9/10    CrashLoopBackOff   6 (3m48s ago)   10m
    openshift-cluster-csi-drivers               openstack-cinder-csi-driver-controller-5b8b6bdc65-xmn67   9/10    CrashLoopBackOff   11 (37s ago)    15h
    openshift-cluster-csi-drivers               openstack-cinder-csi-driver-node-4g562                    2/3     CrashLoopBackOff   11 (96s ago)    147m
    openshift-cluster-csi-drivers               openstack-cinder-csi-driver-node-4gk7r                    2/3     CrashLoopBackOff   11 (54s ago)    15h
    openshift-cluster-csi-drivers               openstack-cinder-csi-driver-node-9lf8v                    2/3     CrashLoopBackOff   11 (71s ago)    15h
    openshift-cluster-csi-drivers               openstack-cinder-csi-driver-node-b8jpl                    2/3     CrashLoopBackOff   11 (42s ago)    15h
    openshift-cluster-csi-drivers               openstack-cinder-csi-driver-node-bvzfb                    2/3     CrashLoopBackOff   11 (87s ago)    148m
    openshift-cluster-csi-drivers               openstack-cinder-csi-driver-node-n9qw5                    2/3     CrashLoopBackOff   11 (97s ago)    148m
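With this many pods, it can help to reduce the listing to the ones whose READY count is short. The following is a minimal Python sketch, not part of the original report. The helper name `not_ready` is hypothetical, and it assumes the default column order of `oc get pods --all-namespaces` (namespace, name, READY, status).

```python
def not_ready(listing):
    """Return (namespace, pod, ready, status) for pods with containers not ready."""
    rows = []
    for line in listing.splitlines():
        cols = line.split()
        # Skip blank lines, a header row, and anything without a READY value like "9/10".
        if len(cols) < 4 or "/" not in cols[2]:
            continue
        ready, total = cols[2].split("/")
        if ready != total:
            rows.append((cols[0], cols[1], cols[2], cols[3]))
    return rows


# Two rows copied from the listing in this report.
sample = """\
openshift-cluster-csi-drivers manila-csi-driver-operator-64c466b7d9-rc79g 1/1 Running 0 15h
openshift-cluster-csi-drivers openstack-cinder-csi-driver-node-4g562 2/3 CrashLoopBackOff 11 (96s ago) 147m
"""

for row in not_ready(sample):
    print(*row)
```

Applied to the full listing, this would leave out only the `manila-csi-driver-operator` pod, which is the one pod shown as fully ready.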

    [root@bastion ~]# oc logs cloud-network-config-controller-847fff59fd-jb6rv -n openshift-cloud-network-config-controller
    W1122 12:23:01.691854 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
    I1122 12:23:01.693455 1 leaderelection.go:248] attempting to acquire leader lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock...
    I1122 12:23:05.428860 1 leaderelection.go:258] successfully acquired lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock
    I1122 12:23:05.429902 1 openstack.go:125] Custom CA bundle found at location '/kube-cloud-config/ca-bundle.pem' - reading certificate information
    F1122 12:25:20.753558 1 main.go:101] Error building cloud provider client, err: Get "https://.com:13000/": read tcp 10.128.0.3:34762->172.45.20.120:13000: read: connection reset by peer
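Both the operator condition and this log show the connection to the OpenStack endpoint on port 13000 failing during or right after the TLS handshake. One way to narrow this down is to attempt the same handshake by hand from a machine on the relevant network. Below is a minimal Python sketch using only the standard `socket` and `ssl` modules. The function name `probe_tls`, the host `keystone.example.com`, and the CA file path in the usage comment are placeholders, since the real hostname is redacted in the report.

```python
import socket
import ssl


def probe_tls(host, port, cafile=None, timeout=10.0):
    """Try a TCP connect followed by a TLS handshake and report where it fails."""
    # cafile=None falls back to the system trust store.
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return "ok", tls.version()
    except socket.timeout as exc:
        # The connect or the handshake did not finish within `timeout` seconds.
        return "timeout", str(exc)
    except ssl.SSLError as exc:
        # Handshake was answered but failed, e.g. certificate verification.
        return "tls-error", str(exc)
    except OSError as exc:
        # Refused, reset by peer, DNS failure, unreachable network, ...
        return "connect-error", str(exc)


# Hypothetical usage; replace host and CA path with the real values:
# print(probe_tls("keystone.example.com", 13000, cafile="ca-bundle.pem"))
```

As a rough heuristic only, a `timeout` or `connect-error` result would point toward the network path (firewall, proxy, MTU) rather than the certificate, while `tls-error` would point toward the CA bundle.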

    [root@bastion ~]# oc logs openstack-cinder-csi-driver-controller-5b8b6bdc65-trrj6 -n openshift-cluster-csi-drivers
    Defaulted container "csi-driver" out of: csi-driver, csi-provisioner, provisioner-kube-rbac-proxy, csi-attacher, attacher-kube-rbac-proxy, csi-resizer, resizer-kube-rbac-proxy, csi-snapshotter, snapshotter-kube-rbac-proxy, csi-liveness-probe
    Flag --nodeid has been deprecated, This flag would be removed in future. Currently, the value is ignored by the driver
    I1122 12:28:00.116400 1 driver.go:81] Driver: cinder.csi.openstack.org
    I1122 12:28:00.116480 1 driver.go:82] Driver version: 2.0.0@
    I1122 12:28:00.116484 1 driver.go:83] CSI Spec version: 1.3.0
    I1122 12:28:00.116497 1 driver.go:115] Enabling controller service capability: LIST_VOLUMES
    I1122 12:28:00.116502 1 driver.go:115] Enabling controller service capability: CREATE_DELETE_VOLUME
    I1122 12:28:00.116505 1 driver.go:115] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
    I1122 12:28:00.116509 1 driver.go:115] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
    I1122 12:28:00.116512 1 driver.go:115] Enabling controller service capability: LIST_SNAPSHOTS
    I1122 12:28:00.116517 1 driver.go:115] Enabling controller service capability: EXPAND_VOLUME
    I1122 12:28:00.116520 1 driver.go:115] Enabling controller service capability: CLONE_VOLUME
    I1122 12:28:00.116524 1 driver.go:115] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
    I1122 12:28:00.116534 1 driver.go:115] Enabling controller service capability: GET_VOLUME
    I1122 12:28:00.116544 1 driver.go:125] Enabling volume access mode: SINGLE_NODE_WRITER
    I1122 12:28:00.116548 1 driver.go:135] Enabling node service capability: STAGE_UNSTAGE_VOLUME
    I1122 12:28:00.116553 1 driver.go:135] Enabling node service capability: EXPAND_VOLUME
    I1122 12:28:00.116556 1 driver.go:135] Enabling node service capability: GET_VOLUME_STATS
    I1122 12:28:00.116584 1 openstack.go:136] InitOpenStackProvider configFiles: [/etc/kubernetes/config/cloud.conf]
    E1122 12:28:00.117537 1 openstack.go:144] GetConfigFromFiles [/etc/kubernetes/config/cloud.conf] failed with error: unable to load clouds.yaml: no clouds.yml file found: file does not exist
    W1122 12:28:00.117575 1 main.go:105] Failed to GetOpenStackProvider: unable to load clouds.yaml: no clouds.yml file found: file does not exist
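The lines that matter here are easy to miss among the informational output. In klog format each line starts with a severity letter (`I`, `W`, `E`, or `F`), so a small filter can surface only the warnings and errors. This is a minimal Python sketch; the helper `problems` and its regular expression are illustrative assumptions, not part of any OpenShift tooling.

```python
import re

# klog header: severity letter, mmdd, time, thread/process id, file:line], message.
KLOG = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^\]]+)\] (.*)$"
)


def problems(log_text, levels="WEF"):
    """Return (severity, source, message) for klog lines at the given levels."""
    found = []
    for line in log_text.splitlines():
        match = KLOG.match(line.strip())
        if match and match.group(1) in levels:
            found.append((match.group(1), match.group(5), match.group(6)))
    return found


# Last three lines of the csi-driver log in this report.
sample_log = """\
I1122 12:28:00.116584 1 openstack.go:136] InitOpenStackProvider configFiles: [/etc/kubernetes/config/cloud.conf]
E1122 12:28:00.117537 1 openstack.go:144] GetConfigFromFiles [/etc/kubernetes/config/cloud.conf] failed with error: unable to load clouds.yaml: no clouds.yml file found: file does not exist
W1122 12:28:00.117575 1 main.go:105] Failed to GetOpenStackProvider: unable to load clouds.yaml: no clouds.yml file found: file does not exist
"""

for severity, source, message in problems(sample_log):
    print(severity, source, message)
```

On this log the filter keeps only the two lines about `/etc/kubernetes/config/cloud.conf`, which is where the driver gives up.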

**Note:** I had to manually create the following ConfigMap, which contains the OpenStack CA certificate:

    openshift-cluster-csi-drivers   cloud-conf   2   37m

Version

    [root@bastion ~]# oc version
    Client Version: 4.13.0-0.okd-2023-09-30-084937
    Kustomize Version: v4.5.7
    Kubernetes Version: v1.26.4-3004+52589e6ce268bd-dirty

Steps To Reproduce

1. Deploy OKD 4.13 using the UPI method.
2. Run the Ansible command to deploy the worker nodes.

Current Result

The storage cluster operator (`co`) does not come up.

Expected Result

The storage cluster operator should come up.

openshift-bot commented 8 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 7 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 6 months ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci[bot] commented 6 months ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/origin/issues/28413#issuecomment-2068272630):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`.
> Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
> Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
