openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

storage cluster operator is not coming up | OKD 4.13 on Openstack #28413

Closed swogat closed 6 months ago

swogat commented 11 months ago

I am deploying OKD 4.13 with OpenShiftSDN on OpenStack. The storage cluster operator is not coming up:

Error:

Status:
  Conditions:
    Last Transition Time:  2023-11-21T21:25:09Z
    Message:               OpenStackCinderCSIDriverOperatorCRDegraded: ConfigSyncDegraded: couldn't collect info about cloud availability zones: failed to create a compute client: Get "https://.com:13000/": net/http: TLS handshake timeout
    Reason:                OpenStackCinderCSIDriverOperatorCR_ConfigSync_SyncError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2023-11-21T21:21:10Z
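For anyone hitting the same `TLS handshake timeout`, it may help to separate "TCP connects but the handshake stalls" from "certificate is rejected" before looking at the operator itself. The following is only a minimal sketch using the Python standard library; the function name `probe_tls` is made up for illustration, and the host you pass in is whatever your OpenStack endpoint is (the hostname is elided in the error above, so none is assumed here).

```python
import socket
import ssl


def probe_tls(host, port, ca_file=None, timeout=10.0):
    """Try a TCP connect followed by a TLS handshake.

    Returns (ok, detail). `ca_file` is an optional PEM bundle to trust
    in addition to the system defaults being replaced by it.
    """
    # With cafile=None the system trust store is used.
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # The handshake happens inside wrap_socket and inherits the timeout.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLError as exc:
        # Includes certificate verification failures.
        return False, f"TLS error: {exc}"
    except socket.timeout as exc:
        # Connect or handshake did not finish within `timeout` seconds.
        return False, f"timeout: {exc}"
    except OSError as exc:
        # Refused, reset by peer, DNS failure, and similar.
        return False, f"network error: {exc}"
```

A call such as `probe_tls("keystone.example.invalid", 13000, ca_file="/kube-cloud-config/ca-bundle.pem")` (hypothetical hostname; the port and bundle path are the ones that appear in this issue) run from a node or a debug pod would show whether the handshake stalls outside of the operator as well.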

`oc get pods --all-namespaces` gives the following output:

| NAMESPACE | NAME | READY | STATUS | RESTARTS | AGE |
| --- | --- | --- | --- | --- | --- |
| openshift-cloud-network-config-controller | cloud-network-config-controller-847fff59fd-jb6rv | 0/1 | CrashLoopBackOff | 124 (85s ago) | 15h |
| openshift-cluster-csi-drivers | manila-csi-driver-operator-64c466b7d9-rc79g | 1/1 | Running | 0 | 15h |
| openshift-cluster-csi-drivers | openstack-cinder-csi-driver-controller-5b8b6bdc65-trrj6 | 9/10 | CrashLoopBackOff | 6 (3m48s ago) | 10m |
| openshift-cluster-csi-drivers | openstack-cinder-csi-driver-controller-5b8b6bdc65-xmn67 | 9/10 | CrashLoopBackOff | 11 (37s ago) | 15h |
| openshift-cluster-csi-drivers | openstack-cinder-csi-driver-node-4g562 | 2/3 | CrashLoopBackOff | 11 (96s ago) | 147m |
| openshift-cluster-csi-drivers | openstack-cinder-csi-driver-node-4gk7r | 2/3 | CrashLoopBackOff | 11 (54s ago) | 15h |
| openshift-cluster-csi-drivers | openstack-cinder-csi-driver-node-9lf8v | 2/3 | CrashLoopBackOff | 11 (71s ago) | 15h |
| openshift-cluster-csi-drivers | openstack-cinder-csi-driver-node-b8jpl | 2/3 | CrashLoopBackOff | 11 (42s ago) | 15h |
| openshift-cluster-csi-drivers | openstack-cinder-csi-driver-node-bvzfb | 2/3 | CrashLoopBackOff | 11 (87s ago) | 148m |
| openshift-cluster-csi-drivers | openstack-cinder-csi-driver-node-n9qw5 | 2/3 | CrashLoopBackOff | 11 (97s ago) | 148m |
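To pick out the failing pods from a listing like the one above, a small parser can help. This is a rough sketch that assumes the raw whitespace-separated rows that `oc get pods --all-namespaces` prints (namespace, name, ready/total, status, restarts with an optional "(… ago)" suffix, age); the names `PodRow`, `parse_pod_row`, and `unhealthy` are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional

# One row: NAMESPACE NAME READY STATUS RESTARTS [(last restart)] AGE
ROW = re.compile(
    r"^(?P<namespace>\S+)\s+(?P<name>\S+)\s+(?P<ready>\d+)/(?P<total>\d+)\s+"
    r"(?P<status>\S+)\s+(?P<restarts>\d+)(?:\s+\([^)]*\))?\s+(?P<age>\S+)$"
)


class PodRow(NamedTuple):
    namespace: str
    name: str
    ready: int
    total: int
    status: str
    restarts: int
    age: str


def parse_pod_row(line: str) -> Optional[PodRow]:
    """Parse one pod row; return None for headers or unrecognized lines."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return PodRow(
        m["namespace"], m["name"], int(m["ready"]), int(m["total"]),
        m["status"], int(m["restarts"]), m["age"],
    )


def unhealthy(rows: List[PodRow]) -> List[PodRow]:
    """Pods with a container that is not ready or a status other than Running."""
    return [r for r in rows if r.ready < r.total or r.status != "Running"]
```

Feeding it the rows from this issue would flag every pod except `manila-csi-driver-operator`, which matches what is visible by eye: everything that has to talk to the OpenStack API is crash-looping.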

[root@bastion ~]# oc logs cloud-network-config-controller-847fff59fd-jb6rv -n openshift-cloud-network-config-controller
W1122 12:23:01.691854 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1122 12:23:01.693455 1 leaderelection.go:248] attempting to acquire leader lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock...
I1122 12:23:05.428860 1 leaderelection.go:258] successfully acquired lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock
I1122 12:23:05.429902 1 openstack.go:125] Custom CA bundle found at location '/kube-cloud-config/ca-bundle.pem' - reading certificate information
F1122 12:25:20.753558 1 main.go:101] Error building cloud provider client, err: Get "https://.com:13000/": read tcp 10.128.0.3:34762->172.45.20.120:13000: read: connection reset by peer
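Logs like the one above use the klog header layout (severity letter, `mmdd`, time, thread ID, `file:line]`, message). When pasted output loses its line breaks, the entries can usually be recovered by splitting in front of each header. Below is a minimal sketch of that idea; `split_entries` and `parse_entry` are illustrative names, not part of any tool.

```python
import re
from typing import List, NamedTuple, Optional

# Start of a klog entry, e.g. "F1122 12:25:20.753558 "
HEADER = r"[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} "

ENTRY = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<source>[^\]\s]+:\d+)\] (?P<msg>.*)$",
    re.DOTALL,
)


class LogEntry(NamedTuple):
    severity: str  # I, W, E or F
    time: str
    source: str    # file:line that emitted the message
    message: str


def split_entries(text: str) -> List[str]:
    """Split flattened log text wherever a new klog header begins."""
    return [p for p in re.split(r"\s+(?=" + HEADER + ")", text.strip()) if p]


def parse_entry(entry: str) -> Optional[LogEntry]:
    """Parse one klog entry; return None for lines without a klog header."""
    m = ENTRY.match(entry)
    if m is None:
        return None
    return LogEntry(m["sev"], m["time"], m["source"], m["msg"])


def problems(text: str) -> List[LogEntry]:
    """Return only the error (E) and fatal (F) entries."""
    parsed = (parse_entry(e) for e in split_entries(text))
    return [e for e in parsed if e is not None and e.severity in "EF"]
```

Applied to the controller log in this comment, `problems` would return the single fatal entry from `main.go:101`, which is the connection reset while building the cloud provider client.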

[root@bastion ~]# oc logs openstack-cinder-csi-driver-controller-5b8b6bdc65-trrj6 -n openshift-cluster-csi-drivers
Defaulted container "csi-driver" out of: csi-driver, csi-provisioner, provisioner-kube-rbac-proxy, csi-attacher, attacher-kube-rbac-proxy, csi-resizer, resizer-kube-rbac-proxy, csi-snapshotter, snapshotter-kube-rbac-proxy, csi-liveness-probe
Flag --nodeid has been deprecated, This flag would be removed in future. Currently, the value is ignored by the driver
I1122 12:28:00.116400 1 driver.go:81] Driver: cinder.csi.openstack.org
I1122 12:28:00.116480 1 driver.go:82] Driver version: 2.0.0@
I1122 12:28:00.116484 1 driver.go:83] CSI Spec version: 1.3.0
I1122 12:28:00.116497 1 driver.go:115] Enabling controller service capability: LIST_VOLUMES
I1122 12:28:00.116502 1 driver.go:115] Enabling controller service capability: CREATE_DELETE_VOLUME
I1122 12:28:00.116505 1 driver.go:115] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I1122 12:28:00.116509 1 driver.go:115] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1122 12:28:00.116512 1 driver.go:115] Enabling controller service capability: LIST_SNAPSHOTS
I1122 12:28:00.116517 1 driver.go:115] Enabling controller service capability: EXPAND_VOLUME
I1122 12:28:00.116520 1 driver.go:115] Enabling controller service capability: CLONE_VOLUME
I1122 12:28:00.116524 1 driver.go:115] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I1122 12:28:00.116534 1 driver.go:115] Enabling controller service capability: GET_VOLUME
I1122 12:28:00.116544 1 driver.go:125] Enabling volume access mode: SINGLE_NODE_WRITER
I1122 12:28:00.116548 1 driver.go:135] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I1122 12:28:00.116553 1 driver.go:135] Enabling node service capability: EXPAND_VOLUME
I1122 12:28:00.116556 1 driver.go:135] Enabling node service capability: GET_VOLUME_STATS
I1122 12:28:00.116584 1 openstack.go:136] InitOpenStackProvider configFiles: [/etc/kubernetes/config/cloud.conf]
E1122 12:28:00.117537 1 openstack.go:144] GetConfigFromFiles [/etc/kubernetes/config/cloud.conf] failed with error: unable to load clouds.yaml: no clouds.yml file found: file does not exist
W1122 12:28:00.117575 1 main.go:105] Failed to GetOpenStackProvider: unable to load clouds.yaml: no clouds.yml file found: file does not exist
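The last two log entries say that `cloud.conf` was read but the `clouds.yaml` it refers to could not be found. One way to narrow this down might be to check which file the `[Global]` section points at and whether it exists where the driver runs. The sketch below is a guess at how to do that: it assumes `cloud.conf` is INI-like and that the keys are named `use-clouds` and `clouds-file`, which should be verified against the `cloud.conf` actually mounted in the pod. The function names are made up for illustration.

```python
import configparser
import os
from typing import Dict, Optional


def read_global(cloud_conf_text: str) -> Dict[str, str]:
    """Return the key/value pairs of the [Global] section, or {} if absent."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cloud_conf_text)
    if not parser.has_section("Global"):
        return {}
    return dict(parser.items("Global"))


def missing_clouds_file(cloud_conf_text: str) -> Optional[str]:
    """Return the configured clouds.yaml path if it is enabled but missing.

    Assumed keys (not confirmed by this issue): `use-clouds`, `clouds-file`.
    Returns None when clouds.yaml is not in use or the file exists.
    """
    settings = read_global(cloud_conf_text)
    # Values may be quoted in the file, so strip surrounding quotes.
    if settings.get("use-clouds", "false").strip('"').lower() != "true":
        return None
    path = settings.get("clouds-file", "").strip('"')
    if path and not os.path.exists(path):
        return path
    return None
```

If this returns a path when run against the mounted `cloud.conf` inside the `csi-driver` container, the secret or volume that should provide `clouds.yaml` is probably not mounted, which would be a separate problem from the TLS errors reported by the operator.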

**Note: I had to manually create the following ConfigMap, which contains the OpenStack CA certificate:**

openshift-cluster-csi-drivers cloud-conf 2 37m

Version

[root@bastion ~]# oc version
Client Version: 4.13.0-0.okd-2023-09-30-084937
Kustomize Version: v4.5.7
Kubernetes Version: v1.26.4-3004+52589e6ce268bd-dirty

Steps To Reproduce
  1. Deploy OKD 4.13 using the UPI method.
  2. Run the Ansible command to deploy the worker nodes.

Current Result

The storage cluster operator does not come up.

Expected Result

The storage cluster operator should come up.

openshift-bot commented 8 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 7 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 6 months ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci[bot] commented 6 months ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/origin/issues/28413#issuecomment-2068272630):

>Rotten issues close after 30d of inactivity.
>
>Reopen the issue by commenting `/reopen`.
>Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
>Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.