stormshift / support

This repo should serve as a central source for reporting issues with stormshift
GNU General Public License v3.0

Cluster ocp1 CSI attach/detach fails because of: https://rhev.stormshift.... x509: certificate signed by unknown authority #67

Closed by rbo 2 years ago

rbo commented 2 years ago

After the certificate exchange on our RHEV infrastructure (ticket #56), the CSI driver no longer works.

rbo commented 2 years ago

✅ The trust bundle is configured correctly (checked with oc get proxy -o yaml)

rbo commented 2 years ago
$ oc logs -n openshift-cluster-csi-drivers deploy/ovirt-csi-driver-controller -c csi-attacher | tail
Found 2 pods, using pod/ovirt-csi-driver-controller-7cd6cb7dbd-swr4r
I0215 10:12:08.759614       1 csi_handler.go:279] Detaching "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:12:08.800175       1 csi_handler.go:231] Error processing "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740": failed to detach: rpc error: code = Unknown desc = failed finding disk attachments: failed to get disk attachment by disk 10a6f4ab-59c9-464c-8ad7-8108880a974b for VM 2319536d-8e41-4f3a-b375-e6edefeb8316, error: Post "https://rhev.stormshift.coe.muc.redhat.com/ovirt-engine/sso/oauth/token": x509: certificate signed by unknown authority
I0215 10:15:41.139750       1 csi_handler.go:279] Detaching "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:15:41.206202       1 csi_handler.go:231] Error processing "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740": failed to detach: rpc error: code = Unknown desc = failed finding disk attachments: failed to get disk attachment by disk 10a6f4ab-59c9-464c-8ad7-8108880a974b for VM 2319536d-8e41-4f3a-b375-e6edefeb8316, error: Post "https://rhev.stormshift.coe.muc.redhat.com/ovirt-engine/sso/oauth/token": x509: certificate signed by unknown authority
I0215 10:20:41.209763       1 csi_handler.go:279] Detaching "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:20:41.294554       1 csi_handler.go:231] Error processing "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740": failed to detach: rpc error: code = Unknown desc = failed finding disk attachments: failed to get disk attachment by disk 10a6f4ab-59c9-464c-8ad7-8108880a974b for VM 2319536d-8e41-4f3a-b375-e6edefeb8316, error: Post "https://rhev.stormshift.coe.muc.redhat.com/ovirt-engine/sso/oauth/token": x509: certificate signed by unknown authority
I0215 10:22:08.761003       1 csi_handler.go:279] Detaching "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:22:08.895899       1 csi_handler.go:231] Error processing "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740": failed to detach: rpc error: code = Unknown desc = failed finding disk attachments: failed to get disk attachment by disk 10a6f4ab-59c9-464c-8ad7-8108880a974b for VM 2319536d-8e41-4f3a-b375-e6edefeb8316, error: Post "https://rhev.stormshift.coe.muc.redhat.com/ovirt-engine/sso/oauth/token": x509: certificate signed by unknown authority
I0215 10:25:41.295317       1 csi_handler.go:279] Detaching "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:25:41.364480       1 csi_handler.go:231] Error processing "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740": failed to detach: rpc error: code = Unknown desc = failed finding disk attachments: failed to get disk attachment by disk 10a6f4ab-59c9-464c-8ad7-8108880a974b for VM 2319536d-8e41-4f3a-b375-e6edefeb8316, error: Post "https://rhev.stormshift.coe.muc.redhat.com/ovirt-engine/sso/oauth/token": x509: certificate signed by unknown authority
$

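The problem is in the cloud provider settings. The CA bundle stored in the ovirt-credentials secret still holds the old self-signed engine CA, while the RHEV endpoint now presents a certificate issued by a different CA: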

$ oc get secret ovirt-credentials -n  openshift-cluster-csi-drivers  -o jsonpath="{.data.ovirt_ca_bundle}" | base64 -d  | openssl x509 -noout -issuer -subject
issuer=C = US, O = stormshift.coe.muc.redhat.com, CN = rhev.stormshift.coe.muc.redhat.com.26039
subject=C = US, O = stormshift.coe.muc.redhat.com, CN = rhev.stormshift.coe.muc.redhat.com.26039
$

$ echo | openssl s_client -connect rhev.stormshift.coe.muc.redhat.com:443 2>/dev/null| openssl x509 -noout -subject -issuer
subject=O = Red Hat, OU = SolutionArchitectsDach, CN = *.stormshift.coe.muc.redhat.com
issuer=O = Red Hat, OU = prod, CN = Certificate Authority
$

I have to update the cloud provider CA bundle.
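
One caveat with the check above: openssl x509 reads only the first certificate in a PEM file, so a bundle holding several certificates is only partly inspected. A minimal sketch for looking at the whole bundle, assuming a Linux shell (for /dev/stdin) and an openssl build that provides crl2pkcs7; the function names are my own placeholders, not part of any tool:

```shell
# Count the certificates in a PEM bundle read from stdin.
# `|| true` keeps the exit status at 0 when grep finds no match and prints 0.
count_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' || true
}

# Print subject and issuer of every certificate in a PEM bundle read from stdin.
# The bundle is wrapped into a PKCS#7 structure so that all certificates are listed.
list_bundle_certs() {
  openssl crl2pkcs7 -nocrl -certfile /dev/stdin | openssl pkcs7 -print_certs -noout
}
```

For example, the decoded secret can be piped in: oc get secret ovirt-credentials -n openshift-cluster-csi-drivers -o jsonpath="{.data.ovirt_ca_bundle}" | base64 -d | list_bundle_certs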

rbo commented 2 years ago
$ BUNDLE=$(oc get cm user-ca-bundle -n openshift-config -o jsonpath="{.data.ca-bundle\.crt}" | base64 -w0 )
$ kubectl patch secret -n kube-system ovirt-credentials   --type='json' -p="[{\"op\" : \"replace\" ,\"path\" : \"/data/ovirt_ca_bundle\" ,\"value\" : \"$BUNDLE\"}]"
secret/ovirt-credentials patched
$
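
The escaped quotes in the patch above are easy to get wrong. A small sketch that builds the same JSON patch with printf instead; build_ca_patch is a hypothetical helper name, and it assumes its argument is already base64-encoded on a single line (which is what base64 -w0 from GNU coreutils produces):

```shell
# Build a JSON Patch document that replaces /data/ovirt_ca_bundle.
# $1: the CA bundle, base64-encoded without line breaks.
# Base64 output contains no characters that need JSON escaping.
build_ca_patch() {
  printf '[{"op":"replace","path":"/data/ovirt_ca_bundle","value":"%s"}]\n' "$1"
}
```

It would be used as: kubectl patch secret -n kube-system ovirt-credentials --type=json -p="$(build_ca_patch "$BUNDLE")"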

$ oc get secret ovirt-credentials -n  openshift-cluster-csi-drivers  -o jsonpath="{.data.ovirt_ca_bundle}" | base64 -d  | openssl x509 -noout -issuer -subject
issuer=C = US, ST = North Carolina, L = Raleigh, O = "Red Hat, Inc.", OU = Red Hat IT, CN = Red Hat IT Root CA, emailAddress = infosec@redhat.com
subject=C = US, ST = North Carolina, L = Raleigh, O = "Red Hat, Inc.", OU = Red Hat IT, CN = Red Hat IT Root CA, emailAddress = infosec@redhat.com
$

$ oc delete pods -n openshift-cluster-csi-drivers -l app=ovirt-csi-driver-controller --wait=false
pod "ovirt-csi-driver-controller-7cd6cb7dbd-89g7x" deleted
pod "ovirt-csi-driver-controller-7cd6cb7dbd-8jqmb" deleted

$ oc logs -n openshift-cluster-csi-drivers deploy/ovirt-csi-driver-controller -c csi-attacher  -f
Found 2 pods, using pod/ovirt-csi-driver-controller-7cd6cb7dbd-wzhzq
I0215 10:40:45.894832       1 main.go:99] Version: v4.9.0-202111151318.p0.g0a1737c.assembly.stream-0-gd002fb1-dirty
I0215 10:40:48.103344       1 common.go:111] Probing CSI driver for readiness
I0215 10:40:48.185931       1 main.go:155] CSI driver name: "csi.ovirt.org"
I0215 10:40:48.186623       1 main.go:181] ServeMux listening at "localhost:8203"
I0215 10:40:48.191908       1 main.go:230] CSI driver supports ControllerPublishUnpublish, using real CSI handler
I0215 10:40:48.197642       1 leaderelection.go:248] attempting to acquire leader lease openshift-cluster-csi-drivers/external-attacher-leader-csi-ovirt-org...
I0215 10:40:48.407360       1 leaderelection.go:258] successfully acquired lease openshift-cluster-csi-drivers/external-attacher-leader-csi-ovirt-org
I0215 10:40:48.408393       1 leader_election.go:205] became leader, starting
I0215 10:40:48.408515       1 controller.go:128] Starting CSI attacher
I0215 10:40:48.509847       1 csi_handler.go:279] Detaching "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:40:48.680278       1 csi_handler.go:587] Detached "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:40:48.843786       1 csi_handler.go:279] Detaching "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:40:48.932696       1 csi_handler.go:587] Detached "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740"
I0215 10:40:48.963317       1 csi_handler.go:286] Failed to save detach error to "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740": volumeattachments.storage.k8s.io "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740" not found
I0215 10:40:48.963470       1 csi_handler.go:231] Error processing "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740": failed to detach: could not mark as detached: volumeattachments.storage.k8s.io "csi-42f00fa7db867ea339464072e81d182e02b313e87561dd5fd80e492bbf0c3740" not found
I0215 10:40:53.957529       1 csi_handler.go:248] Attaching "csi-ce8c06362d90cfdc0d2a48930b7f4b8c90670ad5136b84baf7c1b0e355400159"
I0215 10:40:54.727871       1 csi_handler.go:261] Attached "csi-ce8c06362d90cfdc0d2a48930b7f4b8c90670ad5136b84baf7c1b0e355400159"
I0215 10:40:54.953165       1 csi_handler.go:248] Attaching "csi-ce8c06362d90cfdc0d2a48930b7f4b8c90670ad5136b84baf7c1b0e355400159"
I0215 10:40:55.060104       1 csi_handler.go:261] Attached "csi-ce8c06362d90cfdc0d2a48930b7f4b8c90670ad5136b84baf7c1b0e355400159"
^C
$
rbo commented 2 years ago

SOLVED

github-actions[bot] commented 2 years ago

Heads up @cluster/rhacm-admin - the "cluster/rhacm" label was applied to this issue.

rbo commented 2 years ago
MountVolume.MountDevice failed for volume "pvc-ac0c836d-e908-4a9d-8e6e-745bf505a01f" : rpc error: code = Unknown desc = failed finding disk attachments, error: failed to get disk attachment by disk 10a6f4ab-59c9-464c-8ad7-8108880a974b for VM 5579854e-5f70-4148-9d4b-24fcf7b46197, error: Post "https://rhev.stormshift.coe.muc.redhat.com/ovirt-engine/sso/oauth/token": x509: certificate signed by unknown authority

Not solved after all. It is hard to find every component that uses the CA bundle.

rbo commented 2 years ago
$ oc delete pods -n openshift-cluster-csi-drivers -l app=ovirt-csi-driver-node --wait=false
pod "ovirt-csi-driver-node-997nn" deleted
pod "ovirt-csi-driver-node-cb27c" deleted
pod "ovirt-csi-driver-node-fjr8p" deleted
pod "ovirt-csi-driver-node-hz64p" deleted
pod "ovirt-csi-driver-node-n4gsl" deleted
rbo commented 2 years ago

My pods start again. I'll leave the ticket open until I have found all affected components and restarted their pods :-( Maybe restarting the entire cluster is the easiest way.
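
To find every copy of the secret instead of guessing namespaces, something like the following might help. It is only a sketch: it assumes the secret is always named ovirt-credentials (true for the namespaces seen in this thread, unverified elsewhere), that oc is logged in with permission to list secrets cluster-wide, and the function names are made up for illustration.

```shell
# List every namespace that holds a secret named ovirt-credentials.
find_ovirt_secret_namespaces() {
  oc get secrets --all-namespaces \
    --field-selector metadata.name=ovirt-credentials \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u
}

# Print the issuer of the first certificate in each copy of the bundle,
# so copies that still carry the old CA stand out.
show_ovirt_ca_issuers() {
  for ns in $(find_ovirt_secret_namespaces); do
    printf '%s: ' "$ns"
    oc get secret ovirt-credentials -n "$ns" \
      -o jsonpath='{.data.ovirt_ca_bundle}' | base64 -d | openssl x509 -noout -issuer
  done
}
```

Running show_ovirt_ca_issuers gives one line per namespace. This only locates the secrets; the pods that mounted or read them still have to be restarted.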

rbo commented 2 years ago

Source:

$ oc get secret ovirt-credentials -n openshift-cluster-csi-drivers -o jsonpath="{.data.ovirt_ca_bundle}" | base64 -d | openssl x509 -noout -issuer -subject
issuer=C = US, ST = North Carolina, L = Raleigh, O = "Red Hat, Inc.", OU = Red Hat IT, CN = Red Hat IT Root CA, emailAddress = infosec@redhat.com
subject=C = US, ST = North Carolina, L = Raleigh, O = "Red Hat, Inc.", OU = Red Hat IT, CN = Red Hat IT Root CA, emailAddress = infosec@redhat.com
$ oc get secret ovirt-credentials -n openshift-machine-api -o jsonpath="{.data.ovirt_ca_bundle}" | base64 -d | openssl x509 -noout -issuer -subject
issuer=C = US, ST = North Carolina, L = Raleigh, O = "Red Hat, Inc.", OU = Red Hat IT, CN = Red Hat IT Root CA, emailAddress = infosec@redhat.com
subject=C = US, ST = North Carolina, L = Raleigh, O = "Red Hat, Inc.", OU = Red Hat IT, CN = Red Hat IT Root CA, emailAddress = infosec@redhat.com
$

Delete all pods so that they pick up the updated secrets:

$ oc delete pods -n openshift-machine-api --all --wait=false
pod "cluster-autoscaler-operator-584c78fbd8-lbgfl" deleted
pod "cluster-baremetal-operator-5f4ccb4899-hqpvj" deleted
pod "machine-api-controllers-7c8cc5b994-x4w4m" deleted
pod "machine-api-operator-648f59c644-x9qfj" deleted

$ oc delete pods -n openshift-cluster-csi-drivers --all --wait=false
...


Force kubecontrollermanager rollout:

$ oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date )"'"}}' --type=merge
kubecontrollermanager.operator.openshift.io/cluster patched
$

DanielFroehlich commented 2 years ago

LGTM