red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/
MIT License

[Provider mode] MetalLB TLS issue: clients can't install the ODF operator #10802

Open DanielOsypenko opened 3 weeks ago

DanielOsypenko commented 3 weeks ago

Behavior: Clients (all deployed hosted client clusters) cannot install ODF operators; the image-pull timeout expires with the message:

error using catalogsource openshift-marketplace/ocs-catalogsource: error encountered while listing bundles: rpc error: code = DeadlineExceeded desc = context deadline exceeded, error using catalogsource openshift-marketplace/community-operators: error encountered while listing bundles: rpc error: code = DeadlineExceeded desc = context deadline exceeded, error using catalogsource openshift-marketplace/redhat-operators: error encountered while listing bundles: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Initial investigation: the catalog source on the Client is in 'Ready' status.
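The catalog source status can be cross-checked against the registry pods behind it. A diagnostic sketch (the namespace and catalog source names follow the defaults shown in the error message above):

```shell
# Show each catalog source and its last observed gRPC connection state
oc get catalogsource -n openshift-marketplace \
  -o custom-columns=NAME:.metadata.name,STATE:.status.connectionState.lastObservedState

# Verify the registry pods backing the catalog sources are actually running
oc get pods -n openshift-marketplace

# Inspect one source in detail (address, publisher, recent events)
oc -n openshift-marketplace describe catalogsource ocs-catalogsource
```

A source can report 'Ready' while the bundle-listing gRPC call still times out from the client side, so the connection state and pod health are worth checking separately.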

MultiClusterHub is running

oc get MultiClusterHub -A
NAMESPACE                 NAME              STATUS    AGE
open-cluster-management   multiclusterhub   Running   12d

IPAddressPool created. TODO: check that the addresses are still reserved for us cc @dahorak - https://ibm-systems-storage.slack.com/archives/C06E08SNVC7/p1730810762549349?thread_ts=1730792170.488219&cid=C06E08SNVC7

The L2Advertisement looks OK:

oc get l2advertisement -o yaml
apiVersion: v1
items:
- apiVersion: metallb.io/v1beta1
  kind: L2Advertisement
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"metallb.io/v1beta1","kind":"L2Advertisement","metadata":{"annotations":{},"name":"l2advertisement","namespace":"metallb-system"},"spec":{"ipAddressPools":["metallb-addresspool"]}}
    creationTimestamp: "2024-10-29T15:56:13Z"
    generation: 1
    name: l2advertisement
    namespace: metallb-system
    resourceVersion: "9216725"
    uid: efce4034-8dce-48be-b45c-f2fbcb05693a
  spec:
    ipAddressPools:
    - metallb-addresspool
kind: List
metadata:
    resourceVersion: ""

Logs from metallb-operator-webhook-server:

{"level":"info","ts":"2024-10-30T05:53:51Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate","stacktrace":"sigs.k8s.io/controller-runtime/pkg/certwatcher.(*CertWatcher).ReadCertificate\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/certwatcher/certwatcher.go:161\nsigs.k8s.io/controller-runtime/pkg/certwatcher.New\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/certwatcher/certwatcher.go:62\nsigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Start\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/server.go:207\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:223"}
{"level":"info","ts":"2024-10-30T05:53:51Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-metallb-io-v1beta1-bfdprofile","stacktrace":"sigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Register\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/server.go:183\ngo.universe.tf/metallb/internal/k8s/webhooks/webhookv1beta1.(*BFDProfileValidator).SetupWebhookWithManager\n\t/metallb/internal/k8s/webhooks/webhookv1beta1/bfdprofile_webhook.go:40\ngo.universe.tf/metallb/internal/k8s.enableWebhook\n\t/metallb/internal/k8s/webhook.go:90\ngo.universe.tf/metallb/internal/k8s.New.func3\n\t/metallb/internal/k8s/k8s.go:300"}
{"level":"info","ts":"2024-10-30T05:53:51Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/convert","stacktrace":"sigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Register\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/server.go:183\ngo.universe.tf/metallb/internal/k8s.enableWebhook\n\t/metallb/internal/k8s/webhook.go:96\ngo.universe.tf/metallb/internal/k8s.New.func3\n\t/metallb/internal/k8s/k8s.go:300"}
{"level":"info","ts":"2024-10-30T05:53:51Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher","stacktrace":"sigs.k8s.io/controller-runtime/pkg/certwatcher.(*CertWatcher).Start\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/certwatcher/certwatcher.go:115\nsigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Start.func1\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/server.go:214"}
{"level":"info","ts":"2024-10-30T05:53:51Z","logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443,"stacktrace":"sigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Start\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/server.go:242\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/metallb/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:223"}
2024/10/30 05:54:02 http: TLS handshake error from 10.130.0.41:48190: remote error: tls: bad certificate
2024/10/30 05:54:03 http: TLS handshake error from 10.130.0.41:48200: remote error: tls: bad certificate
2024/10/30 05:54:05 http: TLS handshake error from 10.130.0.41:48206: remote error: tls: bad certificate
2024/10/30 05:54:05 http: TLS handshake error from 10.130.0.41:48208: remote error: tls: bad certificate
2024/10/30 05:54:06 http: TLS handshake error from 10.130.0.41:48218: remote error: tls: bad certificate
2024/10/30 05:54:08 http: TLS handshake error from 10.130.0.41:58904: remote error: tls: bad certificate
2024/10/30 05:54:08 http: TLS handshake error from 10.130.0.41:58916: remote error: tls: bad certificate
2024/10/30 05:54:09 http: TLS handshake error from 10.130.0.41:58918: remote error: tls: bad certificate
2024/10/30 05:54:11 http: TLS handshake error from 10.130.0.41:58928: remote error: tls: bad certificate
2024/10/30 05:54:14 http: TLS handshake error from 10.130.0.41:58940: remote error: tls: bad certificate
2024/10/30 05:54:15 http: TLS handshake error from 10.130.0.41:58954: remote error: tls: bad certificate
2024/10/30 05:54:17 http: TLS handshake error from 10.130.0.41:58964: remote error: tls: bad certificate
2024/10/30 05:54:17 http: TLS handshake error from 10.130.0.41:58978: remote error: tls: bad certificate
2024/10/30 05:54:18 http: TLS handshake error from 10.130.0.41:44000: remote error: tls: bad certificate
2024/10/30 05:54:20 http: TLS handshake error from 10.130.0.41:44014: remote error: tls: bad certificate
2024/10/30 05:54:23 http: TLS handshake error from 10.130.0.41:44024: remote error: tls: bad certificate
2024/10/30 05:54:24 http: TLS handshake error from 10.130.0.41:44038: remote error: tls: bad certificate
2024/10/30 05:54:26 http: TLS handshake error from 10.130.0.41:44052: remote error: tls: bad certificate
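The repeated "bad certificate" handshake errors suggest the caBundle registered with the API server no longer matches the certificate the webhook server is serving. A way to compare the two (a sketch; the secret name `webhook-server-cert` and the webhook configuration name `metallb-webhook-configuration` are assumptions based on upstream MetalLB defaults and may differ in the OpenShift build):

```shell
# Certificate the webhook server is presenting (assumed secret name)
oc -n metallb-system get secret webhook-server-cert \
  -o jsonpath='{.data.tls\.crt}' | base64 -d \
  | openssl x509 -noout -subject -dates

# CA bundle the API server uses to validate it (assumed webhook config name)
oc get validatingwebhookconfiguration metallb-webhook-configuration \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d \
  | openssl x509 -noout -subject -dates
```

If the subjects or validity windows disagree, the caBundle is stale and every admission call to the webhook will fail the TLS handshake, matching the log lines above.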
DanielOsypenko commented 3 weeks ago

Also, the frr-k8s monitor and webhook services are missing from the metallb namespace; they should look similar to the following (taken from BM2):

frr-k8s-monitor-service                       ClusterIP   None             <none>        9140/TCP,9141/TCP   23h
frr-k8s-webhook-service                       ClusterIP   172.30.21.2      <none>        443/TCP             23h

Only the following services are available:

oc get service
NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
controller-monitor-service                    ClusterIP   None             <none>        9120/TCP            6d21h
metallb-operator-controller-manager-service   ClusterIP   172.30.223.229   <none>        443/TCP             6d21h
metallb-operator-webhook-server-service       ClusterIP   172.30.64.33     <none>        443/TCP             6d21h
metallb-operator-webhook-service              ClusterIP   172.30.27.207    <none>        443/TCP             6d21h
speaker-monitor-service                       ClusterIP   None             <none>        9120/TCP,9121/TCP   6d21h
webhook-service                               ClusterIP   172.30.179.29    <none>        443/TCP             6d21h
DanielOsypenko commented 3 weeks ago

This issue appears to match MetalLB issue https://github.com/metallb/metallb-operator/issues/494
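A common remediation pattern for stale webhook serving certificates is to delete the cert secret and restart the webhook server so the operator regenerates and re-injects it. This is a sketch, not a confirmed fix for this cluster; both resource names are assumptions based on the pod and service names seen above:

```shell
# Delete the (possibly stale) webhook serving-cert secret so it is regenerated
oc -n metallb-system delete secret webhook-server-cert

# Restart the webhook server so it serves the fresh certificate
oc -n metallb-system rollout restart deployment metallb-operator-webhook-server
```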

fedepaol commented 3 weeks ago

Is this the upstream or the OpenShift version?

DanielOsypenko commented 3 weeks ago

@fedepaol the full version of the MetalLB operator is metallb-operator.v4.16.0-202410292005; to be honest, I thought it was a community-only operator:

oc get csv
NAME                                         DISPLAY                          VERSION               REPLACES                                     PHASE
ingress-node-firewall.v4.16.0-202409051837   Ingress Node Firewall Operator   4.16.0-202409051837   ingress-node-firewall.v4.16.0-202410011135   Succeeded
metallb-operator.v4.16.0-202410292005        MetalLB Operator                 4.16.0-202410292005   metallb-operator.v4.16.0-202410251707        Succeeded 
fedepaol commented 3 weeks ago

That's the Red Hat one; the version matches the cluster.

DanielOsypenko commented 3 weeks ago

Thanks @fedepaol, looking for answers on the RH Slack channel #forum-ocp-metallb