lsst / qserv-operator

Qserv Operator creates/configures/manages Qserv clusters atop Kubernetes
http://dm.lsst.org/

qserv-operator-webhook-service - connect: no route to host #57

Closed GregBlow closed 1 year ago

GregBlow commented 1 year ago

When moving from an early-2022 version to the most recent release, I ran into trouble with the cert-manager-webhook pod. It presented as:

Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.105.191.218:443: connect: no route to host

This was resolved by installing cert-manager following the comment here:

https://github.com/cert-manager/cert-manager/issues/3265#issuecomment-814575473

Specifically, it seems necessary to change the webhook's secure port with something like:

webhook.securePort=10251

(10250 is the default)

I believe this is because of a conflict with the CNI.
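For reference, a minimal sketch of the same override written as a Helm values file, assuming cert-manager was installed from the jetstack Helm chart (the service labels further down show `helm.sh/chart=cert-manager-v1.11.0`). The file name is hypothetical; the override is equivalent to passing `--set webhook.securePort=10251` to `helm install` or `helm upgrade`.

```yaml
# cert-manager-values.yaml (hypothetical file name)
# Override for the jetstack cert-manager chart; pass with -f cert-manager-values.yaml
webhook:
  # HTTPS port the webhook pod listens on. The chart default is 10250,
  # which is also the kubelet's default port.
  securePort: 10251
```

The chart's webhook Service targets the named `https` port, so the Service itself keeps listening on 443 after this change.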

Deployment of qserv-operator-system proceeded correctly after this.

However, an almost identical error occurred when attempting to deploy qserv, this time against the qserv-operator-webhook-service service. It seems probable that its secure port is similarly contested.

Error from server (InternalError): error when creating "manifests/somerville-lsst-qserv/": Internal error occurred: failed calling webhook "mqserv.kb.io": failed to call webhook: Post "https://qserv-operator-webhook-service.qserv-operator-system.svc:443/mutate-qserv-lsst-org-v1beta1-qserv?timeout=10s": dial tcp 10.96.2.117:443: connect: no route to host
GregBlow commented 1 year ago
---
apiVersion: v1
kind: Service
metadata:
  name: qserv-operator-webhook-service
  namespace: qserv-operator-system
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: 9443
  selector:
    control-plane: controller-manager
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    cert-manager.io/inject-ca-from: qserv-operator-system/qserv-operator-serving-cert
  name: qserv-operator-mutating-webhook-configuration
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: qserv-operator-webhook-service
      namespace: qserv-operator-system
      path: /mutate-qserv-lsst-org-v1beta1-qserv
  failurePolicy: Fail
  name: mqserv.kb.io
  rules:
  - apiGroups:
    - qserv.lsst.org
    apiVersions:
    - v1beta1
    operations:
    - CREATE
    - UPDATE
    resources:
    - qservs
  sideEffects: None
---
GregBlow commented 1 year ago
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: qserv-operator-serving-cert
  namespace: qserv-operator-system
spec:
  dnsNames:
  - qserv-operator-webhook-service.qserv-operator-system.svc
  - qserv-operator-webhook-service.qserv-operator-system.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: qserv-operator-selfsigned-issuer
  secretName: webhook-server-cert
---

working test case:

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-cert
  namespace: cert-manager-test
spec:
  dnsNames:
    - example.com
  secretName: selfsigned-cert-tls
  issuerRef:
    name: test-selfsigned
GregBlow commented 1 year ago
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager-test
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: test-selfsigned
  namespace: cert-manager-test
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: qserv-operator-serving-cert
  namespace: qserv-operator-system
spec:
  dnsNames:
  - qserv-operator-webhook-service.qserv-operator-system.svc
  - qserv-operator-webhook-service.qserv-operator-system.svc.cluster.local
  secretName: webhook-server-cert
  issuerRef:
    name: qserv-operator-selfsigned-issuer
    kind: Issuer

works

GregBlow commented 1 year ago
ubuntu@sv-qserv-jump:~/qserv-operator$ kubectl apply -k manifests/somerville-lsst-qserv/
secret/secret-ingest-db-qserv unchanged
secret/secret-mariadb-qserv configured
secret/secret-repl-creds-qserv unchanged
secret/secret-repl-db-qserv unchanged
Error from server (InternalError): error when creating "manifests/somerville-lsst-qserv/": Internal error occurred: failed calling webhook "mqserv.kb.io": failed to call webhook: Post "https://qserv-operator-webhook-service.qserv-operator-system.svc:443/mutate-qserv-lsst-org-v1beta1-qserv?timeout=10s": dial tcp 10.96.2.117:443: connect: no route to host

ubuntu@sv-qserv-jump:~/qserv-operator$ kubectl get svc -A
NAMESPACE               NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
cert-manager            cert-manager                                        ClusterIP   10.106.168.23    <none>        9402/TCP                 18h
cert-manager            cert-manager-webhook                                ClusterIP   10.97.60.67      <none>        443/TCP                  18h
default                 kubernetes                                          ClusterIP   10.96.0.1        <none>        443/TCP                  19h
kube-system             kube-dns                                            ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   19h
qserv-operator-system   qserv-operator-controller-manager-metrics-service   ClusterIP   10.108.121.131   <none>        8443/TCP                 17h
qserv-operator-system   qserv-operator-webhook-service                      ClusterIP   10.96.2.117      <none>        443/TCP                  17h
GregBlow commented 1 year ago

qserv-operator-mutating-webhook-configuration is set up correctly in the test deployment:

ubuntu@sv-qserv-jump:~$ kubectl describe mutatingwebhookconfiguration.admissionregistration.k8s.io/qserv-operator-mutating-webhook-configuration
Name:         qserv-operator-mutating-webhook-configuration
Namespace:
Labels:       <none>
Annotations:  cert-manager.io/inject-ca-from: qserv-operator-system/qserv-operator-serving-cert
API Version:  admissionregistration.k8s.io/v1
Kind:         MutatingWebhookConfiguration
Metadata:
  Creation Timestamp:  2023-03-14T17:42:47Z
  Generation:          3
  Managed Fields:
    API Version:  admissionregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:cert-manager.io/inject-ca-from:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:webhooks:
        .:
        k:{"name":"mqserv.kb.io"}:
          .:
          f:admissionReviewVersions:
          f:clientConfig:
            .:
            f:service:
              .:
              f:name:
              f:namespace:
              f:path:
              f:port:
          f:failurePolicy:
          f:matchPolicy:
          f:name:
          f:namespaceSelector:
          f:objectSelector:
          f:reinvocationPolicy:
          f:rules:
          f:sideEffects:
          f:timeoutSeconds:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-03-14T17:42:47Z
    API Version:  admissionregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:webhooks:
        k:{"name":"mqserv.kb.io"}:
          f:clientConfig:
            f:caBundle:
    Manager:         cainjector
    Operation:       Update
    Time:            2023-03-15T10:59:31Z
  Resource Version:  184796
  UID:               3d72f13c-f565-4d80-a444-73275e598a69
Webhooks:
  Admission Review Versions:
    v1
  Client Config:
    Ca Bundle:  LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURQekNDQWllZ0F3SUJBZ0lRTFE5Qzg5ZTdqYmJlTlhCVDZ6MkFKREFOQmdrcWhraUc5dzBCQVFzRkFEQUEKTUI0WERUSXpNRE14TlRFd05Ua3pNVm9YRFRJek1EWXhNekV3TlRrek1Wb3dBRENDQVNJd0RRWUpLb1pJaHZjTgpBUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBT2FhYnJKeVJZVUR5SHBFWmJ4NmhqdSs0N21RWU9MWWRjVnRlNWdtCjJ4STZ1bnplb0N5bExUVUk4VlQzdjdhaExyejBYNk1TK0Q1bnhXdFNUdEhkdUtmVURVRHlUSWhmNWNldWo4NVAKUjMxZTNaT2Q0Y3p2M2xnRE16ZXRuYUNUNy9TVEc3TjVyOVNwZENPWEFaZytpdi8rRFN5M25QVGFCUlA4Syt4aAozck9SNFVrK2ovKzU5bDlhV2EvM0FuNDZxazFlR3AvSDF3MzNPU2RCZmpjV0pKNW5BUm94K1o5ZHpnMjJlNDUrClh4K1ZlRnlMeG4rTlVVR0I3bk1BZGRNekNtS3dkYjRseE1BOXFJSS8vZ3lsY0RXUEN1alpBRDhWZWQ3dTA1QlgKV2ZmeGh4VTB3WmZKUWRlWElwdUhWd09aQytSMWlVdXZZeFVSSm9XbWVSc3V1cUVDQXdFQUFhT0J0RENCc1RBTwpCZ05WSFE4QkFmOEVCQU1DQmFBd0RBWURWUjBUQVFIL0JBSXdBRENCa0FZRFZSMFJBUUgvQklHRk1JR0Nnamh4CmMyVnlkaTF2Y0dWeVlYUnZjaTEzWldKb2IyOXJMWE5sY25acFkyVXVjWE5sY25ZdGIzQmxjbUYwYjNJdGMzbHoKZEdWdExuTjJZNEpHY1hObGNuWXRiM0JsY21GMGIzSXRkMlZpYUc5dmF5MXpaWEoyYVdObExuRnpaWEoyTFc5dwpaWEpoZEc5eUxYTjVjM1JsYlM1emRtTXVZMngxYzNSbGNpNXNiMk5oYkRBTkJna3Foa2lHOXcwQkFRc0ZBQU9DCkFRRUF5R0NDaWt5L2hVSlVETzk5TEQ2UFdmcjFjQmFVN1JPRkc0b2NsMXY5QmNCS0taMVR6cWNmNDd4RUxVancKNlpOYk4zbCs0Z0VkOXpvWC9QbWQ1c1dlMWg0Vmt4V2xvd2FyQ3pBRnVLOTRpN1QzenlhbjRnck1wbm9xaCsxbApURnRjODVPK2h2Yk5RNXh4UHg1c3NyTERKcmIzaFdYdFF4SW5RRExmc0F1TjBvL0lDd3VLSGtiMDMvK1N2a0NNCkRvYXVaN1RYT0JpdzU1RVR5MWZGOGhkRXh3YUlqaU4yTWVNSWNhRlJJOW1GWDBIMkczODNLWms1d3h4aW8vd2gKYVE1SStDMzN5ZEw1eVR1WDhZR3NXWEJ5MThYb0xHM2VuNzZldU50bm9YNjRQakxlSE0yMG10SWlJK29Lb1ZBTAo0ZytLRGlQM09rdk5Bd1QveFBhYjJXVFFkdz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
    Service:
      Name:        qserv-operator-webhook-service
      Namespace:   qserv-operator-system
      Path:        /mutate-qserv-lsst-org-v1beta1-qserv
      Port:        443
  Failure Policy:  Fail
  Match Policy:    Equivalent
  Name:            mqserv.kb.io
  Namespace Selector:
  Object Selector:
  Reinvocation Policy:  Never
  Rules:
    API Groups:
      qserv.lsst.org
    API Versions:
      v1beta1
    Operations:
      CREATE
      UPDATE
    Resources:
      qservs
    Scope:          *
  Side Effects:     None
  Timeout Seconds:  10
Events:             <none>
GregBlow commented 1 year ago
ubuntu@sv-qserv-jump:~$ kubectl apply -f https://raw.githubusercontent.com/lsst/qserv-operator/$RELEASE/manifests/operator.yaml
namespace/qserv-operator-system created
customresourcedefinition.apiextensions.k8s.io/qservs.qserv.lsst.org created
serviceaccount/qserv-operator-controller-manager created
role.rbac.authorization.k8s.io/qserv-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/qserv-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/qserv-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/qserv-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/qserv-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/qserv-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/qserv-operator-proxy-rolebinding created
configmap/qserv-operator-manager-config created
service/qserv-operator-controller-manager-metrics-service created
service/qserv-operator-webhook-service created
deployment.apps/qserv-operator-controller-manager created
certificate.cert-manager.io/qserv-operator-serving-cert created
issuer.cert-manager.io/qserv-operator-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/qserv-operator-mutating-webhook-configuration created
validatingwebhookconfiguration.admissionregistration.k8s.io/qserv-operator-validating-webhook-configuration created
GregBlow commented 1 year ago
ubuntu@sv-qserv-jump:~/qserv-operator$ kubectl describe svc -A
Name:              cert-manager
Namespace:         cert-manager
Labels:            app=cert-manager
                   app.kubernetes.io/component=controller
                   app.kubernetes.io/instance=cert-manager
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=cert-manager
                   app.kubernetes.io/version=v1.11.0
                   helm.sh/chart=cert-manager-v1.11.0
Annotations:       meta.helm.sh/release-name: cert-manager
                   meta.helm.sh/release-namespace: cert-manager
Selector:          app.kubernetes.io/component=controller,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.106.168.23
IPs:               10.106.168.23
Port:              tcp-prometheus-servicemonitor  9402/TCP
TargetPort:        9402/TCP
Endpoints:         10.40.0.1:9402
Session Affinity:  None
Events:            <none>

Name:              cert-manager-webhook
Namespace:         cert-manager
Labels:            app=webhook
                   app.kubernetes.io/component=webhook
                   app.kubernetes.io/instance=cert-manager
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=webhook
                   app.kubernetes.io/version=v1.11.0
                   helm.sh/chart=cert-manager-v1.11.0
Annotations:       meta.helm.sh/release-name: cert-manager
                   meta.helm.sh/release-namespace: cert-manager
Selector:          app.kubernetes.io/component=webhook,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=webhook
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.97.60.67
IPs:               10.97.60.67
Port:              https  443/TCP
TargetPort:        https/TCP
Endpoints:         10.71.0.202:10251
Session Affinity:  None
Events:            <none>

Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.96.0.1
IPs:               10.96.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         10.71.0.129:6443
Session Affinity:  None
Events:            <none>

Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.96.0.10
IPs:               10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         10.32.0.2:53,10.32.0.3:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         10.32.0.2:53,10.32.0.3:53
Port:              metrics  9153/TCP
TargetPort:        9153/TCP
Endpoints:         10.32.0.2:9153,10.32.0.3:9153
Session Affinity:  None
Events:            <none>

Name:              qserv-operator-controller-manager-metrics-service
Namespace:         qserv-operator-system
Labels:            control-plane=controller-manager
Annotations:       <none>
Selector:          control-plane=controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.96.52.203
IPs:               10.96.52.203
Port:              https  8443/TCP
TargetPort:        https/TCP
Endpoints:         10.45.0.1:8443
Session Affinity:  None
Events:            <none>

Name:              qserv-operator-webhook-service
Namespace:         qserv-operator-system
Labels:            <none>
Annotations:       <none>
Selector:          control-plane=controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.109.245.119
IPs:               10.109.245.119
Port:              <unset>  443/TCP
TargetPort:        9443/TCP
Endpoints:         10.45.0.1:9443
Session Affinity:  None
Events:            <none>
GregBlow commented 1 year ago
ubuntu@sv-qserv-jump:~/qserv-operator$ kubectl logs --namespace qserv-operator-system qserv-operator-controller-manager-5946797c8-g77j8
1.6788789642519722e+09  INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
1.6788789642523198e+09  INFO    Registering a mutating webhook  {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv", "GVK": "qserv.lsst.org/v1beta1, Kind=Qserv", "path": "/mutate-qserv-lsst-org-v1beta1-qserv"}
1.6788789642524202e+09  INFO    controller-runtime.webhook      Registering webhook     {"path": "/mutate-qserv-lsst-org-v1beta1-qserv"}
1.6788789642524896e+09  INFO    Registering a validating webhook        {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv", "GVK": "qserv.lsst.org/v1beta1, Kind=Qserv", "path": "/validate-qserv-lsst-org-v1beta1-qserv"}
1.6788789642525532e+09  INFO    controller-runtime.webhook      Registering webhook     {"path": "/validate-qserv-lsst-org-v1beta1-qserv"}
1.6788789642526422e+09  INFO    setup   starting manager
1.6788789642530053e+09  INFO    controller-runtime.webhook.webhooks     Starting webhook server
1.678878964253087e+09   INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
1.678878964253133e+09   INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.678878964253346e+09   INFO    controller-runtime.certwatcher  Updated current TLS certificate
I0315 11:16:04.253355       1 leaderelection.go:248] attempting to acquire leader lease qserv-operator-system/b867d9a3.lsst.org...
1.6788789642534573e+09  INFO    controller-runtime.webhook      Serving webhook server  {"host": "", "port": 9443}
1.6788789642535846e+09  INFO    controller-runtime.certwatcher  Starting certificate watcher
I0315 11:16:04.264196       1 leaderelection.go:258] successfully acquired lease qserv-operator-system/b867d9a3.lsst.org
1.6788789642646234e+09  INFO    Starting EventSource    {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv", "source": "kind source: *v1beta1.Qserv"}
1.6788789642642815e+09  DEBUG   events  Normal  {"object": {"kind":"Lease","namespace":"qserv-operator-system","name":"b867d9a3.lsst.org","uid":"8c1b50cc-cd5d-4fd0-9ddd-3b5548f38b16","apiVersion":"coordination.k8s.io/v1","resourceVersion":"187697"}, "reason": "LeaderElection", "message": "qserv-operator-controller-manager-5946797c8-g77j8_3c9d276b-1a47-4f2e-80e2-bdc7eedd6212 became leader"}
1.678878964264724e+09   INFO    Starting EventSource    {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv", "source": "kind source: *v1.ConfigMap"}
1.6788789642647657e+09  INFO    Starting EventSource    {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv", "source": "kind source: *v1.Service"}
1.6788789642647913e+09  INFO    Starting EventSource    {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv", "source": "kind source: *v1.Deployment"}
1.6788789642648034e+09  INFO    Starting EventSource    {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv", "source": "kind source: *v1.StatefulSet"}
1.6788789642648206e+09  INFO    Starting Controller     {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv"}
1.6788789643662183e+09  INFO    Starting workers        {"controller": "qserv", "controllerGroup": "qserv.lsst.org", "controllerKind": "Qserv", "worker count": 1}
GregBlow commented 1 year ago

Attempting to change the webhook server port from 9443 to 10252 had no effect; the logs still show:

1.6788827344799843e+09 INFO controller-runtime.webhook Serving webhook server {"host": "", "port": 9443}
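One possible explanation, assuming the operator follows the standard kubebuilder v3 project layout (the `qserv-operator-manager-config` ConfigMap and the `b867d9a3.lsst.org` leader-election ID are consistent with it, but this thread does not confirm it): in that layout the webhook port is typically set in the manager options in `main.go`, and the ControllerManagerConfig file shipped in the ConfigMap is only read if the manager is started with a `--config` flag. A sketch of that file with the port changed follows; the webhook Service's `targetPort` (9443 in the manifest earlier in this thread) would have to be changed to the same value.

```yaml
# controller_manager_config.yaml as scaffolded by kubebuilder v3 (assumed layout),
# shipped in the qserv-operator-manager-config ConfigMap.
# Only takes effect if the manager binary actually loads this file (e.g. via --config);
# otherwise the port set in the manager options in code (9443) is used.
apiVersion: controller-runtime.sigs.k8s.io/v1alpha1
kind: ControllerManagerConfig
health:
  healthProbeBindAddress: :8081
metrics:
  bindAddress: 127.0.0.1:8080
webhook:
  port: 10252   # must match targetPort on qserv-operator-webhook-service
leaderElection:
  leaderElect: true
  resourceName: b867d9a3.lsst.org
```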

GregBlow commented 1 year ago

Resembles https://stackoverflow.com/questions/74783557/metallb-kubernetes-installation-failed-calling-webhook-ipaddresspoolvalidation

GregBlow commented 1 year ago

https://kubernetes.io/docs/tasks/administer-cluster/declare-network-policy/
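The linked page covers Kubernetes NetworkPolicy. Nothing in this thread shows that a policy is in force here, but if the `qserv-operator-system` namespace had a default-deny ingress policy, an allow rule along these lines would be needed for the API server to reach the webhook server. A minimal sketch: the policy name is hypothetical; the pod label and port are taken from the webhook Service manifest (`control-plane: controller-manager`, `targetPort: 9443`).

```yaml
# Hypothetical allow rule for the qserv-operator webhook server.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-webhook-ingress        # hypothetical name
  namespace: qserv-operator-system
spec:
  # Select the controller-manager pod that serves the webhook.
  podSelector:
    matchLabels:
      control-plane: controller-manager
  policyTypes:
  - Ingress
  ingress:
  # No "from" clause: any source (including an API server on the host
  # network) may connect, but only on the webhook port.
  - ports:
    - protocol: TCP
      port: 9443
```

Note that selecting the pod isolates it for ingress, so traffic on its other ports (e.g. the 8443 metrics port) is denied unless another policy allows it.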

GregBlow commented 1 year ago

The next step will likely be to try replacing the CNI: Weave is known to produce similar problems with other services, and it may be simpler to replace than to reconfigure.

GregBlow commented 1 year ago

https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#blocked-connections

GregBlow commented 1 year ago

Coming back to the problem after ~1 week: after deleting and reapplying the Weave deployment, qserv appears to have deployed successfully. Closing, as this may just have been an issue of performing the steps in a different order.