google / knative-gcp

GCP event implementations to use with Knative Eventing.
https://github.com/knative/eventing
Apache License 2.0

Setting Up Cloud Run Events on local cluster unable to validate webhook service. #2182

Closed maurerbot closed 3 years ago

maurerbot commented 3 years ago

Describe the bug: Following the directions to install knative-gcp on my local microk8s cluster, the install cannot connect to the validating webhook at `webhook.cloud-run-events.svc:443/config-validation`.

Full error:

Error from server (InternalError): error when creating "https://github.com/google/knative-gcp/releases/download/v0.19.0/cloud-run-events.yaml": Internal error occurred: failed calling webhook "config.webhook.events.cloud.google.com": Post "https://webhook.cloud-run-events.svc:443/config-validation?timeout=30s": dial tcp 10.152.183.252:443: connect: connection refused

What I found in the cloud-run-events.yaml

apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: config.webhook.events.cloud.google.com
  labels:
    events.cloud.google.com/release: "v0.19.0"
webhooks:
  - admissionReviewVersions:
      - v1beta1
    clientConfig:
      service:
        name: webhook
        namespace: cloud-run-events
    failurePolicy: Fail
    sideEffects: None
    name: config.webhook.events.cloud.google.com
    namespaceSelector:
      matchExpressions:
        - key: events.cloud.google.com/release
          operator: Exists

Everything in the namespace

$ mk get all -n cloud-run-events
NAME                                                      READY   STATUS              RESTARTS   AGE
pod/storage-version-migration-knative-gcp-v0-19-0-tcrwx   0/1     Completed           0          11m
pod/controller-65bbf98864-kt6cl                           0/1     ContainerCreating   0          11m
pod/webhook-d887d49fb-cjjbb                               0/1     CrashLoopBackOff    7          11m

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/controller   ClusterIP   10.152.183.47    <none>        9090/TCP   11m
service/webhook      ClusterIP   10.152.183.252   <none>        443/TCP    11m

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/controller   0/1     1            0           11m
deployment.apps/webhook      0/1     1            0           11m

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/controller-65bbf98864   1         1         0       11m
replicaset.apps/webhook-d887d49fb       1         1         0       11m

NAME                                                      COMPLETIONS   DURATION   AGE
job.batch/storage-version-migration-knative-gcp-v0-19-0   1/1           4s         11m

Logs from failing resources

$ mk logs -n cloud-run-events pod/storage-version-migration-knative-gcp-v0-19-0-tcrw
Error from server (NotFound): pods "storage-version-migration-knative-gcp-v0-19-0-tcrw" not found

$ mk logs -n cloud-run-events pod/controller-65bbf98864-kt6cl
Error from server (BadRequest): container "controller" in pod "controller-65bbf98864-kt6cl" is waiting to start: ContainerCreatin

$ mk describe -n cloud-run-events pod/controller-65bbf98864-kt6c
Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    25m                   default-scheduler  Successfully assigned cloud-run-events/controller-65bbf98864-kt6cl to desktop-ko4t9m8
  Warning  FailedMount  14m (x2 over 17m)     kubelet            Unable to attach or mount volumes: unmounted volumes=[config-logging], unattached volumes=[config-logging controller-token-8bttq google-cloud-key]: timed out waiting for the condition
  Warning  FailedMount  10m (x4 over 23m)     kubelet            Unable to attach or mount volumes: unmounted volumes=[config-logging], unattached volumes=[google-cloud-key config-logging controller-token-8bttq]: timed out waiting for the condition
  Warning  FailedMount  3m33s (x4 over 19m)   kubelet            Unable to attach or mount volumes: unmounted volumes=[config-logging], unattached volumes=[controller-token-8bttq google-cloud-key config-logging]: timed out waiting for the condition
  Warning  FailedMount  3m23s (x19 over 25m)  kubelet            MountVolume.SetUp failed for volume "config-logging" : configmap "config-logging" not found

$ mk logs -n cloud-run-events pod/webhook-d887d49fb-cjjbb
2021/03/17 13:40:28 Registering 2 clients
2021/03/17 13:40:28 Registering 3 informer factories
2021/03/17 13:40:28 Registering 4 informers
2021/03/17 13:40:28 Registering 5 controllers
{"level":"info","ts":"2021-03-17T13:40:29.107Z","caller":"logging/config.go:110","msg":"Successfully created the logger."}
{"level":"info","ts":"2021-03-17T13:40:29.107Z","caller":"logging/config.go:111","msg":"Logging level set to: info"}
{"level":"info","ts":"2021-03-17T13:40:29.107Z","logger":"webhook","caller":"profiling/server.go:59","msg":"Profiling enabled: false","commit":"58158f3"}
{"level":"info","ts":"2021-03-17T13:40:29.539Z","logger":"webhook","caller":"leaderelection/context.go:46","msg":"Running with Standard leader election","commit":"58158f3"}
{"level":"info","ts":"2021-03-17T13:40:30.157Z","logger":"webhook","caller":"sharedmain/main.go:209","msg":"Starting configuration manager...","commit":"58158f3"}
{"level":"fatal","ts":"2021-03-17T13:40:30.257Z","logger":"webhook","caller":"sharedmain/main.go:211","msg":"Failed to start configuration manager","commit":"58158f3","error":"configmap \"config-br-delivery\" not found","stacktrace":"knative.dev/pkg/injection/sharedmain.MainWithConfig\n\tknative.dev/pkg@v0.0.0-20201103163404-5514ab0c1fdf/injection/sharedmain/main.go:211\nknative.dev/pkg/injection/sharedmain.MainWithContext\n\tknative.dev/pkg@v0.0.0-20201103163404-5514ab0c1fdf/injection/sharedmain/main.go:142\nmain.main\n\tgithub.com/google/knative-gcp/cmd/webhook/main.go:273\nruntime.main\n\truntime/proc.go:203"}

The above shows the controller is having trouble mounting volumes?
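Both failures above name a missing ConfigMap (`config-logging` for the controller, `config-br-delivery` for the webhook). A minimal sketch of checking for them, assuming `mk` in this thread is an alias for `microk8s kubectl`; the `KUBECTL` variable and the function name are illustrative, not part of knative-gcp:

```shell
# KUBECTL is an illustrative override; set it to "microk8s kubectl" on microk8s.
KUBECTL="${KUBECTL:-kubectl}"

# Look up the two ConfigMaps named in the errors above.
# kubectl reports NotFound for any that does not exist.
check_configmaps() {
  $KUBECTL get configmap config-logging config-br-delivery -n cloud-run-events
}
```

Calling `check_configmaps` after an install attempt shows whether the manifest got far enough to create them.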

I've also run the local pub/sub emulator following https://cloud.google.com/pubsub/docs/emulator

$ gcloud beta emulators pubsub start --project=notebook-ninja
Executing: /usr/lib/google-cloud-sdk/platform/pubsub-emulator/bin/cloud-pubsub-emulator --host=localhost --port=8085
[pubsub] This is the Google Pub/Sub fake.
[pubsub] Implementation may be incomplete or differ from the real system.
[pubsub] Mar 17, 2021 9:27:17 AM com.google.cloud.pubsub.testing.v1.Main main
[pubsub] INFO: IAM integration is disabled. IAM policy methods and ACL checks are not supported
[pubsub] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
[pubsub] SLF4J: Defaulting to no-operation (NOP) logger implementation
[pubsub] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[pubsub] Mar 17, 2021 9:27:17 AM com.google.cloud.pubsub.testing.v1.Main main
[pubsub] INFO: Server started, listening on 8085

Expected behavior: I expect the webhook service to be available in my cluster.

To Reproduce: Follow option 2 in https://github.com/google/knative-gcp/blob/main/docs/install/install-knative-gcp.md and apply it to a local microk8s cluster.

Knative-GCP release version: v0.19.0

Additional context: N/A

zhongduo commented 3 years ago

This sounds like a typical problem with the webhook in Knative, where Kubernetes created the webhook before the ConfigMap it requires. Reapplying "https://github.com/google/knative-gcp/releases/download/v0.19.0/cloud-run-events.yaml" should solve the problem.
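A minimal sketch of that reapply step, assuming `kubectl` (or `microk8s kubectl`) points at the affected cluster. The `KUBECTL` variable, the function name, and the 120s timeout are illustrative choices, not part of knative-gcp:

```shell
# KUBECTL is an illustrative override; set it to "microk8s kubectl" on microk8s.
KUBECTL="${KUBECTL:-kubectl}"
RELEASE_URL="https://github.com/google/knative-gcp/releases/download/v0.19.0/cloud-run-events.yaml"

# Reapply the release manifest, then wait for the webhook Deployment to roll out.
reapply_release() {
  $KUBECTL apply -f "$RELEASE_URL" || return 1
  $KUBECTL rollout status deployment/webhook --timeout=120s -n cloud-run-events
}
```

If the rollout wait times out, the webhook logs are the next place to look.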

maurerbot commented 3 years ago

Hi @zhongduo. I've reapplied it several times and the error persists.

Error from server (InternalError): error when creating "https://github.com/google/knative-gcp/releases/download/v0.19.0/cloud-run-events.yaml": Internal error occurred: failed calling webhook "config.webhook.events.cloud.google.com": Post "https://webhook.cloud-run-events.svc:443/config-validation?timeout=30s": dial tcp 10.152.183.252:443: connect: connection refused

zhongduo commented 3 years ago

Can you try deleting the webhook pod? `kubectl delete -n cloud-run-events pod/webhook-d887d49fb-cjjbb`

If that still doesn't work, you will have to check the logs of the webhook and the controller. I notice that your controller is stuck in the ContainerCreating state, so it is not ready either.
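A sketch of both steps that avoids copying generated pod names. Note that it swaps the pod deletion for `kubectl rollout restart`, which also causes the Deployment to replace its pod. The `KUBECTL` variable and function names are illustrative assumptions:

```shell
# KUBECTL is an illustrative override; set it to "microk8s kubectl" on microk8s.
KUBECTL="${KUBECTL:-kubectl}"
NS="cloud-run-events"

# Restart the webhook through its Deployment instead of deleting the pod by name.
restart_webhook() {
  $KUBECTL rollout restart deployment/webhook -n "$NS"
}

# Print recent logs for one Deployment: pass "webhook" or "controller".
show_logs() {
  $KUBECTL logs "deployment/$1" --tail=50 -n "$NS"
}
```

For example, `restart_webhook` followed by `show_logs webhook` and `show_logs controller`.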

maurerbot commented 3 years ago

Yes, the controller's description shows that it is unable to mount a couple of volumes. Not sure what I'm missing.

maurerbot commented 3 years ago

I think I've gotten past the webhook issue. I believe the microk8s host-access addon was not enabled, and that was causing the issue.
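For reference, a sketch of enabling that addon, assuming the standard `microk8s` CLI is on the PATH; the `MICROK8S` variable and the function name are illustrative only:

```shell
# MICROK8S is an illustrative override for the microk8s CLI.
MICROK8S="${MICROK8S:-microk8s}"

# Enable the host-access addon, then block until the cluster reports ready.
enable_host_access() {
  $MICROK8S enable host-access && $MICROK8S status --wait-ready
}
```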

However, the controller is spitting out an error:

$ mk get all -n cloud-run-events
NAME                                                      READY   STATUS      RESTARTS   AGE
pod/storage-version-migration-knative-gcp-v0-19-0-t46bv   0/1     Completed   0          5m14s
pod/webhook-d887d49fb-b94st                               1/1     Running     0          3m23s
pod/controller-65bbf98864-lcvbf                           0/1     Error       5          3m25s

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/controller   ClusterIP   10.152.183.107   <none>        9090/TCP   3m27s
service/webhook      ClusterIP   10.152.183.238   <none>        443/TCP    3m27s

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/webhook      1/1     1            1           3m24s
deployment.apps/controller   0/1     1            0           3m27s

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/webhook-d887d49fb       1         1         1       3m24s
replicaset.apps/controller-65bbf98864   1         1         0       3m27s

$ mk describe -n cloud-run-events pod/controller-65bbf98864-ndrcn
Name:         controller-65bbf98864-ndrcn
Namespace:    cloud-run-events
Priority:     0
Node:         desktop-ko4t9m8/172.21.29.183
Start Time:   Wed, 17 Mar 2021 14:45:46 -0400
Labels:       app=cloud-run-events
              pod-template-hash=65bbf98864
              role=controller
Annotations:  cni.projectcalico.org/podIP: 10.1.97.94/32
              cni.projectcalico.org/podIPs: 10.1.97.94/32
              sidecar.istio.io/inject: false
Status:       Running
IP:           10.1.97.94
IPs:
  IP:           10.1.97.94
Controlled By:  ReplicaSet/controller-65bbf98864
Containers:
  controller:
    Container ID:  containerd://fd9601ad7b2bc1eeac98ca8487bc8b82f51bc6e18abc0c059a59e3acd3fc6f61
    Image:         gcr.io/knative-releases/github.com/google/knative-gcp/cmd/controller@sha256:f7e7f123f3d1f649de4de461286f3cb3de9de63c75f87630a98767e8a0f1cf0d
    Image ID:      gcr.io/knative-releases/github.com/google/knative-gcp/cmd/controller@sha256:f7e7f123f3d1f649de4de461286f3cb3de9de63c75f87630a98767e8a0f1cf0d
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --disable-ha
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 17 Mar 2021 15:02:02 -0400
      Finished:     Wed, 17 Mar 2021 15:02:02 -0400
    Ready:          False
    Restart Count:  8
    Limits:
      cpu:     1
      memory:  1000Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:
      GOOGLE_APPLICATION_CREDENTIALS:  /var/secrets/google/key.json
      PUBSUB_RA_IMAGE:                 gcr.io/knative-releases/github.com/google/knative-gcp/cmd/pubsub/receive_adapter@sha256:892f0b8ec11d53639f49c7c8bbdfcf0316477f66eb61c7fc84ddaf8738032539
      PUBSUB_PUBLISHER_IMAGE:          gcr.io/knative-releases/github.com/google/knative-gcp/cmd/pubsub/publisher@sha256:de2ceffdab7bbddd570c6c1fdbd9ddf1bc071c23d0f2db6499e2256127184625
      SYSTEM_NAMESPACE:                cloud-run-events (v1:metadata.namespace)
      CONFIG_LOGGING_NAME:             config-logging
      CONFIG_OBSERVABILITY_NAME:       config-observability
      CONFIG_LEADERELECTION_NAME:      config-leader-election
      METRICS_DOMAIN:                  cloud.google.com/events
      BROKER_CELL_INGRESS_IMAGE:       gcr.io/knative-releases/github.com/google/knative-gcp/cmd/broker/ingress@sha256:0a713a4615f85f751e3c230d622de8057e63d791f6840ec76f7883c606dd7694
      BROKER_CELL_FANOUT_IMAGE:        gcr.io/knative-releases/github.com/google/knative-gcp/cmd/broker/fanout@sha256:6d6b5b4841439e3444520e6776929d9fda657283cba0a30e535813e8b4f4f7c4
      BROKER_CELL_RETRY_IMAGE:         gcr.io/knative-releases/github.com/google/knative-gcp/cmd/broker/retry@sha256:557d9e3024d57acdf8149515f74375e5a1ec0f7e18ae5f93243a8e9840d96c06
      INTERNAL_METRICS_ENABLED:        false
    Mounts:
      /etc/config-logging from config-logging (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from controller-token-4vzmw (ro)
      /var/secrets/google from google-cloud-key (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-logging:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      config-logging
    Optional:  false
  google-cloud-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  google-cloud-key
    Optional:    true
  controller-token-4vzmw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  controller-token-4vzmw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    16m                 default-scheduler  Successfully assigned cloud-run-events/controller-65bbf98864-ndrcn to desktop-ko4t9m8
  Warning  FailedMount  16m (x4 over 16m)   kubelet            MountVolume.SetUp failed for volume "config-logging" : configmap "config-logging" not found
  Normal   Pulled       16m                 kubelet            Successfully pulled image "gcr.io/knative-releases/github.com/google/knative-gcp/cmd/controller@sha256:f7e7f123f3d1f649de4de461286f3cb3de9de63c75f87630a98767e8a0f1cf0d" in 407.7942ms
  Normal   Pulled       16m                 kubelet            Successfully pulled image "gcr.io/knative-releases/github.com/google/knative-gcp/cmd/controller@sha256:f7e7f123f3d1f649de4de461286f3cb3de9de63c75f87630a98767e8a0f1cf0d" in 401.0451ms
  Normal   Pulled       16m                 kubelet            Successfully pulled image "gcr.io/knative-releases/github.com/google/knative-gcp/cmd/controller@sha256:f7e7f123f3d1f649de4de461286f3cb3de9de63c75f87630a98767e8a0f1cf0d" in 424.0637ms
  Normal   Pulling      15m (x4 over 16m)   kubelet            Pulling image "gcr.io/knative-releases/github.com/google/knative-gcp/cmd/controller@sha256:f7e7f123f3d1f649de4de461286f3cb3de9de63c75f87630a98767e8a0f1cf0d"
  Normal   Pulled       15m                 kubelet            Successfully pulled image "gcr.io/knative-releases/github.com/google/knative-gcp/cmd/controller@sha256:f7e7f123f3d1f649de4de461286f3cb3de9de63c75f87630a98767e8a0f1cf0d" in 430.7369ms
  Normal   Created      15m (x4 over 16m)   kubelet            Created container controller
  Normal   Started      15m (x4 over 16m)   kubelet            Started container controller
  Warning  BackOff      97s (x71 over 16m)  kubelet            Back-off restarting failed container

$ mk logs -n cloud-run-events controller-65bbf98864-lcvbf
2021/03/17 18:41:23 google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

I've run `gcloud auth application-default login`. Is there something else I'm missing?

zhongduo commented 3 years ago

I believe you didn't set up authentication: https://github.com/google/knative-gcp/blob/main/docs/install/install-knative-gcp.md#configure-the-authentication-mechanism-for-gcp-the-control-plane
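In the pod description above, the controller reads `GOOGLE_APPLICATION_CREDENTIALS` from `/var/secrets/google/key.json`, which is populated by the optional `google-cloud-key` Secret, so running `gcloud auth application-default login` on the host does not give the pod credentials. A hedged sketch of creating that Secret, assuming a service account key file has already been downloaded. The `KUBECTL` variable, the function name, and the key path are hypothetical, and the linked install guide remains the authoritative procedure:

```shell
# KUBECTL is an illustrative override; set it to "microk8s kubectl" on microk8s.
KUBECTL="${KUBECTL:-kubectl}"

# Create the Secret that the controller mounts at /var/secrets/google.
# $1 is the path to a service account key file (hypothetical; supply your own).
create_gcp_key_secret() {
  $KUBECTL create secret generic google-cloud-key \
    --from-file=key.json="$1" -n cloud-run-events
}
```

For example, `create_gcp_key_secret /path/to/key.json`, followed by restarting the controller so that it picks up the mounted key.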

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.