google / knative-gcp

GCP event implementations to use with Knative Eventing.
https://github.com/knative/eventing
Apache License 2.0

Cannot complete CloudPubSubSource example, no subscription created in GCP #1953

Closed mattwelke closed 3 years ago

mattwelke commented 3 years ago

Describe the bug

I tried to run the example here: https://github.com/google/knative-gcp/tree/master/docs/examples/cloudpubsubsource.

I followed each step.

For the prerequisite step "Install Knative" (https://github.com/google/knative-gcp/blob/master/docs/install/install-knative-gcp.md), where it tells you to choose a version (for KGCP_VERSION), I chose v0.19.1 because it was the latest release. For the part in this step "Configure the Authentication Mechanism for GCP (the Control Plane)", I chose "Option 1 (Recommended): Use Workload Identity.", so I ran ./hack/init_control_plane_gke.sh [CLUSTER_NAME] [CLUSTER_LOCATION] [CLUSTER_LOCATION_TYPE] [PROJECT_ID]. I filled in each of these parameters. My cluster was zonal.
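For reference, my invocation of the control-plane script looked roughly like the following (the cluster name, zone, and project ID below are placeholders, not my actual values; the location type is "zonal" to match my cluster):

```shell
# Placeholder values for illustration; substitute your own cluster name,
# zone, and project ID. CLUSTER_LOCATION_TYPE is "zonal" or "regional".
./hack/init_control_plane_gke.sh my-cluster us-central1-a zonal my-gcp-project
```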

For the prerequisite step "Create a Service Account for the Data Plane", I followed each of its steps and chose "Option 1: Use Workload Identity" when it got to the authorization part. There, I chose "Default scenario", so I ran ./hack/init_data_plane_gke.sh.

I then ran each step under "Deployment", leaving everything default. I created a topic named "testing". I did not change the file cloudpubsubsource.yaml before applying it because, as the documentation for the prerequisite steps stated, since I chose the "Default scenario" for Workload Identity, a controller would be created that would ensure objects I create in the example are authenticated using the Kubernetes and Google service accounts automatically.
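For context, the unmodified cloudpubsubsource.yaml I applied corresponds to the spec visible in the describe output further down (this is a reconstruction from that output, not a copy of the upstream file):

```yaml
apiVersion: events.cloud.google.com/v1
kind: CloudPubSubSource
metadata:
  name: cloudpubsubsource-test
spec:
  topic: testing        # the Pub/Sub topic I created
  sink:
    ref:
      apiVersion: v1    # core Service, not a Knative Service
      kind: Service
      name: event-display
```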

I ran the "Publish" and "Verify" steps.
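Concretely, the "Publish" and "Verify" steps amount to something like the following. The topic name matches mine; the label selector is an assumption based on the example's event-display deployment, so it may differ from the exact commands in the docs:

```shell
# Publish a test message to the topic the source watches.
gcloud pubsub topics publish testing --message='{"Hello": "world"}'

# Tail the event-display logs to check whether the event arrived.
kubectl logs -l app=event-display --tail=100
```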

Expected behavior

Each k8s object applied (with kubectl apply) appears in the cluster when using kubectl get after it's applied. A Kubernetes service account named default-cre-dataplane exists after the kubectl apply steps are finished, because the documentation stated that a controller would do this for me when I deploy resources. A subscription with an automatically determined name is created in my GCP project, associated with the topic I created called "testing".

Actual behavior

Each k8s object applied appeared in the cluster. I could list and describe them. No subscription was created in my GCP project. The example deployment's logs could be tailed, but no output appeared in the logs. No service account named "default-cre-dataplane" exists in the cluster in any namespace.
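The checks I used to confirm the two missing pieces were along these lines (the exact filter and grep patterns are my own, not from the docs):

```shell
# Expected one subscription attached to the topic; got none.
gcloud pubsub subscriptions list --filter='topic:testing'

# Expected the controller-created service account; got no matches.
kubectl get serviceaccounts --all-namespaces | grep cre-dataplane
```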

The state of my cloudpubsubsource object, when described, looks like this:

> kubectl describe cloudpubsubsource cloudpubsubsource-test
Name:         cloudpubsubsource-test
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"events.cloud.google.com/v1","kind":"CloudPubSubSource","metadata":{"annotations":{},"name":"cloudpubsubsource-test","namesp...
API Version:  events.cloud.google.com/v1
Kind:         CloudPubSubSource
Metadata:
  Creation Timestamp:  2020-11-30T06:19:02Z
  Generation:          1
  Resource Version:    23752
  Self Link:           /apis/events.cloud.google.com/v1/namespaces/default/cloudpubsubsources/cloudpubsubsource-test
  UID:                 0be0dc97-1465-464e-ab03-3d709617af40
Spec:
  Sink:
    Ref:
      API Version:  v1
      Kind:         Service
      Name:         event-display
  Topic:            testing
Events:             <none>

To Reproduce

Follow the latest version of the documentation's CloudPubSubSource tutorial, starting from https://github.com/google/knative-gcp/tree/master/docs/examples/cloudpubsubsource, using Workload Identity and all default options, on a newly-created k8s cluster on GKE that uses the default settings for a new GKE cluster except with Istio and Workload Identity features checked in the creation wizard.

For reference, here is the script I used to prepare my cluster each time (I copied and pasted the "get gcloud command" output from the UI):

gcloud beta container --project $PROJECT_ID clusters create "classify-events" \
  --zone $ZONE --no-enable-basic-auth --cluster-version "1.16.13-gke.401" \
  --machine-type "e2-medium" --image-type "COS" --disk-type "pd-standard" --disk-size "100" \
  --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --num-nodes "3" --enable-stackdriver-kubernetes --enable-ip-alias \
  --network "projects/$PROJECT_ID/global/networks/default" \
  --subnetwork "projects/$PROJECT_ID/regions/$REGION/subnetworks/default" \
  --default-max-pods-per-node "110" --no-enable-master-authorized-networks \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing,Istio --istio-config auth=MTLS_PERMISSIVE \
  --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 \
  --workload-pool "$PROJECT_ID.svc.id.goog"

Knative-GCP release version: v0.19.1

mattwelke commented 3 years ago

I should put this on hold until I try again. When reviewing the steps, I realized I had only installed Serving and knative-gcp onto my cluster, not Eventing. The Eventing install instructions were hidden in the docs for Knative version 0.19.0, so I grabbed the install commands from the 0.18.0 docs and changed the version in them to 0.19.0. After running them, I saw this in the logs for the controller pod:

{"level":"fatal","ts":"2020-11-30T07:16:47.367Z","logger":"controller","caller":"sharedmain/main.go:287","msg":"Version check failed","commit":"0f9a8c5","knative.dev/pod":"eventing-controller-58cdcf5c4c-hmn5c","error":"pre-release kubernetes version \"1.16.13-gke.401\" is not compatible, need at least \"1.17.0\""}

It looks like the latest version of Eventing isn't compatible with the version of k8s you get by default in the GKE cluster creation wizard. By default, they give you a "static" version of 1.16 right now, but you can switch to release channels instead of a static version, and the "stable" release channel is 1.17 right now.
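Opting into a release channel can be done at cluster creation time. A minimal sketch, assuming the same cluster name and environment variables as the full command above (the `--release-channel` flag replaces the static `--cluster-version`):

```shell
# Create the cluster on the "stable" release channel (currently 1.17)
# instead of pinning the default static 1.16 version.
gcloud container clusters create "classify-events" \
  --project "$PROJECT_ID" \
  --zone "$ZONE" \
  --release-channel stable \
  --workload-pool "$PROJECT_ID.svc.id.goog"
```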

Therefore, I'll try this again except this time I'll use a stable release channel for my cluster version and I'll remember to install Eventing first.

mattwelke commented 3 years ago

I was able to get this working. It was indeed the two issues described above. I also found, though, that the default GKE cluster settings resulted in a cluster without enough CPU to schedule the "receive-adapter" pods for CloudPubSubSource. I changed that to three e2-highcpu-4 nodes and got it working that way (though I think smaller node sizes may also work).

Closing issue since the example works fine with k8s 1.17, Serving v0.19, Eventing v0.19, Istio (the version in the Serving install steps), and knative-gcp v0.19.1.
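A quick way to sanity-check that all of those components are actually running before starting the example (namespace names are the defaults used by the v0.19 installs, so adjust if you installed elsewhere):

```shell
# All pods in each namespace should be Running/Completed.
kubectl get pods -n knative-serving
kubectl get pods -n knative-eventing
kubectl get pods -n cloud-run-events   # knative-gcp control plane
kubectl get pods -n istio-system
```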

grac3gao-zz commented 3 years ago

Thanks for trying out the installation and finding those issues, @mattwelke! I'll update the installation guide with some recommendations (about node count and machine type) for creating the cluster.