knative / eventing-contrib

Event Sources
Apache License 2.0

Error: "Trigger had a different UID" #1520

Closed: mpeyrard closed this issue 3 years ago

mpeyrard commented 4 years ago

Describe the bug

I have configured two Knative Services to communicate with each other via Knative Eventing. Our system creates the broker manually via yaml files (as opposed to installing the broker via annotation), as well as the trigger. The broker is backed by the Kafka Channel provided under Knative contrib. The service that produces the event does an HTTP POST to the broker address, which accepts the event with a 202 status code. Using Kafka Tools, we have been able to confirm that the Cloud Event is deposited into the appropriate Kafka topic. However, the endpoint on the target service is never invoked. We have checked and re-checked that the CE-type attribute on the event matches the filter in the trigger. We have also double-checked that the endpoint on the target service is reachable by manually hitting it with Postman.
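
For reference, the POST in question is a binary-mode CloudEvent over HTTP, roughly like the sketch below (the broker URL assumes the 0.14 namespaced-broker address pattern and the default namespace; the id, source, and payload are placeholders):

curl -v "http://signal-rule-broker-broker.default.svc.cluster.local" \
  -X POST \
  -H "Ce-Id: example-event-1" \
  -H "Ce-Specversion: 1.0" \
  -H "Ce-Type: signals.seismic.rules.detection" \
  -H "Ce-Source: example/producer" \
  -H "Content-Type: application/json" \
  -d '{"msg": "example payload"}'

Note that a 202 here only means the broker ingress accepted the event, not that any trigger delivered it.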

Upon investigating the broker filter logs, we find this error message:

{"level":"error","ts":"2020-07-17T21:26:02.692Z","logger":"broker_filter","caller":"filter/filter_handler.go:211","msg":"Error sending the event","commit":"02bc516","error":"trigger had a different UID. From ref '17feec3a-87cb-4bd7-b63d-e04148a28963'. From Kubernetes '3e6fb035-007b-46c2-ab68-58c0993866c9'","stacktrace":"knative.dev/eventing/pkg/broker/filter.(*Handler).serveHTTP\n\tknative.dev/eventing/pkg/broker/filter/filter_handler.go:211\nreflect.Value.call\n\treflect/value.go:460\nreflect.Value.Call\n\treflect/value.go:321\nknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/client.(*receiverFn).invoke\n\tknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/client/receiver.go:93\nknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/client.(*ceClient).obsReceive\n\tknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/client/client.go:168\nknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/client.(*ceClient).Receive\n\tknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/client/client.go:157\nknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/transport/http.(*Transport).obsInvokeReceiver\n\tknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/transport/http/transport.go:530\nknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/transport/http.(*Transport).invokeReceiver\n\tknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/transport/http/transport.go:514\nknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/transport/http.(*Transport).ServeHTTP\n\tknative.dev/eventing/vendor/github.com/cloudevents/sdk-go/v1/cloudevents/transport/http/transport.go:622\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2387\nknative.dev/eventing/vendor/go.opencensus.io/plugin/ochttp.(*Handler).ServeHTTP\n\tknative.dev/eventing/vendor/go.opencensus.io/plugin/ochttp/server.go:86\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2802\nnet/http.(*conn).serve\n\tnet/http/server.go:1890"}
{"level":"info","ts":"2020-07-17T21:26:02.693Z","logger":"broker_filter","caller":"filter/filter_handler.go:237","msg":"Unable to get the Trigger","commit":"02bc516","error":"trigger had a different UID. From ref 'fd3a2ec2-a63f-4fcb-b60c-1070f98bf77c'. From Kubernetes '3e6fb035-007b-46c2-ab68-58c0993866c9'","triggerRef":"default/detector"}

I am not currently certain if this is a bug or some kind of misconfiguration on my end. However, if we assume that it is the latter, then this error message has not been very helpful in figuring out what is wrong.

Since it seems to be complaining about my Trigger specifically, I'll copy it here:

apiVersion: eventing.knative.dev/v1beta1
kind: Trigger
metadata:
  name: {{ .Chart.Name }}-trigger
  namespace: {{ .Values.global.namespace }}
  labels:
    release: {{ .Release.Name }}
    chart/name: {{ .Chart.Name }}
    chart/version: {{ .Chart.Version }}
spec:
  broker: signal-rule-broker
  filter:
    attributes:
      type: signals.seismic.rules.detection
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1alpha1
      kind: Service
      name: {{ .Chart.Name }}

Expected behavior

We expected the Cloud Events to arrive at the configured destination.

To Reproduce

With the current state of my project, this is what I'm doing to reproduce it:

As stated above, the cloud event finds its way into the Kafka Channel's kafka topic for the broker, but no further. Furthermore, errors appear in the broker's filter pod logs complaining that the "trigger had a different UID".
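
For the topic check mentioned above, a hedged sketch (this assumes the KafkaChannel topic naming convention knative-messaging-kafka.<namespace>.<channel-name> and a Strimzi-style Kafka pod; adjust the names to your cluster):

kubectl -n kafka exec -it my-cluster-kafka-0 -- \
  bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic knative-messaging-kafka.default.signal-rule-broker-kne-trigger \
  --from-beginning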

Knative release version 0.14.1

mpeyrard commented 4 years ago

After playing around with this some more, it looks like I can't delete a trigger and then re-publish another one with the same name. The broker filter never forgets the original, and never updates it with the new one. It's probably relevant to mention that I was managing my Knative Service with Helm, and since I was doing a lot of testing and debugging, I ran a lot of helm delete --purge commands followed by helm install commands. Because the trigger is part of my chart, it was being uninstalled by Helm every time I deleted the chart.
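
In other words, the cycle was roughly this (Helm 2 syntax; my-release and my-chart are placeholders):

# remove the release and everything in it, including the Trigger
helm delete --purge my-release
# reinstall; the Trigger comes back with the same name but a new UID
helm install --name my-release ./my-chart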

grantr commented 4 years ago

@MPeyrard86 thanks for the report and sorry you're having trouble. Have you tried reproducing this with a more recent version than 0.14.1?

mpeyrard commented 4 years ago

Not yet. We plan on upgrading in the near future when we upgrade our version of Kubernetes.

lberk commented 4 years ago

@MPeyrard86 would you be willing to provide a bit more yaml so that we could more easily reproduce your setup? Have you tried eventing release v0.14.2?

An alternative might be to run the conformance tests against your current broker setup, in a similar manner to the e2e tests: cd $GOPATH/src/knative.dev/eventing/test/conformance && go test -race -count=1 -tags=e2e -timeout=20m -brokerName=YOUR_BROKER_NAME -brokerNamespace=YOUR_NAMESPACE -run TestBrokerV1Beta1DataPlaneMetrics

mpeyrard commented 4 years ago

@lberk When I run this command, it tells me that there are no tests to run. I assume I did something wrong.

testing: warning: no tests to run
PASS
ok      knative.dev/eventing/test/conformance   1.641s

We will likely upgrade to the latest knative version once we upgrade our version of Kubernetes.

I've also done some more digging, and it seems like deleting triggers is very problematic, even when I delete them via kubectl rather than helm. The filter pod keeps logging errors/warnings that reference old triggers that have long since been deleted:

{"level":"warn","ts":1595973899.4984841,"logger":"fallback","caller":"http/transport.go:532","msg":"got an error from receiver fn","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}
{"level":"warn","ts":1595973899.4975579,"logger":"fallback","caller":"http/transport.go:532","msg":"got an error from receiver fn","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}
{"level":"warn","ts":1595973899.4987288,"logger":"fallback","caller":"http/transport.go:624","msg":"error returned from invokeReceiver","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}
{"level":"warn","ts":1595973899.4988537,"logger":"fallback","caller":"http/transport.go:624","msg":"error returned from invokeReceiver","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}
{"level":"warn","ts":1595973899.4979901,"logger":"fallback","caller":"http/transport.go:532","msg":"got an error from receiver fn","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}
{"level":"warn","ts":1595973899.4992394,"logger":"fallback","caller":"http/transport.go:624","msg":"error returned from invokeReceiver","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}
{"level":"warn","ts":1595973899.498132,"logger":"fallback","caller":"http/transport.go:532","msg":"got an error from receiver fn","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}
{"level":"warn","ts":1595973899.4996395,"logger":"fallback","caller":"http/transport.go:624","msg":"error returned from invokeReceiver","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}
{"level":"warn","ts":1595973899.498285,"logger":"fallback","caller":"http/transport.go:532","msg":"got an error from receiver fn","error":"trigger.eventing.knative.dev \"re-router-trigger-10\" not found"}

Rebooting the broker pods does not resolve this issue; neither did physically rebooting the hardware that Kubernetes runs on. It still thinks these triggers exist. Very confusing.

I'm going to continue investigating, but as of right now it appears that once I've deleted a trigger, events no longer propagate within the system, and we start seeing errors in the broker's filter logs. I can't yet say whether it takes a single delete or a series of deletes (maybe something with a random chance of happening?), but we end up in a state where nothing in Knative Eventing works anymore and we need to restore snapshots of our k8s images and re-install everything.

grantr commented 4 years ago

Furthermore, physically rebooting the hardware that Kubernetes is running on also did not resolve this issue.

If it persists across reboots, it must be in etcd. This suggests to me that the issue is related to some resource that's not being cleaned up properly.
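
One way to hunt for such leftovers is to list the eventing resources directly (a sketch, assuming the default namespace):

kubectl get triggers,subscriptions.messaging.knative.dev,kafkachannels.messaging.knative.dev -n default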

Just to reduce the size of the uncertainty cone, can you tell us which broker class you're using? An easy way to tell is by checking for an annotation on the Broker like eventing.knative.dev/broker.class: MTChannelBasedBroker. What's the value of that annotation?
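
For example (broker name and namespace as in the trigger above):

kubectl get broker signal-rule-broker -n default \
  -o jsonpath='{.metadata.annotations.eventing\.knative\.dev/broker\.class}'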

Something to try in the meantime:

  1. Create a trigger. A Subscription should be created. The Channel's spec should also be updated to include the new Subscription's details.
  2. Delete the trigger. The Subscription should be deleted and the Channel spec should no longer reference the deleted Subscription.

If deleting the trigger in 2 didn't correctly revert the changes you observed in 1, that's likely related to the issue.
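
A minimal sketch of checking both steps (names assumed from the setup above):

# after creating the trigger, a Subscription should exist
kubectl get subscriptions.messaging.knative.dev -n default
# and the channel spec should list the new subscriber
kubectl get kafkachannels -n default -o yaml
# after deleting the trigger, both entries should be gone again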

mpeyrard commented 4 years ago

We are currently using the ChannelBasedBroker. And I can confirm that after deleting a trigger, the subscription for that trigger still exists in the channel spec. Its Generation was incremented from 1 to 2, so it did recognize that something happened to that resource.

Name:         rules-engine-kne-trigger
Namespace:    default
Labels:       eventing.knative.dev/broker=rules-engine
              eventing.knative.dev/brokerEverything=true
Annotations:  <none>
API Version:  messaging.knative.dev/v1alpha1
Kind:         KafkaChannel
Metadata:
  Creation Timestamp:  2020-07-29T02:05:29Z
  Finalizers:
    kafkachannels.messaging.knative.dev
  Generation:  7
  Owner References:
    API Version:           eventing.knative.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Broker
    Name:                  rules-engine
    UID:                   193741c0-0051-4210-aca6-508d0d9efaa8
  Resource Version:        19072
  Self Link:               /apis/messaging.knative.dev/v1alpha1/namespaces/default/kafkachannels/rules-engine-kne-trigger
  UID:                     7d0bcbe7-b2df-4f71-a2a4-f5859d86cec0
Spec:
  Num Partitions:      20
  Replication Factor:  1
  Subscribable:
    Subscribers:
      Generation:      2
      Reply URI:       http://rules-engine-broker.default.svc.cluster.local
      Subscriber URI:  http://rules-engine-broker-filter.default.svc.cluster.local/triggers/default/re-router-trigger-final-2/cb9245e1-e741-4d15-a9b4-f06a7505cfe9
      UID:             bfa65e22-a22a-44df-8821-dcc07140e54f
      Generation:      1
      Reply URI:       http://rules-engine-broker.default.svc.cluster.local
      Subscriber URI:  http://rules-engine-broker-filter.default.svc.cluster.local/triggers/default/re-router-trigger-final/92848617-d438-4698-8d8b-a9e00e16327f
      UID:             aae44d10-cd54-4a9e-9c6c-966c5e529405
      Generation:      1
      Reply URI:       http://rules-engine-broker.default.svc.cluster.local
      Subscriber URI:  http://rules-engine-broker-filter.default.svc.cluster.local/triggers/default/re-email-trigger-final-2/b36857d6-a69d-4cb5-a213-505836f1ca31
      UID:             f33d6781-bc93-4009-9ca6-d34f44afe011
      Generation:      1
      Reply URI:       http://rules-engine-broker.default.svc.cluster.local
      Subscriber URI:  http://rules-engine-broker-filter.default.svc.cluster.local/triggers/default/re-email-trigger-final/0d879cb1-60af-4d31-a623-1172b65e5822
      UID:             832d2ad5-977f-44bb-94f2-cb2d08aa2da0

grantr commented 4 years ago

I assume the deleted Trigger is the one corresponding to the Subscription with UID bfa65e22-a22a-44df-8821-dcc07140e54f. Does the Subscription object with that UID still exist or was it deleted along with the Trigger?

Here's a third debugging step: create the trigger again with the same name. What changes in the Channel spec?
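
A hedged way to compare the UIDs after re-creating the trigger (names taken from the describe output above):

# UID of the re-created Trigger
kubectl get trigger re-router-trigger-final-2 -n default -o jsonpath='{.metadata.uid}'
# UIDs the channel still has recorded for its subscribers
kubectl get kafkachannel rules-engine-kne-trigger -n default -o yaml | grep -i subscriberuri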

mpeyrard commented 4 years ago

I confirmed that the Subscription resources are deleted when the triggers are deleted. However, the KafkaChannel does not look like it's in a good state after deleting the triggers, and especially after re-adding them:

$ kubectl describe kafkachannel.messaging.knative.dev/rules-engine-kne-trigger
Name:         rules-engine-kne-trigger
Namespace:    default
Labels:       eventing.knative.dev/broker=rules-engine
              eventing.knative.dev/brokerEverything=true
Annotations:  <none>
API Version:  messaging.knative.dev/v1alpha1
Kind:         KafkaChannel
Metadata:
  Creation Timestamp:  2020-07-29T04:16:17Z
  Finalizers:
    kafkachannels.messaging.knative.dev
  Generation:  8
  Owner References:
    API Version:           eventing.knative.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Broker
    Name:                  rules-engine
    UID:                   28065a28-61a4-44b6-8a51-c3c4b2a81cc9
  Resource Version:        329567
  Self Link:               /apis/messaging.knative.dev/v1alpha1/namespaces/default/kafkachannels/rules-engine-kne-trigger
  UID:                     79d4914c-0b75-46a0-80a9-701ef754ee25
Spec:
  Num Partitions:      20
  Replication Factor:  1
  Subscribable:
    Subscribers:
      Generation:      2
      Reply URI:       http://rules-engine-broker.default.svc.cluster.local
      Subscriber URI:  http://rules-engine-broker-filter.default.svc.cluster.local/triggers/default/re-email-trigger-final/6061efb7-8a04-450e-b325-0f41030dcece
      UID:             3b694fd7-1d39-4bf6-88b2-8914ec98e86b
      Generation:      2
      Reply URI:       http://rules-engine-broker.default.svc.cluster.local
      Subscriber URI:  http://rules-engine-broker-filter.default.svc.cluster.local/triggers/default/re-router-trigger-final/7f6d8423-70bd-4ce0-a6b0-5afd25fb9eed
      UID:             9843c958-ab5b-4794-b3ee-0c19cb7f4315
      Generation:      1
      Reply URI:       http://rules-engine-broker.default.svc.cluster.local
      Subscriber URI:  http://rules-engine-broker-filter.default.svc.cluster.local/triggers/default/re-email-trigger-final/ac9b6f31-c215-44cf-91ea-9f735866b2dc
      UID:             e29816b3-1567-4758-8ab3-6b207e1063c0
      Generation:      1
      Reply URI:       http://rules-engine-broker.default.svc.cluster.local
      Subscriber URI:  http://rules-engine-broker-filter.default.svc.cluster.local/triggers/default/re-router-trigger-final/0466ea4b-02ef-4a7f-b4d7-a9d57c77df29
      UID:             43526bea-5205-4d22-b0dc-08d110746240
Status:
  Address:
    Hostname:  rules-engine-kne-trigger-kn-channel.default.svc.cluster.local
    URL:       http://rules-engine-kne-trigger-kn-channel.default.svc.cluster.local
  Conditions:
    Last Transition Time:  2020-07-29T04:21:58Z
    Status:                True
    Type:                  Addressable
    Last Transition Time:  2020-07-29T04:21:58Z
    Status:                True
    Type:                  ChannelServiceReady
    Last Transition Time:  2020-07-29T04:21:53Z
    Status:                True
    Type:                  ConfigurationReady
    Last Transition Time:  2020-07-29T12:52:33Z
    Status:                True
    Type:                  DispatcherReady
    Last Transition Time:  2020-07-29T04:22:18Z
    Status:                True
    Type:                  EndpointsReady
    Last Transition Time:  2020-07-29T12:52:33Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2020-07-29T04:21:53Z
    Status:                True
    Type:                  ServiceReady
    Last Transition Time:  2020-07-29T04:21:53Z
    Status:                True
    Type:                  TopicReady
  Subscribable Status:
    Subscribers:
      Observed Generation:  2
      Ready:                True
      UID:                  3b694fd7-1d39-4bf6-88b2-8914ec98e86b
      Observed Generation:  2
      Ready:                True
      UID:                  9843c958-ab5b-4794-b3ee-0c19cb7f4315
      Observed Generation:  1
      Ready:                True
      UID:                  e29816b3-1567-4758-8ab3-6b207e1063c0
      Observed Generation:  1
      Ready:                True
      UID:                  43526bea-5205-4d22-b0dc-08d110746240
Events:
  Type    Reason                  Age                From                     Message
  ----    ------                  ----               ----                     -------
  Normal  KafkaChannelReconciled  3s (x34 over 32h)  kafkachannel-controller  KafkaChannel reconciled: "default/rules-engine-kne-trigger"
  Normal  ChannelReconciled       3s (x12 over 24h)  kafka-ch-dispatcher      KafkaChannel reconciled

To summarize: the channel spec shows four subscribers when in fact only two triggers exist. The entries at Generation: 2 correspond to triggers that were deleted; the entries at Generation: 1 are the same triggers re-added after they were deleted.
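
The mismatch can be seen by comparing both sides (a sketch using the names above):

# four subscriber UIDs recorded in the channel spec...
kubectl get kafkachannel rules-engine-kne-trigger -n default \
  -o jsonpath='{range .spec.subscribable.subscribers[*]}{.uid}{"\n"}{end}'
# ...but only two Triggers actually exist
kubectl get triggers -n default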

mpeyrard commented 4 years ago

@grantr So it sounds like a KafkaChannel-specific error, then? Should I log another bug under the contrib project?

lberk commented 4 years ago

@MPeyrard86 yes please!

grantr commented 4 years ago

Seems like a KafkaChannel-specific error, or possibly a Subscription controller error. The Broker seems to be operating correctly but something in the KafkaChannel controller or the Subscription controller is not working properly.

grantr commented 4 years ago

@slinkydeveloper Can you move this issue to eventing-contrib (or whatever repo Kafka Channel lives in these days)?

@vaikas you might try reproing this with IMC to check if it's a subscription controller issue.

vaikas commented 4 years ago

Ok, so I've been playing with this from the head with the following setup:

Broker:

kubectl create -f - <<EOF
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: broker3
  namespace: vaikas-test
EOF

PingSource:

kubectl create -f - <<EOF
apiVersion: sources.knative.dev/v1beta1
kind: PingSource
metadata:
  name: test-ping-source
  namespace: vaikas-test
spec:
  schedule: "*/1 * * * *"
  jsonData: '{"message": "Hello world!"}'
  sink:
    ref:
      apiVersion: eventing.knative.dev/v1
      kind: Broker
      name: broker3
EOF

And two functions:

kubectl create -f - <<EOF
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: event-display
  namespace: vaikas-test
spec:
  template:
    spec:
      containers:
      - image: gcr.io/knative-releases/knative.dev/eventing-contrib/cmd/event_display@sha256:a214514d6ba674d7393ec8448dd272472b2956207acb3f83152d3071f0ab1911
EOF

and:

kubectl create -f - <<EOF
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: event-display-2
  namespace: vaikas-test
spec:
  template:
    spec:
      containers:
      - image: gcr.io/knative-releases/knative.dev/eventing-contrib/cmd/event_display@sha256:a214514d6ba674d7393ec8448dd272472b2956207acb3f83152d3071f0ab1911
EOF

And by creating an initial trigger like this:

kubectl create -f - <<EOF
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: trigger-2
  namespace: vaikas-test
spec:
  broker: broker3
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display
EOF

Then I've done things like:

So, the TL;DR: I can't seem to repro this (or I'm trying to repro it incorrectly). Of course this is from the head, so maybe something that was broken in the version you are running was fixed along the way. If it's not a huge pain, @MPeyrard86, could you try running with IMC instead of the Kafka channel? Alternatively, @slinkydeveloper, could you try to repro with Kafka channels from the head to see if it's a bug?

mpeyrard commented 4 years ago

Thanks for looking into it! Yeah, I'll give it a try with IMC. And yes, I think it's probably a KafkaChannel issue; we've been seeing some stability issues with the Kafka Channel. Since logging the bug, we've also seen this issue coincide with the kafka-ch-dispatcher going into a crash loop, but not always. I'll get back to you when I've done some tests with the IMC.

vaikas commented 4 years ago

superduper, thanks much and sorry for the troubles :(

vaikas commented 4 years ago

Just checking if you have had any luck trying to repro this. @slinkydeveloper any luck on kafka? Is there an issue we could link from here?

mpeyrard commented 4 years ago

Hi @vaikas, OK, I was able to reproduce this using the IMC as well.

After adding and deleting a trigger using the in-memory channel, I see this:

$ kubectl describe imc default
Name:         default-kne-trigger
Namespace:    default
Labels:       eventing.knative.dev/broker=default
              eventing.knative.dev/brokerEverything=true
Annotations:  messaging.knative.dev/creator: system:serviceaccount:knative-eventing:eventing-controller
              messaging.knative.dev/lastModifier: system:serviceaccount:knative-eventing:eventing-controller
              messaging.knative.dev/subscribable: v1beta1
API Version:  messaging.knative.dev/v1beta1
Kind:         InMemoryChannel
Metadata:
  Creation Timestamp:  2020-08-28T18:25:40Z
  Generation:          3
  Owner References:
    API Version:           eventing.knative.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Broker
    Name:                  default
    UID:                   01b7234d-b3f5-4fa2-9387-be563728d133
  Resource Version:        140892
  Self Link:               /apis/messaging.knative.dev/v1beta1/namespaces/default/inmemorychannels/default-kne-trigger
  UID:                     e9b4158a-8940-4682-8360-42c7b0d93380
Spec:
  Subscribers:
    Delivery:
      Dead Letter Sink:
    Generation:      2
    Reply Uri:       http://default-broker.default.svc.cluster.local
    Subscriber Uri:  http://default-broker-filter.default.svc.cluster.local/triggers/default/default-test-trigger/2e991ecc-f95c-471a-b03f-719cb103363b
    UID:             901f986d-dc9d-4445-913d-f27716b70ea4
Status:
  Address:
    URL:  http://default-kne-trigger-kn-channel.default.svc.cluster.local
  Conditions:
    Last Transition Time:  2020-08-28T18:25:40Z
    Status:                True
    Type:                  Ready
  Observed Generation:     3
  Subscribers:
    Observed Generation:  2
    Ready:                True
    UID:                  901f986d-dc9d-4445-913d-f27716b70ea4
Events:
  Type    Reason                     Age                 From                        Message
  ----    ------                     ----                ----                        -------
  Normal  InMemoryChannelReconciled  2s (x6 over 3m12s)  inmemorychannel-controller  InMemoryChannel reconciled: "default/default-kne-trigger"

As you can see, the subscription is still there, with Generation: 2.

I created the broker like this:

apiVersion: eventing.knative.dev/v1beta1
kind: Broker
metadata:
  name: default
  namespace: default

And the trigger like this:

apiVersion: eventing.knative.dev/v1beta1
kind: Trigger
metadata:
  name: default-test-trigger
  namespace: default
spec:
  broker: default
  subscriber:
    ref:
     apiVersion: serving.knative.dev/v1
     kind: Service
     name: re-detector-sessions

Could this somehow be a problem that's specific to my cluster? I'm not sure how... I'll take another look through the release notes to see if this was somehow fixed since 0.14.

slinkydeveloper commented 4 years ago

Hey people, I'm going to take a look at this soon

/assign

slinkydeveloper commented 4 years ago

@MPeyrard86 which version of eventing are you using? eventing 0.14 is quite old and some things changed since then :smile:

mpeyrard commented 4 years ago

@slinkydeveloper We are still using 0.14. But we're probably due for another upgrade.

grantr commented 4 years ago

Possibly related to https://github.com/knative/eventing-contrib/issues/1560

vaikas commented 4 years ago

Ok, so I tried to repro this from the head with those steps and I couldn't:

vaikas-a01:eventing-camel vaikas$ cat ~/repro-1520/broker.yaml
apiVersion: eventing.knative.dev/v1beta1
kind: Broker
metadata:
  name: repro
  namespace: default
vaikas-a01:eventing-camel vaikas$ cat ~/repro-1520/trigger.yaml
apiVersion: eventing.knative.dev/v1beta1
kind: Trigger
metadata:
  name: default-test-trigger
  namespace: default
spec:
  broker: repro
  subscriber:
    ref:
     apiVersion: serving.knative.dev/v1
     kind: Service
     name: event-display
vaikas-a01:eventing-camel vaikas$ kubectl create -f ~/repro-1520/broker.yaml
vaikas-a01:eventing-camel vaikas$ kubectl create -f ~/repro-1520/trigger.yaml
trigger.eventing.knative.dev/default-test-trigger created
vaikas-a01:eventing-camel vaikas$ kubectl get brokers
NAME      URL                                                                      AGE     READY   REASON
default   http://default-broker.default.svc.cluster.local                          6d18h   True
repro     http://broker-ingress.knative-eventing.svc.cluster.local/default/repro   8s      True
vaikas-a01:eventing-camel vaikas$ kubectl get triggers
NAME                   BROKER    SUBSCRIBER_URI                                   AGE     READY   REASON
default-test-trigger   repro     http://event-display.default.svc.cluster.local   10s     True
ping-trigger           default   http://subscriber.default.svc.cluster.local      6d18h   True
vaikas-a01:eventing-camel vaikas$ kubectl get imc
NAME                URL                                                             AGE   READY   REASON
my-channel          http://my-channel-kn-channel.default.svc.cluster.local          76d   True
my-channel-2        http://my-channel-2-kn-channel.default.svc.cluster.local        76d   True
repro-kne-trigger   http://repro-kne-trigger-kn-channel.default.svc.cluster.local   33s   True
vaikas-a01:eventing-camel vaikas$ kubectl get imc repro-kne-trigger -oyaml
apiVersion: messaging.knative.dev/v1
kind: InMemoryChannel
metadata:
  annotations:
    eventing.knative.dev/scope: cluster
    messaging.knative.dev/creator: system:serviceaccount:knative-eventing:eventing-controller
    messaging.knative.dev/lastModifier: system:serviceaccount:knative-eventing:eventing-controller
    messaging.knative.dev/subscribable: v1
  creationTimestamp: "2020-09-16T17:46:04Z"
  generation: 2
  labels:
    eventing.knative.dev/broker: repro
    eventing.knative.dev/brokerEverything: "true"
  name: repro-kne-trigger
  namespace: default
  ownerReferences:
  - apiVersion: eventing.knative.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: Broker
    name: repro
    uid: 80d72fbf-fe7d-43fb-b9c7-01a58ad1c3cf
  resourceVersion: "97973844"
  selfLink: /apis/messaging.knative.dev/v1/namespaces/default/inmemorychannels/repro-kne-trigger
  uid: 5599bb58-8518-4d75-8411-fe3c55be1cb2
spec:
  subscribers:
  - generation: 1
    replyUri: http://broker-ingress.knative-eventing.svc.cluster.local/default/repro
    subscriberUri: http://broker-filter.knative-eventing.svc.cluster.local/triggers/default/default-test-trigger/d29cb073-f5fc-4590-9114-9c05530f548b
    uid: 3b7b418a-c583-4ad1-99c1-a1a72ffd0f8b
status:
  address:
    url: http://repro-kne-trigger-kn-channel.default.svc.cluster.local
  conditions:
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: Addressable
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: ChannelServiceReady
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: DispatcherReady
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: EndpointsReady
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: ServiceReady
  observedGeneration: 2
  subscribers:
  - observedGeneration: 1
    ready: "True"
    uid: 3b7b418a-c583-4ad1-99c1-a1a72ffd0f8b
vaikas-a01:eventing-camel vaikas$ kubectl delete triggers default-test-trigger
trigger.eventing.knative.dev "default-test-trigger" deleted
vaikas-a01:eventing-camel vaikas$ kubectl get imc repro-kne-trigger -oyaml
apiVersion: messaging.knative.dev/v1
kind: InMemoryChannel
metadata:
  annotations:
    eventing.knative.dev/scope: cluster
    messaging.knative.dev/creator: system:serviceaccount:knative-eventing:eventing-controller
    messaging.knative.dev/lastModifier: system:serviceaccount:knative-eventing:eventing-controller
    messaging.knative.dev/subscribable: v1
  creationTimestamp: "2020-09-16T17:46:04Z"
  generation: 3
  labels:
    eventing.knative.dev/broker: repro
    eventing.knative.dev/brokerEverything: "true"
  name: repro-kne-trigger
  namespace: default
  ownerReferences:
  - apiVersion: eventing.knative.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: Broker
    name: repro
    uid: 80d72fbf-fe7d-43fb-b9c7-01a58ad1c3cf
  resourceVersion: "97976297"
  selfLink: /apis/messaging.knative.dev/v1/namespaces/default/inmemorychannels/repro-kne-trigger
  uid: 5599bb58-8518-4d75-8411-fe3c55be1cb2
spec: {}
status:
  address:
    url: http://repro-kne-trigger-kn-channel.default.svc.cluster.local
  conditions:
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: Addressable
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: ChannelServiceReady
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: DispatcherReady
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: EndpointsReady
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2020-09-16T17:46:04Z"
    status: "True"
    type: ServiceReady
  observedGeneration: 3

So, I wonder if something was fixed along the way. When are you planning on upgrading? Also, FYI, there was an issue with some of the releases when going from .16 to .17, but it should have been fixed in the later dot releases. @Harwayne which is the one that was fixed?

mpeyrard commented 4 years ago

I have put in a request to get our Kubernetes version upgraded, as that is blocking us from upgrading Knative. As soon as that happens, I will be upgrading Knative and trying this again.

vaikas commented 4 years ago

ok, thanks and sorry about the trouble :(

mpeyrard commented 4 years ago

No worries. This isn't affecting production or anything. We're using Knative to develop a new feature that hasn't shipped yet.

mpeyrard commented 4 years ago

Quick update: We're finally getting our Kubernetes upgraded to version 1.18. Should happen in the next week or two. I'll be upgrading Knative to 0.18 as soon as that's done and update this bug with the results.

pandagodyyy commented 3 years ago

I also hit exactly the same issue as @mpeyrard. @mpeyrard, how about now, did you fix it by upgrading? Actually, I hit the same problem with the IMC channel on both knative-eventing 0.18 and 0.19, so it seems upgrading does not help with this issue. @slinkydeveloper @vaikas, I feel it is easy to reproduce, because I am just a Knative beginner using the quick-start sample. Do we have any clue on this issue, and how can we work around it?

vaikas commented 3 years ago

@pandagodyyy when you say it's easy to reproduce: if you can reliably reproduce it, that would be great! I think both @slinkydeveloper and I tried to repro this and couldn't. I'll try to see if I can do it from .19 today and report back. But in the meantime, if you have an easy repro (esp. with IMC), that would be great to share :)

pandagodyyy commented 3 years ago

@vaikas, I left a message and went to bed yesterday, and when I woke up today I found the triggers working as expected; it seems it took several hours to recover. So this morning I tried to reproduce it in a brand-new cluster, and it happened once more. I've recorded it below. Environment: K8s v1.18; pre-installed Istio 1.7.3; Knative Eventing 0.19

  1. kubectl.exe create namespace knative-eventing

  2. kubectl.exe label namespace knative-eventing istio-injection=enabled

  3. Apply a permissive PeerAuthentication:

cat <<EOF | kubectl apply -f -
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "knative-eventing"
spec:
  mtls:
    mode: PERMISSIVE
EOF

  4. kubectl apply --filename https://github.com/knative/eventing/releases/download/v0.19.0/eventing-crds.yaml

  5. kubectl apply --filename https://github.com/knative/eventing/releases/download/v0.19.0/eventing-core.yaml

  6. kubectl apply --filename https://github.com/knative/eventing/releases/download/v0.19.0/in-memory-channel.yaml

  7. kubectl apply --filename https://github.com/knative/eventing/releases/download/v0.19.0/mt-channel-broker.yaml

  8. kubectl.exe create namespace premaster

  9. kubectl.exe label namespace premaster istio-injection=enabled

  10. Apply the Broker, Trigger, Deployment, and Service:

kubectl apply -f - <<EOF
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: premaster
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: test-display
  namespace: premaster
spec:
  broker: default
  filter:
    attributes:
      type: v1
  subscriber:
    ref:
      apiVersion: v1
      kind: Service
      name: hello-display
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-display
  namespace: premaster
spec:
  replicas: 1
  selector:
    matchLabels: &labels
      app: hello-display
  template:
    metadata:
      labels: *labels
    spec:
      containers:
      - name: event-display
        image: gcr.io/knative-releases/knative.dev/eventing-contrib/cmd/event_display
---
kind: Service
apiVersion: v1
metadata:
  name: hello-display
  namespace: premaster
spec:
  selector:
    app: hello-display
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
EOF

  11. kubectl apply -f https://raw.githubusercontent.com/istio/istio/master/samples/sleep/sleep.yaml -n premaster

After all of this, check the pods in knative-eventing and in premaster (screenshots omitted).
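
The checks behind those screenshots were presumably along these lines:

kubectl get pods -n knative-eventing
kubectl get pods -n premaster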

Then start sending events to the broker while monitoring the logs of hello-display:

.\kubectl.exe exec -it sleep-64d7d56698-gg8dd -n premaster -- /bin/sh

curl -v "http://broker-ingress.knative-eventing.svc.cluster.local/premaster/default" \
  -X POST \
  -H "Ce-Id: say-hello" \
  -H "Ce-Specversion: 1.0" \
  -H "Ce-Type: v1" \
  -H "Ce-Source: not-sendoff" \
  -H "Content-Type: application/json" \
  -d '{"msg":"Hello World!"}'

Result: message was shown in hello-display

Then I modified the trigger with the new type v2:

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: test-display
  namespace: premaster
spec:
  broker: default
  filter:
    attributes:
      type: v2
  subscriber:
    ref:
      apiVersion: v1
      kind: Service
      name: hello-display

and sent the previous message:

curl -v "http://broker-ingress.knative-eventing.svc.cluster.local/premaster/default" \
  -X POST \
  -H "Ce-Id: say-hello" \
  -H "Ce-Specversion: 1.0" \
  -H "Ce-Type: v1" \
  -H "Ce-Source: not-sendoff" \
  -H "Content-Type: application/json" \
  -d '{"msg":"Hello World!"}'

Result: the message was still shown in hello-display, even though the filter now only matches type v2.

Then I sent a new message with type v2:

curl -v "http://broker-ingress.knative-eventing.svc.cluster.local/premaster/default" \
  -X POST \
  -H "Ce-Id: say-hello" \
  -H "Ce-Specversion: 1.0" \
  -H "Ce-Type: v2" \
  -H "Ce-Source: not-sendoff" \
  -H "Content-Type: application/json" \
  -d '{"msg":"Hello World!"}'

Result: nothing was shown. (After several hours, the modified trigger does start working.)

If I delete the current trigger and then create a new one, nothing is shown in hello-display after I send a message. However, checking the log with kubectl.exe logs -f mt-broker-filter-79c59cc4dc-spxws -c filter -n knative-eventing, it complains with this error:

{"level":"info","ts":"2020-12-03T04:28:33.828Z","logger":"mt_broker_filter","caller":"filter/filter_handler.go:170","msg":"Unable to get the Trigger","commit":"0f9a8c5","error":"trigger had a different UID. From ref '13a5a1d0-3201-4283-b251-b4532625298d'. From Kubernetes '6491443d-276e-4448-9227-6fb9c5374984'","triggerRef":"premaster/test-display"}
error: unexpected EOF

After around one hour, it came back to normal.

pandagodyyy commented 3 years ago

I did not enable the local cluster gateway (I do not know if it is necessary; events seem to work without it).

mpeyrard commented 3 years ago

I have not yet tried a higher version, because I'm still waiting for my DevOps team to upgrade our version of Kubernetes. I've since noticed that it does not happen 100% of the time. The workaround seems to be to re-install the broker.
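
For anyone else hitting this, a hedged sketch of that workaround (broker.yaml stands for whatever manifest originally created the broker; the backing channel is owned by the Broker, so deleting the Broker garbage-collects it too):

# delete the broker and its owned channel
kubectl delete broker rules-engine -n default
# re-apply the original manifest; everything comes back with fresh UIDs
kubectl apply -f broker.yaml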

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.