knative / eventing

Event-driven application platform for Kubernetes
https://knative.dev/docs/eventing
Apache License 2.0

Sink replies sent back to sink despite non-matching Trigger filter #3149

Closed antoineco closed 4 years ago

antoineco commented 4 years ago

Describe the bug

When a sink behind a filtered Trigger receives events of type T1 and replies with events of type T2, it receives its own responses even though T2 does not match the filter.

Expected behavior

The filter matches on T1 exclusively, so the sink should not receive any T2.
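
For context, a Trigger filter with `attributes` uses exact-match semantics: an event passes only if every listed attribute equals the corresponding CloudEvents attribute. A minimal sketch of that behavior (illustrative only, not the actual broker code; attribute values are examples):

```python
# Sketch of the Trigger filter's exact-match semantics: every attribute
# listed in the filter must equal the corresponding event attribute.
def passes_filter(event_attrs, filter_attrs):
    return all(event_attrs.get(k) == v for k, v in filter_attrs.items())

trigger_filter = {"type": "dev.knative.sources.ping"}

ping_event = {"type": "dev.knative.sources.ping"}   # T1
reply_event = {"type": "dev.knative.docs.sample"}   # T2

print(passes_filter(ping_event, trigger_filter))    # True: delivered to the sink
print(passes_filter(reply_event, trigger_filter))   # False: the reply should be dropped
```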

To Reproduce

  1. Create a broker:
apiVersion: eventing.knative.dev/v1beta1
kind: Broker
metadata:
  name: hello
  annotations:
    eventing.knative.dev/broker.class: MTChannelBasedBroker
  2. Create a simple hello responder (the image was built from docs/.../cloudevents-go):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  labels:
    serving.knative.dev/visibility: cluster-local
spec:
  template:
    spec:
      containers:
      - image: antoineco/cloudevents-go
  3. Create a trigger matching the "ping" event type for the hello responder:
apiVersion: eventing.knative.dev/v1beta1
kind: Trigger
metadata:
  name: hello
spec:
  broker: hello
  filter:
    attributes:
      type: dev.knative.sources.ping
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: hello
  4. Generate events to the broker using a PingSource:
apiVersion: sources.knative.dev/v1alpha2
kind: PingSource
metadata:
  name: hello
spec:
  schedule: '* * * * *'
  jsonData: '{"name":"Woodford"}'
  sink:
    ref:
      apiVersion: eventing.knative.dev/v1beta1
      kind: Broker
      name: hello
  5. The sink receives events of type dev.knative.sources.ping and responds with events of type dev.knative.docs.sample.

  6. Check the sink logs. For every event generated by the PingSource, you will see 3 received events instead of 1:

    2020/05/15 00:15:00 Got an event from: "Woodford"
    2020/05/15 00:15:00 Replying with event: "Hello, Woodford"
    2020/05/15 00:15:00 Got an event from: "Woodford"
    2020/05/15 00:15:00 Replying with event: "Hello, Woodford"
    2020/05/15 00:15:00 Got an event from: ""
    2020/05/15 00:15:00 Replying with event: "Hello, "
    -------- 1 min break ----------
    2020/05/15 00:16:00 Got an event from: "Woodford"
    2020/05/15 00:16:00 Replying with event: "Hello, Woodford"
    2020/05/15 00:16:00 Got an event from: "Woodford"
    2020/05/15 00:16:00 Replying with event: "Hello, Woodford"
    2020/05/15 00:16:00 Got an event from: ""
    2020/05/15 00:16:00 Replying with event: "Hello, "

Knative release version

master @ 73d48402a0aac86bac82ebf1f7ed55ffffcce3f9

Additional context

Replacing the Trigger with a Subscription, and pointing that Subscription's reply attribute at a different broker, yields only 1 event per minute in the sink:

2020/05/15 00:44:00 Got an event from: "Woodford"
2020/05/15 00:44:00 Replying with event: "Hello, Woodford"
-------- 1 min break ----------
2020/05/15 00:45:00 Got an event from: "Woodford"
2020/05/15 00:45:00 Replying with event: "Hello, Woodford"
antoineco commented 4 years ago

Before somebody asks: there is no other Trigger, or even Subscription, in that cluster.

vaikas commented 4 years ago

zomg, that seems totes broken. I'll take a look tmw.

vaikas commented 4 years ago

Just to clarify, there's no 'exactly once' delivery guarantees, but repeatedly getting multiple events delivered seems very broken...

grantr commented 4 years ago

It would be a really bad bug if replies from triggers were skipping the filters somehow. I'll take a look at this today. Thanks for the clear repro @antoineco.

grantr commented 4 years ago

I'm unable to repro this.

Steps

  1. Create a new cluster using GKE regular channel (K8s 1.15)
  2. Apply eventing from master
  3. Apply IMC channel
  4. Apply MT broker
  5. Create a consumer. I cheated a bit by creating a v1 Service and Deployment rather than using a ksvc, since I didn't know which version of serving/istio you were using and I wanted to eliminate that as a variable.
  6. Create MT broker hello
  7. Create trigger, modified to point to the v1 Service
  8. Create ping source

Note that neither serving nor istio are installed in the cluster.

Yamls

Service and Deployment I used instead of the ksvc:

apiVersion: v1
kind: Service
metadata:
  name: hello-consumer
spec:
  selector:
    role: hello-consumer
  ports:
    - name: http
      port: 80
      targetPort: cloudevents
      protocol: TCP

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-consumer
spec:
  selector:
    matchLabels:
      role: hello-consumer
  template:
    metadata:
      labels:
        role: hello-consumer
    spec:
      containers:
      - name: consumer
        image: antoineco/cloudevents-go
        ports:
          - name: cloudevents
            containerPort: 8080

The modified trigger:

apiVersion: eventing.knative.dev/v1beta1
kind: Trigger
metadata:
  name: hello
spec:
  broker: hello
  filter:
    attributes:
      type: dev.knative.sources.ping
  subscriber:
    ref:
      apiVersion: v1
      kind: Service
      name: hello-consumer

Logs

Here's the output I get from the consumer pod:

hello-consumer-7cd7c4f85b-c4z8c consumer 2020/05/15 21:00:00 Got an event from: "Woodford"
hello-consumer-7cd7c4f85b-c4z8c consumer 2020/05/15 21:00:00 Replying with event: "Hello, Woodford"
hello-consumer-7cd7c4f85b-c4z8c consumer 2020/05/15 21:01:00 Got an event from: "Woodford"
hello-consumer-7cd7c4f85b-c4z8c consumer 2020/05/15 21:01:00 Replying with event: "Hello, Woodford"
hello-consumer-7cd7c4f85b-c4z8c consumer 2020/05/15 21:02:00 Got an event from: "Woodford"
hello-consumer-7cd7c4f85b-c4z8c consumer 2020/05/15 21:02:00 Replying with event: "Hello, Woodford"

And here's the debug output from the broker pods. The ingress receives the event and assigns a TTL. The filter receives the initial event, then the reply a few microseconds later, then filters out the reply. This is exactly what I'd expect.

broker-ingress-d448d4c7f-4r6zm ingress {"level":"debug","ts":"2020-05-15T21:06:00.001Z","logger":"mt_broker_ingress","caller":"mtbroker/ttl.go:66","msg":"TTL not found in outbound event, defaulting.","commit":"23a2233","event.id":"6b4f0610-325f-42dd-b571-8427d71c1c49","knativebrokerttl":255,"error":"\"knativebrokerttl\" not found"}
broker-filter-6b56bf6b48-ncrsm filter {"level":"debug","ts":"2020-05-15T21:06:00.002Z","logger":"mt_broker_filter","caller":"filter/filter_handler.go:216","msg":"Received message","commit":"23a2233","triggerRef":"default/hello"}
broker-filter-6b56bf6b48-ncrsm filter {"level":"debug","ts":"2020-05-15T21:06:00.011Z","logger":"mt_broker_filter","caller":"filter/filter_handler.go:216","msg":"Received message","commit":"23a2233","triggerRef":"default/hello"}
broker-filter-6b56bf6b48-ncrsm filter {"level":"debug","ts":"2020-05-15T21:06:00.011Z","logger":"mt_broker_filter","caller":"filter/filter_handler.go:274","msg":"Event did not pass filter","commit":"23a2233","triggerRef":"default/hello"}

Further investigation

@antoineco can you enable debug logging in your broker? Edit the knative-eventing/config-logging configmap and add these keys:

data:
  loglevel.mt_broker_filter: debug
  loglevel.mt_broker_ingress: debug

I'd love to see if you're getting the same logs as me.

antoineco commented 4 years ago

I won't have much time over the weekend but I'll try re-running the experiment with/without istio before Monday evening.

antoineco commented 4 years ago

Oh, before I forget, that cluster was using KafkaChannel as the default.

markfisher commented 4 years ago

@antoineco I just ran through your example using the in-memory channel and witnessed the expected behavior (same as @grantr), but when I get a chance I will also set up the KafkaChannel to see if that could be a factor.

markfisher commented 4 years ago

@antoineco I am also seeing one response per minute using the KafkaChannel:

2020/05/16 17:30:00 Got an event from: "Woodford"
2020/05/16 17:30:00 Replying with event: "Hello, Woodford"
2020/05/16 17:31:00 Got an event from: "Woodford"
2020/05/16 17:31:00 Replying with event: "Hello, Woodford"
2020/05/16 17:32:00 Got an event from: "Woodford"
2020/05/16 17:32:00 Replying with event: "Hello, Woodford"

but I am on current master as of now (b3730b5).

antoineco commented 4 years ago

I also can't reproduce this in a fresh cluster; it's driving me mad 😄 I'll try to replay what I did before this happened, and if I can't repro it myself I'll just close this issue.

Thanks @grantr and @markfisher for trying on your side.

antoineco commented 4 years ago

I tried with Deployments, with Knative Services (Serving + Istio minimal from master), with InMemoryChannel, and with KafkaChannel. I can't reproduce this, which is for the best.

The only way I can reproduce the symptoms is by creating this Subscription in the same namespace:

apiVersion: messaging.knative.dev/v1beta1
kind: Subscription
metadata:
  name: hello
spec:
  channel:
    apiVersion: messaging.knative.dev/v1alpha1
    kind: KafkaChannel
    name: hello-kne-trigger
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: hello
  reply:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display

The result I observed is of course expected with the presence of that extra Subscription, since it sends every response back to the hello Service.

I actually had an almost identical Subscription in the namespace before swapping it for a Trigger. My only conclusion, unconvincing as it is, is that something may not have been cleaned up properly and the dispatcher was acting as if both the Trigger and the Subscription were present. That's the only way I can explain it.
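
That explanation is consistent with how a channel fans out: every subscriber receives its own copy of each event, and nothing deduplicates across subscriptions, so a stale Subscription on the Trigger's backing channel silently duplicates deliveries to the same sink. A toy illustration (hypothetical names, not Knative code):

```python
def dispatch(event, subscribers):
    # A channel delivers an independent copy of each event to every
    # subscriber; two subscriptions mean two deliveries.
    for subscriber in subscribers:
        subscriber(dict(event))

deliveries = []
trigger_subscription = lambda e: deliveries.append(("trigger", e["type"]))
stale_subscription = lambda e: deliveries.append(("subscription", e["type"]))

dispatch({"type": "dev.knative.sources.ping"},
         [trigger_subscription, stale_subscription])
print(len(deliveries))  # 2: the same ping reaches the sink twice
```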

I'm closing this and will reopen with debug logs if I run into the situation again. Thanks again for checking.