tektoncd / triggers

Event triggering with Tekton!
Apache License 2.0
552 stars 416 forks source link

Event listener restarts when under minimal stress #1618

Open ggallen opened 1 year ago

ggallen commented 1 year ago

Expected Behavior

When under minimal stress, the event listener restarts. I was able to get the EL to restart by sending < 3K message to it using curl in 20 seconds.

Actual Behavior

An event listener should be able to handle thousands of messages per second.

Steps to Reproduce the Problem

  1. Configure and start an event listener
  2. Run the attached script with the attached JSON file ./send-cloudevent-forever.sh cloudevent5.json
  3. Watch for the event listener to restart send-cloudevent-forever.sh.txt cloudevent5.json.txt

Additional Info

ggallen commented 1 year ago

I believe I have found the problem here.

I started of trying to debug by sending a single message to my broker. Surprisingly, this let to 101 events being sent to my EventListener. After turning on trigger debugging, I reran my test and again check the logs. After analyzing the messages, it broke down like this:

      1 Ce-Type:[com.redhat.agogos.event.component.build.v1alpha1]
    254 Ce-Type:[dev.tekton.event.triggers.accepted.v1]

Further investigation reveals that this message is sent in sink.go (https://github.com/tektoncd/triggers/blob/main/pkg/sink/sink.go#L236-L255). If the message encoding is unknown, a simple HTTP response is sent with an "Accepted" status. However, if the encoding is known, the CloudEvent message "dev.tekton.event.triggers.accepted.v1" is sent. In my case it appears the encoding is known.

However, this appears to cause a furious loop. As the accepted event is sent, the broker accepts it and responds by forwarding it back to the EventListener. The EventListener receives the event, processes it, and since it has a known encoding it again responds by sending an accepted evetnt. There appears to be an infinite loop, and I'm not sure why it ever ends, but at some point it does. It could be because the EL gets so overwhelmed the eventually it can't respond in time and breaks the cycle.

To me this appears to be a flaw in the logic of sending an event in response to another event. There's probably a good reason to do so, but it's nothing that I am aware of. But my setup does not seem unusual so it seems like this could be a common problem for anyone who is sending events with a known encoding.

To test my theory, I commented out the sending of the accepted event and ran my curl test. The test ran for an hour, and it send 1.5 million events to the EL without a single restart. A check of the EventListener log reveals no timeout errors and no errors (other than the expected FailedPrecondition errors).

I'm not sure what the fix should be here. To me it would make sense for --el-events disable to also disable the sending of this event. Or maybe my configuration needs to change somehow, but it's unclear to me how to do this. It seems to be a flaw in the logic of responding to an event with another event, unless the EventListener can/should be configured to ignored responding to events that it actually sent. That may be a better solution, and maybe it should do that by default.

khrm commented 1 year ago

@afrittoli What do you think about this request? Is there a flaw in the logic of sending cloud-event responses to cloud-event requests?

ggallen commented 1 year ago

I actually think the EL should contain logic to not respond with another event to an event it generated. I think that's what's missing.

afrittoli commented 1 year ago

I actually think the EL should contain logic to not respond with another event to an event it generated. I think that's what's missing.

Thanks @ggallen for your analysis of this. To give some more context, Tekton Triggers can both consume and produce CloudEvents.

Producing CloudEvents

Tekton can produce CloudEvents to notify consumers about things that happen in Tekton, like a trigger was received, a pipeline was executed, etc. This functionality has existed in Tekton Pipelines for some time, and has recently been introduced into Tekton triggers as well.

Consuming CloudEvents

An event producer creates a CloudEvent, which can be sent directly to an event listener sink or sent to a CloudEvent broker and delivered to the event listener sink via a subscription.

In Tekton's own infra cluster, we use a Knative Eventing channel to route CloudEvents. The Knative subscription object sends the events to the event listener sink and expects a response. If the response is missing, or if the response is not a valid CloudEvent, it considers the message delivery to be failed and it attempts again. Tekton triggers added a CloudEvent response to CloudEvents triggers to address this issue.

Tekton must respond to an incoming CloudEvents, and that response must be in valid CloudEvents format according to the CloudEvents spec. Something is not working either in the cloudevent response or in the broker.


With that context, I would like to make sure I understand where the loop happens:

sequenceDiagram
    participant broker
    participant event-listener
    participant el-sink
    broker->>event-listener: sends a CloudEvent `com.redhat.agogos`
    event-listener ->>el-sink: produces a new CloudEvent `dev.tekton.event.triggers.started.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`
    broker ->>event-listener: sends a CloudEvent response to  `dev.tekton.event.triggers.accepted.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`
    broker ->>event-listener: sends a CloudEvent response to  `dev.tekton.event.triggers.accepted.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`

If this is correct, it looks like the event broker does not recognise the response produced by the event listener as a response, instead, it interprets it as a new event to which it replies with a response to the event listener, which didn't expect a response and treats the incoming message as a new event and so on. I wonder how that is possible :S I guess that the trigger configured filters by event type, or else the trigger would fire many times as well!

Do you have a trace of the content of the event listener response to the initial event?

ggallen commented 1 year ago

@afrittoli, I think the loop happens between the broker and the event listener as you show in your diagram. The UIDs are changing, so I also think the broker must be interpreting the response as a new event.

But I don't know how the broker is supposed to determine that the new event is in fact a response and not a new event. Is there something missing in the response that is supposed to indicate that to the broker?

If you tell me how to get what you want for debugging information, I can get it for you. But this should be easy to reproduce.

ggallen commented 1 year ago

@afrittoli, if it makes any difference, we are using the Knative MT broker.

ggallen commented 1 year ago

@afrittoli ?

khrm commented 1 year ago

@afrittoli I think what @ggallen want is a receiving without a cloud event.

Sending a cloud event in response to a cloud event is a valid response as we can see from this receiver: https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/v2/client/client.go#L48

What @gallen want is just a Result response (which consists of a status code and message string) to CloudEvent. https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/v2/client/client.go#L43

Can't we just filter events by the source on the broker side?

ggallen commented 1 year ago

Well, I don't really care what I get for a response, I just don't want all the messages. But I think there is a problem here. The question is where, and I see a couple of possibilities:

  1. The EL is sending a bad response back to the broker, causing the broker to relay the response back as a new message.
  2. The broker is misinterpreting a valid EL response as a new event, and responding accordingly.

I think the problem, wherever it is, should be fixed. It's possible that it's #2, and the options are to either get a fix to the MT broker or change to a different broker that handles things correctly.

I think the question is where is the actual problem? I don't have the knowledge to answer that question.

khrm commented 1 year ago

If we can't filter events by the source on the broker side, then we can probably add an annotation to EventListener to determine what to do for CloudEvents request - default - send Events or just status-code and msg.

khrm commented 1 year ago

@afrittoli If you are free, please clarify on how to proceed with this.

afrittoli commented 1 year ago

Hello @khrm and @ggallen - sorry about the late reply. Unfortunately, I've had no time to look further into this - and I'm now on PTO until September 😓

To summarise the issues as I understand them today:

  1. The original way the event listeners responded to a CloudEvent from the Knative Broker was not recognised by the Knative Broker as a valid CloudEvent response, which triggered retries and thus a lot of duplicated messages. The workaround for that was to set retries to 0 in the subscription. The fix was to change triggers responses to CloudEvents
  2. The current way the event listener responds to a CloudEvent is recognised by the Knative Broker as a new CloudEvent, so the Knative broker sends a reply (type dev.tekton.event.triggers.accepted.v1)
  3. The reply by the Knative broker is interpreted by Tekton triggers as a new event, etc.

In both issues (2) and (3), the response sent is not identified as a response, but rather as a new event, which triggers a response.

In both those cases, the issue can be:

Note that in (2) and (3) producer and consumers are inverted, so we could have two separate issues:

The next steps would be to have examples of each of the responses produced by both systems, compare them with the specification and figure out whether the issue is the producer or consumer of the response.

I hope this helps move forward towards a solution.

ggallen commented 1 year ago

I have gotten the Broker and EL messages. I chose three message/response pair as I think that's enough to get a picture of what's going on.

I also looked at the CDEvents documentation at https://github.com/cdevents/spec, but it wasn't helpful to someone with my level of knowledge about this topic. But I am wondering if it makes any difference that the initial event being sent to the EL from the Broker is not one of the predefined CD events.

Is that possibly the problem?

Here are the messages. Hopefully, this information can shed some light on what might be the problem.

Broker: { "level": "debug", "ts": "2023-08-15T17:53:38.743Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: { \"name\": \"dummy-component-dep-2\", \"resource\": \"Component\", \"instance\": \"eea42e7e-4543-4309-bc6b-11cf35549575\" } and header: map[Accept-Encoding:[gzip] Ce-Id:[e3b1494d-05f1-45aa-8792-fcc3608fbf24] Ce-Knativearrivaltime:[2023-08-15T17:53:38.719745294Z] Ce-Source:[http://localhost] Ce-Specversion:[1.0] Ce-Type:[com.redhat.agogos.event.component.build.v1alpha1] Content-Length:[123] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-9f3e62b83c92cacb-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b" } Event Listener: { "level": "debug", "ts": "2023-08-15T17:53:38.744Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\n id: 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 54 50 98 54 101 50 98 98 45 97 54 56 102 45 52 100 99 56 45 98 55 56 98 45 57 102 98 57 49 98 54 98 99 98 56 98 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b" } Broker: { "level": "debug", "ts": "2023-08-15T17:53:38.751Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: {\"eventListener\":\"agogos\",\"namespace\":\"gallen\",\"eventListenerUID\":\"e6bc7aeb-716d-4a3a-b094-068cad722b41\",\"eventID\":\"62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b] Ce-Knativearrivaltime:[2023-08-15T17:53:38.749146357Z] Ce-Source:[agogos] Ce-Specversion:[1.0] Ce-Subject:[gallen.agogos accepted 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b] Ce-Type:[dev.tekton.event.triggers.accepted.v1] Content-Length:[154] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-88f37a21ff5b0be1-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170" } Event Listener: { "level": "debug", "ts": "2023-08-15T17:53:38.751Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted ab7642c2-0e65-463b-be45-6ff498034170\n id: ab7642c2-0e65-463b-be45-6ff498034170\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 97 98 55 54 52 50 99 50 45 48 101 54 53 45 52 54 51 98 45 98 101 52 53 45 54 102 102 52 57 56 48 51 52 49 55 48 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170" } Broker: { "level": "debug", "ts": "2023-08-15T17:53:38.753Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: {\"eventListener\":\"agogos\",\"namespace\":\"gallen\",\"eventListenerUID\":\"e6bc7aeb-716d-4a3a-b094-068cad722b41\",\"eventID\":\"ab7642c2-0e65-463b-be45-6ff498034170\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[ab7642c2-0e65-463b-be45-6ff498034170] Ce-Knativearrivaltime:[2023-08-15T17:53:38.752346245Z] Ce-Source:[agogos] Ce-Specversion:[1.0] Ce-Subject:[gallen.agogos accepted ab7642c2-0e65-463b-be45-6ff498034170] Ce-Type:[dev.tekton.event.triggers.accepted.v1] Content-Length:[154] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-71a8938ac1254cf6-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21" } Event Listener: { "level": "debug", "ts": "2023-08-15T17:53:38.753Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted 919bcc93-8586-4d09-89e8-b6919a2bbc21\n id: 919bcc93-8586-4d09-89e8-b6919a2bbc21\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 57 49 57 98 99 99 57 51 45 56 53 56 54 45 52 100 48 57 45 56 57 101 56 45 98 54 57 49 57 97 50 98 98 99 50 49 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21" }

khrm commented 1 year ago

That's fine. EL can receive any CE-Events. It will respond with Accepted Event which shouldn't come back.

khrm commented 1 year ago

@ggallen Can you test the broker with this sample? https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/samples/http/responder/main.go

If your broker while receiving the CE event sends back the event again, then it's broken. Otherwise, EL has a problem.

ggallen commented 1 year ago

I don't understand what you want me to do. I can't run that code. Do you want me to send that message?

You can see from my example that the Broker ends up sending back the accepted message to the EL.

khrm commented 1 year ago

@ggallen , can you try setting filters like given here: https://github.com/knative/eventing/blob/main/docs/broker/filtering.md ?

I wanted you to run something similar and send a message via a broker. Let me create a program.

khrm commented 1 year ago

@ggallen Please use this deployment file: https://raw.githubusercontent.com/khrm/ce-test/master/deployment.yaml

First, send a direct request after port-forwarding the service:

kubectl port-forward svc/ce-test 8080

Afterward, you can use the below commands to send a request:

cd cmd/client
go run main.go

Afterward, you can port-forward the MT broker endpoint (expose as port 8080) and send the request using the above command.

ggallen commented 1 year ago

@khrm, something is not quite right, but here are the results. Sending to ce-test:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: 5f2b0c22-10ad-4f1e-98ec-0c69a482a913
  time: 2023-08-18T11:49:02.700185606Z
  datacontenttype: application/json
Data,
  {
    "id": 0,
    "message": "test done!"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: 5f2b0c22-10ad-4f1e-98ec-0c69a482a913
  time: 2023-08-18T11:49:02.700185606Z
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:test done!}
----------------------------

Sending to the MT broker:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
2023/08/18 08:05:25 Event sent at 2023-08-18 08:05:25.166677156 -0400 EDT m=+0.005350907
2023/08/18 08:05:25 Response status code 404: 
khrm commented 1 year ago

@ggallen Was message received at CE-Test service port? I don't think so.

ggallen commented 1 year ago

Well, I can't get anything to run now. I see this in the events:

22s         Normal    Created             pod/ce-test-54c5f5f7-2n6k4        Created container ce-test
21s         Normal    Started             pod/ce-test-54c5f5f7-2n6k4        Started container ce-test
23s         Warning   Unhealthy           pod/ce-test-54c5f5f7-2n6k4        Liveness probe failed: HTTP probe failed with statuscode: 405
23s         Normal    Killing             pod/ce-test-54c5f5f7-2n6k4        Container ce-test failed liveness probe, will be restarted
ggallen commented 1 year ago

@khrm, the ce-test pod keeps restarting.

[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS              RESTARTS   AGE
ce-test-54c5f5f7-rl757       0/1     ContainerCreating   0          13s
el-agogos-65596c479b-lmhhc   1/1     Running             0          3m51s
[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS    RESTARTS   AGE
ce-test-54c5f5f7-rl757       0/1     Running   0          19s
el-agogos-65596c479b-lmhhc   1/1     Running   0          3m57s
[gallen@localhost ce-test] (master)$ kc logs ce-test-54c5f5f7-rl757
[gallen@localhost ce-test] (master)$ kc logs ce-test-54c5f5f7-rl757
[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS             RESTARTS       AGE
ce-test-54c5f5f7-rl757       0/1     CrashLoopBackOff   11 (27s ago)   28m
el-agogos-65596c479b-lmhhc   1/1     Running            0              32m
[gallen@localhost ce-test] (master)$ 

In the output above, I haven't done anything but let time pass.

I am seeing the liveness probe fail as in the previous comment.

ggallen commented 1 year ago

@khrm, I was able to finally send a message directly to the broker:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
2023/08/18 10:54:51 Event sent at 2023-08-18 10:54:51.843636122 -0400 EDT m=+0.005916899
2023/08/18 10:54:51 Response status code 202: 

And what the EL shows in the log:

{"level":"debug","ts":"2023-08-18T14:57:25.966Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T14:57:25.967Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T14:57:25.968Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.969Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.969Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.971Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[8a01378a-4f68-4baf-8d10-93240543774d] Ce-Knativearrivaltime:[2023-08-18T15:01:18.939591943Z] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T15:01:18.933199105Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-e8655703484c759904c8dd719d58244c-f9e172fdf6b5ca8c-00] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"09c62c6b-fb93-4865-a1c0-4644100b7689","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"09c62c6b-fb93-4865-a1c0-4644100b7689"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000231657}

Sending to ce-test:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: a5447e02-4e7e-42d7-87c7-a459c1fe34cd
  time: 2023-08-18T15:03:35.92904009Z
  datacontenttype: application/json
Data,
  {
    "id": 0,
    "message": "test done!"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: a5447e02-4e7e-42d7-87c7-a459c1fe34cd
  time: 2023-08-18T15:03:35.92904009Z
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:test done!}
----------------------------

And the EL log after that:

{"level":"debug","ts":"2023-08-18T15:02:21.767Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T15:02:21.768Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.770Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}

Nothing there, although maybe that is expected.

I think this is what you asked for. If you need anything else please let me know.

khrm commented 1 year ago

@ggallen Actually, I wanted you to use CE-Test in place of EventLisener. Put CE-Test in broker in place of EventLisener and send event via broker.

I assume at the moment when we send event to Broker, it goes to EL. Now we want to send event to Broker, it should go to CE-Test.

ggallen commented 1 year ago

@khrm, I will try that. I'm afraid the problem is going to be that I can't get the log for CE-test, unless I get lucky, since it keeps restarting.

ggallen commented 1 year ago

@khrm, I believe this is what you are asking for.

CE-Test OUTPUT:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  id: b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  datacontenttype: application/json
Data,
  {
    "eventListener": "agogos",
    "namespace": "gallen",
    "eventListenerUID": "bd026c66-0e12-48ee-8304-e8277a1c65fe",
    "eventID": "b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  id: b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:}
----------------------------

CE-Test LOG:

{"level":"debug","ts":"2023-08-18T20:04:53.227Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T20:04:53.227Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T20:04:53.229Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.230Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.230Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.231Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[233b6c63-0ddf-4fc9-b721-806f2911d4e4] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T20:16:28.87933523Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000258366}

BROKER OUTPUT:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  id: 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  datacontenttype: application/json
Data,
  {
    "eventListener": "agogos",
    "namespace": "gallen",
    "eventListenerUID": "bd026c66-0e12-48ee-8304-e8277a1c65fe",
    "eventID": "017b097c-4f3c-4b26-b97a-4ee12d31bfc2"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  id: 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:}
----------------------------

BROKER LOG:

{"level":"debug","ts":"2023-08-18T20:18:50.629Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T20:18:50.629Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T20:18:50.630Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.632Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.632Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.633Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /gallen/agogos, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[9a3ea66f-d5f3-485d-8e9c-eed7e99a446b] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T20:21:05.842938993Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"017b097c-4f3c-4b26-b97a-4ee12d31bfc2","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"017b097c-4f3c-4b26-b97a-4ee12d31bfc2"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000326446}
khrm commented 1 year ago

No. I meant instead of broker sending request to EL, it should send request to CE-Test service. In the above, I can see log from EL.

ggallen commented 1 year ago

@khrm, yes. I replaced the regular EL with the CE-test image.

khrm commented 1 year ago

@ggallen But the logs that you shared are from EL.

Can you use the updated deployment file from here: https://raw.githubusercontent.com/khrm/ce-test/master/deployment.yaml

I removed probes and updated the image with one with logs. Now, you it shouldn't crash.

ggallen commented 1 year ago

@khrm, but how is the broker supposed to be configured to send events to ce-test? With the EL, it subscribes to the broker. I don't see anything similar for ce-test.

khrm commented 1 year ago

@ggallen How did you subscribe EL to Broker?

ggallen commented 1 year ago

@khrm, it's subscribed via a knative trigger:

- apiVersion: eventing.knative.dev/v1
  kind: Trigger
  metadata:
    annotations:
      eventing.knative.dev/creator: minikube-user
      eventing.knative.dev/lastModifier: minikube-user
    creationTimestamp: "2023-08-21T19:41:51Z"
    generation: 1
    labels:
      eventing.knative.dev/broker: agogos
    name: agogos
    namespace: gallen
    resourceVersion: "1597"
    uid: bd5b59d5-0470-4013-9269-74ebef5d5cd8
  spec:
    broker: agogos
    filter: {}
    subscriber:
      uri: http://el-agogos.gallen.svc.cluster.local:8080

I suppose I can adjust the trigger to point to ce-test.

khrm commented 1 year ago

Yes. Point it to CE-Test service.

ggallen commented 1 year ago

@khrm, here's the result of sending the event to the broker:

[gallen@localhost client] (master)$ kc logs ce-test-5b7b87bfdb-gvwbr
2023/08/22 13:00:41 Starting CE-Test
2023/08/22 13:00:41 listening on :8080/
Got Event Context: Context Attributes,
  specversion: 1.0
  type: com.cloudevents.sample.sent
  source: https://github.com/cloudevents/sdk-go/v2/samples/requester
  id: 34998d02-2ffe-4ad8-81b4-b7b53575e6f3
  time: 2023-08-22T13:06:36.961598534Z
  datacontenttype: application/json
Extensions,
  knativearrivaltime: 2023-08-22T13:06:36.96521175Z

2023/08/22 13:06:36 Handling EventID: 34998d02-2ffe-4ad8-81b4-b7b53575e6f3
Got Data: &{Sequence:0 Message:Hello world!}
----------------------------
2023/08/22 13:06:36 Sending event with ID f00c4892-b72c-43b5-ab00-1e8bbf6bd96e
[gallen@localhost client] (master)$ 
khrm commented 1 year ago

So CE-Test doesn't go into the loop? @ggallen

ggallen commented 1 year ago

@khrm, no, it does not.

khrm commented 1 year ago

OK. So issue seems to be with EL. Will come back to you later.

ggallen commented 1 year ago

@khrm, any update?

khrm commented 1 year ago

@ggallen I was busy with OpenShift Pipelines 1.12 work. I will update you on Wednesday.

ggallen commented 1 year ago

@khrm?

khrm commented 1 year ago

Sorry. I was busy OSP 1.12 release. Now I am looking into this.

khrm commented 1 year ago

@ggallen Can you try using mitmproxy to capture response and request for both scenarios?

ggallen commented 1 year ago

@khrm, why? Are you unable to reproduce this issue on your own with all the information I've already provided?

afrittoli commented 1 year ago

@ggallen @khrm I'm back from PTO and read through the thread. @khrm would it be possible to use unit tests to recreate the response of the EL and of ce-test and compare them to understand what's going on?

khrm commented 1 year ago

@afrittoli I would try to create that.

@ggallen I was busy with OSP 1.12 work. That's why I said. I would look into it this when free.

khrm commented 1 year ago

Proxy doesn't reports any significant difference between these two responses. First is by ce-test and then EL.

    Host: 127.0.0.1:8081
    User-Agent: Go-http-client/1.1
    Content-Length: 33
    Ce-Id: 3497e715-50cf-47e6-b672-0ea660bc485a
    Ce-Source: https://github.com/cloudevents/sdk-go/v2/samples/requester
    Ce-Specversion: 1.0
    Ce-Time: 2023-09-05T20:09:08.924977346Z
    Ce-Type: com.cloudevents.sample.sent
    Content-Type: application/json
    Accept-Encoding: gzip

    {
        "id": 0,
        "message": "Hello world!"
    }

 << 200 OK 31b
    Ce-Id: b9c47abc-38f6-4d35-8d1d-f6a7e3e3ce22
    Ce-Source: /test
    Ce-Specversion: 1.0
    Ce-Subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
    Ce-Time: 2023-09-05T20:09:08.93055233Z
    Ce-Type: samples.http.test
    Content-Length: 31
    Content-Type: application/json
    Date: Tue, 05 Sep 2023 20:09:08 GMT

    {
        "id": 0,
        "message": "test done!"
    }

[01:39:09.436][[::1]:36502] client disconnect
[01:39:09.439][[::1]:36502] server disconnect 127.0.0.1:8081

[01:39:53.028][[::1]:33736] client connect
[01:39:53.030][[::1]:33736] server connect 127.0.0.1:8081
[::1]:33736: POST http://127.0.0.1:8081/
    Host: 127.0.0.1:8081
    User-Agent: Go-http-client/1.1
    Content-Length: 33
    Ce-Id: 48185ed2-3591-4432-805a-aae4da8198fb
    Ce-Source: https://github.com/cloudevents/sdk-go/v2/samples/requester
    Ce-Specversion: 1.0
    Ce-Time: 2023-09-05T20:09:53.027663988Z
    Ce-Type: com.cloudevents.sample.sent
    Content-Type: application/json
    Accept-Encoding: gzip

    {
        "id": 0,
        "message": "Hello world!"
    }

 << 202 Accepted 157b
    Ce-Id: a9e3ee86-5d60-4940-aab1-c98fd0435e3c
    Ce-Source: listener
    Ce-Specversion: 1.0
    Ce-Subject: default.listener accepted a9e3ee86-5d60-4940-aab1-c98fd0435e3c
    Ce-Type: dev.tekton.event.triggers.accepted.v1
    Content-Length: 157
    Content-Type: application/json
    Date: Tue, 05 Sep 2023 20:09:53 GMT

    {
        "eventID": "a9e3ee86-5d60-4940-aab1-c98fd0435e3c",
        "eventListener": "listener",
        "eventListenerUID": "99f270e1-8c7d-4224-b90d-4dba28617399",
        "namespace": "default"
    }
khrm commented 1 year ago

@ggallen Can you share your setup with me where you added ce-test? And then EL. I can debug then.

afrittoli commented 1 year ago

I wonder if the 202 response code is what confuses the broker. Perhaps we could try to have the test app reply with 202 and check if that triggers the loop?

ggallen commented 1 year ago

@ggallen Can you share your setup with me where you added ce-test? And then EL. I can debug then.

@khrm, that was over two weeks ago! I no longer have that setup.

khrm commented 1 year ago

@ggallen Can you try installing from this branch? https://github.com/tektoncd/triggers/tree/f-ce-200

Source base install method that I shared with you earlier.

Here I have only changed the HTTP response code which is the only change I could find between ce-test and EL.