tektoncd / triggers

Event triggering with Tekton!
Apache License 2.0

EventListener sometimes fails #1303

Closed badamowicz closed 2 years ago

badamowicz commented 2 years ago

Expected Behavior

Every time an EventListener receives a JSON payload via a webhook (e.g. from Bitbucket), it executes the associated Trigger.

Actual Behavior

Sometimes the EL will emit this error message:

{"level":"error","ts":"2022-01-31T12:10:52.436Z","logger":"eventlistener","caller":"sink/sink.go:278","msg":"couldn't create resource with group version kind \"tekton.dev/v1beta1, Resource=pipelineruns\": admission webhook \"webhook.pipeline.tekton.dev\" denied the request: mutation failed: cannot create patch for round tripped newBytes: cannot marshal interface: json: error calling MarshalJSON for type v1beta1.ArrayOrString: impossible ArrayOrString.Type: \"\"","knative.dev/controller":"eventlistener","eventlistener":"casa-push-listener","namespace":"sre-ci","eventlistenerUID":"bb40fa25-564d-43d1-92ba-db1668d37151","/triggers-eventid":"c30233b8-99bc-4e2e-adb2-79ca8fdb4b35","/trigger":"casa-ci-push-trigger","stacktrace":"github.com/tektoncd/triggers/pkg/sink.Sink.processTrigger\n\tgithub.com/tektoncd/triggers/pkg/sink/sink.go:278\ngithub.com/tektoncd/triggers/pkg/sink.Sink.HandleEvent.func4\n\tgithub.com/tektoncd/triggers/pkg/sink/sink.go:164"}

Steps to Reproduce the Problem

Unfortunately, we have not been able to create a reproducible scenario. The error occurs in about 5% of the EL calls.

Additional Info

$ tkn version
Client version: 0.21.0
Pipeline version: v0.26.0

Triggers version is v0.14.2. (For whatever reason not shown in the output.)

On Openshift:

$ tkn version
Client version: 0.21.0
Pipeline version: v0.24.3
Triggers version: v0.14.2

Important hint

It is important to note that the problem first occurred after we introduced FluxCD for managing all our resources inside the clusters. So maybe this is not a Triggers bug at all, and some other weird things are going on instead. This issue is therefore more of a question: are there any ideas or suggestions on how we could dig deeper to get more information? Thanks in advance!

dibyom commented 2 years ago

Hey @badamowicz, sorry for the delayed response. Looking at the error message error calling MarshalJSON for type v1beta1.ArrayOrString: impossible ArrayOrString.Type: "", I think what is happening is that Triggers is creating a PipelineRun but the Pipeline admission webhook is rejecting it.

The error message comes from https://github.com/tektoncd/pipeline/blob/main/pkg/apis/pipeline/v1beta1/param_types.go#L131
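To illustrate how a message of this shape can arise, here is a minimal, self-contained sketch. The `ParamType` and `ArrayOrString` types below are simplified local stand-ins written for this example, not the actual Tekton source; the assumption is only that the marshaller switches on the `Type` field and rejects anything other than string or array. The helper `marshalParam` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ParamType and ArrayOrString are simplified stand-ins for illustration,
// not copies of the Tekton types.
type ParamType string

const (
	ParamTypeString ParamType = "string"
	ParamTypeArray  ParamType = "array"
)

type ArrayOrString struct {
	Type      ParamType
	StringVal string
	ArrayVal  []string
}

// MarshalJSON emits either a JSON string or a JSON array depending on Type.
// Any other Type, including the zero value "", is rejected with an error.
func (a ArrayOrString) MarshalJSON() ([]byte, error) {
	switch a.Type {
	case ParamTypeString:
		return json.Marshal(a.StringVal)
	case ParamTypeArray:
		return json.Marshal(a.ArrayVal)
	default:
		return nil, fmt.Errorf("impossible ArrayOrString.Type: %q", a.Type)
	}
}

// marshalParam is a hypothetical helper that returns the JSON text for a value.
func marshalParam(a ArrayOrString) (string, error) {
	b, err := json.Marshal(a)
	return string(b), err
}

func main() {
	// A value with an explicit type marshals normally.
	out, err := marshalParam(ArrayOrString{Type: ParamTypeString, StringVal: "main"})
	fmt.Println(out, err)

	// A value whose Type was never set cannot be marshalled.
	_, err = marshalParam(ArrayOrString{})
	fmt.Println(err)
}
```

Under this assumption, the error means that some parameter value reached the marshaller with its type never set, which points at the parameter values in the created resource rather than at the EventListener itself.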

I'd look in the TriggerTemplate and see if there is a place where you are specifying a param whose type is not string or array for some reason (maybe you are specifying an empty string or null or something?). It does seem weird, though, that this is happening for only a small percentage of calls.

tekton-robot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

@tekton-robot: Closing this issue.
