jenkins-x / lighthouse

Apache License 2.0

openshift 4.7 Failed to create pipeline run: admission webhook #1376

Open matiasgonzalocalvo opened 3 years ago

matiasgonzalocalvo commented 3 years ago

Hi. I ran this command:

jx application delete -r spring-boot-openshift-rciots

to delete the application.

The command creates a pull request in the Jenkins X repository and a Lighthouse job, but the job is stuck in pending:

17:26:34 opened User repo jx  8 presubmit verify pending 17:26:34  

In the lighthouse-tekton-controller logs I see this error:

{"component":"lighthouse-tekton-controller","controller":"tekton-controller","file":"/workspace/source/pkg/engines/tekton/controller.go:157","func":"github.com/jenkins-x/lighthouse/pkg/engines/tekton.(*LighthouseJobReconciler).Reconcile","level":"error","msg":"Failed to create pipeline run: admission webhook \"webhook.pipeline.tekton.dev\" denied the request: mutation failed: cannot decode incoming new object: json: unknown field \"spec\"","time":"2021-09-11T17:26:34Z"}
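For readers puzzling over where "spec" comes from: the trailing json: unknown field "spec" text is the error Go's encoding/json returns when a Decoder has DisallowUnknownFields() enabled and the input contains a key that the target struct does not declare. In an admission webhook, the "incoming new object" is presumably the Kubernetes object being admitted (the PipelineRun Lighthouse is trying to create), not the Git provider's webhook payload. The sketch below reproduces the message using only the standard library. pipelineRunLike and strictDecode are hypothetical names, and the struct deliberately lacks a Spec field to mimic a webhook whose Go types do not match the object it receives:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// pipelineRunLike is a HYPOTHETICAL, deliberately incomplete stand-in for the
// type a webhook decodes into. It has no Spec field, so "spec" is unknown to it.
type pipelineRunLike struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
}

// strictDecode decodes data into v and rejects any JSON key that has no
// matching field in v.
func strictDecode(data []byte, v interface{}) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

func main() {
	// A trimmed PipelineRun-shaped object that includes a "spec" key.
	obj := []byte(`{"apiVersion":"tekton.dev/v1beta1","kind":"PipelineRun","spec":{}}`)

	var pr pipelineRunLike
	if err := strictDecode(obj, &pr); err != nil {
		// Prefix it with the same wording that appears in the log line.
		fmt.Println("cannot decode incoming new object:", err)
	}
}
```

This only shows how the message is produced. Why the webhook's types would not recognise spec in a particular cluster (a version mismatch, a controller that is not running cleanly, and so on) still has to be checked separately.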

nikki-quant commented 2 years ago

Hi, after upgrading from Lighthouse chart version lighthouse-1.1.26 to 1.1.51 I'm also seeing this unknown field "spec" message:

{"component":"lighthouse-tekton-controller","controller":"tekton-controller","file":"/workspace/source/pkg/engines/tekton/controller.go:157","func":"github.com/jenkins-x/lighthouse/pkg/engines/tekton.(*LighthouseJobReconciler).Reconcile","level":"error","msg":"Failed to create pipeline run: admission webhook \"webhook.pipeline.tekton.dev\" denied the request: mutation failed: cannot decode incoming new object: json: unknown field \"spec\"","time":"2021-11-30T09:09:00Z"}

The mention of decoding an incoming object makes it sound like the GitHub webhook payload contains bad JSON, but as far as I can tell there is no field called spec in that payload (https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#pull_request). Has the pipeline structure or format changed?

In my case the verify job doesn't show up in Lighthouse or on the PR; Tekton errors before it ever receives the job.

The image is ghcr.io/jenkins-x/lighthouse-tekton-controller:1.1.51

The environment settings are:

    Environment:
      LOGRUS_FORMAT:                                            json
      LOGRUS_SERVICE:                                           lighthouse
      LOGRUS_SERVICE_VERSION:                                   1.1.51
      LOGRUS_STACK_SKIP:                                        
      DEFAULT_PIPELINE_RUN_SERVICE_ACCOUNT:                     tekton-bot
      DEFAULT_PIPELINE_RUN_TIMEOUT:                             2h0m0s
      FILE_BROWSER:                                             git
      JX_DEFAULT_IMAGE:                                         ghcr.io/jenkins-x/builder-maven:2.1.149-768
      LIGHTHOUSE_DASHBOARD_TEMPLATE:                            namespaces/{{ .Namespace }}/pipelineruns/{{ .PipelineRun }}
      LIGHTHOUSE_VERSIONSTREAM_JENKINS_X_JX3_PIPELINE_CATALOG:  78ee09edfc1815a0057f1df4a3749b0dba55c117

.lighthouse/jenkins-x/pullrequest.yaml is as follows:

    apiVersion: tekton.dev/v1beta1
    kind: PipelineRun
    metadata:
      creationTimestamp: null
      name: pullrequest
    spec:
      pipelineSpec:
        tasks:
        - name: from-build-pack
          resources: {}
          taskSpec:
            metadata: {}
            stepTemplate:
              image: uses:jenkins-x/jx3-pipeline-catalog/tasks/environment/pullrequest.yaml@versionStream
              name: ""
              resources:
                requests:
                  cpu: 0.1
                  memory: 128Mi
                limits:
                  cpu: 400m
                  memory: 512Mi
              workingDir: /workspace/source
            steps:
            - image: uses:jenkins-x/jx3-pipeline-catalog/tasks/git-clone/git-clone-env-pr.yaml@versionStream
              resources: {}
            - name: make-pr
              resources: {}
      podTemplate: {}
      serviceAccountName: tekton-bot
      timeout: 12h0m0s
    status: {}

If anyone has thoughts on where to start debugging this it'd be much appreciated!

nikki-quant commented 2 years ago

In our case the culprit was that our Tekton controller (the backend we use to run jobs initiated by Lighthouse) was not running cleanly. This environment runs Kubernetes 1.18, and we needed to set the KUBERNETES_MIN_VERSION environment variable to let the service run on 1.18 (which is not EOL yet, but will be in February).

It is relatively hard to track down the underlying issue from the message above, but we could see the PR jobs being created using kubectl get LighthouseJobs -n jx, which showed the job in pending status. The fact that Lighthouse created the job in response to the webhook but it never ran indicated a problem with Tekton.

In the unlikely event someone else runs into this niche issue, there was a brief discussion in the Kubernetes Slack workspace here: https://kubernetes.slack.com/archives/C9MBGQJRH/p1638264995318800