tektoncd / dashboard

A dashboard for Tekton!
Apache License 2.0

Pipeline run build failures shows build stage as 'Not run' #1204

Closed hanczaryk closed 4 years ago

hanczaryk commented 4 years ago

Expected behavior

I expect the build stage to report 'Failed' when I run java-openliberty-build-deploy-pl with a deliberately invalid step in the build.

Actual behavior

I observe that the build stage shows 'Not Run'

image

Steps to reproduce the problem

  1. Edit the java-openliberty's Dockerfile to add a faulty step that will fail
  2. Rebuild stack-hub
  3. Attempt to deploy application using java-openliberty-build-deploy-pl

Environment

Additional Info

AlanGreene commented 4 years ago

In your screenshot I can see that it's showing all steps under 'build-push-task' as Not Run. This can happen for a number of reasons, most commonly if there's an error with a PipelineResource init step. We've recently made a change to the Dashboard to ensure that information is surfaced and it will be included in our next release.

Can you share an example of a Pipeline + PipelineRun that reproduces this problem? What Tekton Pipelines version are you running this on?

If you hover over the status message in the header you should see a tooltip that may include more information.

[screenshot]

Alternatively, if you navigate to the TaskRuns page and view the corresponding TaskRun details it may also include more information if there was an error initializing pods for example. We're working on surfacing this information in the PipelineRun too.

hanczaryk commented 4 years ago

Sorry if the screenshot didn't accurately reflect that the build was run. I tried to show the status tab. Here are the last few lines of the 'Logs' tab, showing that 21 previous steps completed successfully before the user-caused failure. The Tekton Pipelines version is v0.10.1.

[Buildah] STEP 22: RUN causeFailure.sh
[Buildah] /bin/sh: 1: causeFailure.sh: not found
[Buildah] subprocess exited with status 127
[Buildah] subprocess exited with status 127
[Buildah] error building at STEP "RUN causeFailure.sh": exit status 127
[Error] exit status 1
Copying the generated app-deploy.yaml file from input to the output to pass the file to the next task when this task is used in deploy pipeline

Here is the hover message from the TaskRun failure

java-openliberty-build-deploy-pl-run-7pn6z-build-push-tas-7wdn5

Failed "step-image-digest-exporter-mj7ch" exited with code 1 (image: "quay.io/openshift-pipeline/tektoncd-pipeline-imagedigestexporter@sha256:bc12f889c9f28f7f7efeb9854df0e390869fdf1b6505bea31a4c17b3014becd3"); for logs run: kubectl -n kabanero logs java-openliberty-build-deploy-pl-run-7pn6z-build-push-tas-q5ljt -c step-image-digest-exporter-mj7ch

AlanGreene commented 4 years ago

hrmm ok so it looks like there are a few strange things happening here:

If you inspect the TaskRun using kubectl describe taskrun <name> what does it report for the status (including steps)?

Would it be possible to provide a reduced example that reproduces this behaviour?

hanczaryk commented 4 years ago

Here is the output of oc describe taskrun java-openliberty-build-deploy-pl-run-7pn6z-build-push-tas-7wdn5. I don't believe I have the Tekton knowledge to provide a reduced example that reproduces the behavior. I'm using the Tekton that is installed with IBM Cloud Pak for Applications 4.1.0 on OpenShift Container Platform 4.3.

Name: java-openliberty-build-deploy-pl-run-7pn6z-build-push-tas-7wdn5
Namespace: kabanero
Labels:
  app.kubernetes.io/managed-by=tekton-pipelines
  tekton.dev/eventlistener=tekton-webhooks-eventlistener
  tekton.dev/pipeline=java-openliberty-build-deploy-pl
  tekton.dev/pipelineRun=java-openliberty-build-deploy-pl-run-7pn6z
  tekton.dev/pipelineTask=build-push-task
  tekton.dev/task=java-openliberty-build-push-task
  tekton.dev/trigger=stackmwa-webhook-kabanero-push-event
  tekton.dev/triggers-eventid=9z2zs
  ...
Annotations:
  manifestival: new
  tekton.dev/release: devel
API Version: tekton.dev/v1alpha1
Kind: TaskRun
Metadata:
  Creation Timestamp: 2020-03-27T16:04:15Z
  Generation: 1
  Owner References:
    API Version: tekton.dev/v1alpha1
    Block Owner Deletion: true
    Controller: true
    Kind: PipelineRun
    Name: java-openliberty-build-deploy-pl-run-7pn6z
    UID: a43cbd4e-a8fd-4fa4-af48-d1addc04a879
  Resource Version: 9396947
  Self Link: /apis/tekton.dev/v1alpha1/namespaces/kabanero/taskruns/java-openliberty-build-deploy-pl-run-7pn6z-build-push-tas-7wdn5
  UID: 752255b1-e46c-4121-8a19-4ad231245b65
Spec:
  Inputs:
    Resources:
      Name: git-source
      Resource Ref:
        Name: git-source-ds2j2
  Outputs:
    Resources:
      Name: docker-image
      Paths:
        /pvc/build-push-task/docker-image
      Resource Ref:
        Name: docker-image-ds2j2
      Name: git-source
      Paths:
        /pvc/build-push-task/git-source
      Resource Ref:
        Name: git-source-ds2j2
  Service Account Name: kabanero-pipeline
  Task Ref:
    Name: java-openliberty-build-push-task
  Timeout: 1h0m0s
Status:
  Completion Time: 2020-03-27T16:08:36Z
  Conditions:
    Last Transition Time: 2020-03-27T16:08:36Z
    Message: "step-image-digest-exporter-mj7ch" exited with code 1 (image: "quay.io/openshift-pipeline/tektoncd-pipeline-imagedigestexporter@sha256:bc12f889c9f28f7f7efeb9854df0e390869fdf1b6505bea31a4c17b3014becd3"); for logs run: kubectl -n kabanero logs java-openliberty-build-deploy-pl-run-7pn6z-build-push-tas-q5ljt -c step-image-digest-exporter-mj7ch
    Reason: Failed
    Status: False
    Type: Succeeded
  Pod Name: java-openliberty-build-deploy-pl-run-7pn6z-build-push-tas-q5ljt
  Start Time: 2020-03-27T16:04:15Z
  Steps:
    Container: step-create-dir-docker-image-fc6ll
    Image ID: registry.access.redhat.com/ubi8/ubi-minimal@sha256:01b8fb7b3ad16a575651a4e007e8f4d95b68f727b3a41fc57996be9a790dc4fa
    Name: create-dir-docker-image-fc6ll
    Terminated:
      Container ID: cri-o://dac141202c0f7636a594f24cd5cd3d50e41bd22d00debe5770b52ae1f3ecb6d4
      Exit Code: 0
      Finished At: 2020-03-27T16:04:41Z
      Reason: Completed
      Started At: 2020-03-27T16:04:41Z
    Container: step-build
    Image ID: docker.io/appsody/appsody-buildah@sha256:c31db8290a6b3e105058bbfd6aa48eff365a00dc16e5ba41f26e24964b3a3446
    Name: build
    Terminated:
      Container ID: cri-o://423be086ac0e5f20b48a05b7f4b36e3dfb990851dbf73a8046a3a6ca0eda150a
      Exit Code: 0
      Finished At: 2020-03-27T16:08:33Z
      Reason: Completed
      Started At: 2020-03-27T16:04:44Z
    Container: step-create-dir-git-source-975xb
    Image ID: registry.access.redhat.com/ubi8/ubi-minimal@sha256:01b8fb7b3ad16a575651a4e007e8f4d95b68f727b3a41fc57996be9a790dc4fa
    Name: create-dir-git-source-975xb
    Terminated:
      Container ID: cri-o://958fe2cb4ba1806caa326f57a3e57f01f051cd9ff005c19b1e4530fa82e0dce7
      Exit Code: 0
      Finished At: 2020-03-27T16:04:41Z
      Reason: Completed
      Started At: 2020-03-27T16:04:41Z
    Container: step-git-source-git-source-ds2j2-vcccz
    Image ID: quay.io/openshift-pipeline/tektoncd-pipeline-git-init@sha256:0956ae04297fe4740af495ec1d6d51bd7fbd79686f5d5a4ea09ca44c4e9838cf
    Name: git-source-git-source-ds2j2-vcccz
    Terminated:
      Container ID: cri-o://9c5a35c1f55f71679c05ba341038875eb98c9b18d7b9c973e9268a180f78d6af
      Exit Code: 0
      Finished At: 2020-03-27T16:04:44Z
      Message: [{"name":"","digest":"","key":"commit","value":"c6163997dc71e8ffcda08d28a5c1c280f8ed6699","resourceRef":{}}]
      Reason: Completed
      Started At: 2020-03-27T16:04:42Z
Events:
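One thing visible in this output is that the container named in the failure message does not appear under Steps. A minimal sketch of checking that programmatically is below, assuming the TaskRun has been loaded as a dict (for example from `kubectl get taskrun <name> -o json`). The JSON field names (`status.conditions`, `status.steps[].container`) are my assumption about how the fields printed above are serialized, and the helper names `failed_container` and `reported_containers` are made up for this example. The sample data is copied from the output above, with the condition message shortened.

```python
import re
from typing import Any, Dict, List, Optional


def failed_container(taskrun: Dict[str, Any]) -> Optional[str]:
    """Return the container named in a failed 'Succeeded' condition, if any."""
    for cond in taskrun.get("status", {}).get("conditions", []):
        if cond.get("type") == "Succeeded" and cond.get("status") == "False":
            # Messages in this thread look like: "<container>" exited with code N ...
            match = re.match(r'"([^"]+)" exited with code', cond.get("message", ""))
            if match:
                return match.group(1)
    return None


def reported_containers(taskrun: Dict[str, Any]) -> List[str]:
    """Return the container names listed under status.steps."""
    steps = taskrun.get("status", {}).get("steps", [])
    return [step.get("container", "") for step in steps]


# Sample data taken from the describe output above (message shortened).
taskrun = {
    "status": {
        "conditions": [{
            "type": "Succeeded",
            "status": "False",
            "reason": "Failed",
            "message": '"step-image-digest-exporter-mj7ch" exited with code 1',
        }],
        "steps": [
            {"name": "create-dir-docker-image-fc6ll",
             "container": "step-create-dir-docker-image-fc6ll",
             "terminated": {"exitCode": 0, "reason": "Completed"}},
            {"name": "build",
             "container": "step-build",
             "terminated": {"exitCode": 0, "reason": "Completed"}},
            {"name": "create-dir-git-source-975xb",
             "container": "step-create-dir-git-source-975xb",
             "terminated": {"exitCode": 0, "reason": "Completed"}},
            {"name": "git-source-git-source-ds2j2-vcccz",
             "container": "step-git-source-git-source-ds2j2-vcccz",
             "terminated": {"exitCode": 0, "reason": "Completed"}},
        ],
    }
}

failed = failed_container(taskrun)
listed = reported_containers(taskrun)
print(f"failed container: {failed}")
print(f"listed under Steps: {failed in listed}")
```

With the data above, the failed container is step-image-digest-exporter-mj7ch and it is not among the four containers listed under Steps.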

AlanGreene commented 4 years ago

The change in https://github.com/tektoncd/dashboard/pull/1184 is related, as it should now surface the error from the PipelineResource init step step-image-digest-exporter-mj7ch in the list of steps in the PipelineRun view, making it clearer what's failed.

I'll try to reproduce this error with the image-digest-exporter step and see how it handles the remaining steps. We may still need some additional changes for this 'not run' state where they bailed out due to some PipelineResource related issue.

ncskier commented 4 years ago

@hanczaryk let me do some debugging on the cluster, and I think I found out why the steps say 'Not run' 👍

The imagename-lowercase step is not listed in the PipelineRun/TaskRun's status section. It should be there, since the Task was executed (as we can see from the logs), but it isn't. The Dashboard still displays the imagename-lowercase step because it gets the list of steps from the Task spec. The Dashboard usually gets each step's status and reason fields from the PipelineRun/TaskRun's status section, but since the imagename-lowercase step is missing from that section, the Dashboard records the step as having an undefined status and reason. The Dashboard logic interprets this undefined status and reason as the step being 'Not run' and as a failure, so every step after imagename-lowercase also says 'Not run'.

[screenshot]

This screenshot shows that the Dashboard records the step as having an undefined status and reason.
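The behaviour described above can be sketched roughly as follows. This is a simplified illustration in Python, not the Dashboard's actual code (which is JavaScript). The function name `label_steps`, the data shapes, and the assumption that imagename-lowercase comes before build in the Task spec are all made up for the example.

```python
from typing import Dict, List, Optional


def label_steps(spec_steps: List[str],
                status_steps: Dict[str, Dict[str, str]]) -> List[str]:
    """Label each step from the Task spec using the TaskRun status entries.

    A step with no status entry has an undefined reason, which is treated
    as 'Not run' and as a failure; every later step is then 'Not run' too.
    """
    labels = []
    earlier_failure = False
    for name in spec_steps:
        entry: Optional[Dict[str, str]] = status_steps.get(name)
        reason = entry.get("reason") if entry else None  # None means undefined
        if earlier_failure or reason is None:
            labels.append("Not run")
            earlier_failure = True
        else:
            labels.append(reason)
            if reason != "Completed":
                earlier_failure = True
    return labels


# Steps as listed in the Task spec (order assumed for illustration).
spec_steps = ["imagename-lowercase", "build"]
# Only 'build' has an entry in the TaskRun status; imagename-lowercase is missing.
status_steps = {"build": {"status": "terminated", "reason": "Completed"}}

labels = label_steps(spec_steps, status_steps)
print(labels)
```

In this sketch the build step has a Completed entry in the status, yet it is still labelled 'Not run' because the step before it has no status entry at all.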

ncskier commented 4 years ago

I created an issue in the Pipelines repo to track this: https://github.com/tektoncd/pipeline/issues/2323

a-roberts commented 4 years ago

@ncskier is this fixed now? I noticed https://github.com/tektoncd/pipeline/issues/2323 is now closed 👀

ncskier commented 4 years ago

Looks like it might be fixed now... although I wasn't able to reproduce the error on my system, so I can't verify that it was fixed.

@hanczaryk would probably have to try to verify, but he was using the OpenShift Operator (which does not have the fix yet).

AlanGreene commented 4 years ago

Closing, please reopen if you're still seeing issues on the latest release.