tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0

unclear why pipeline task is stuck Pending / tekton entrypoint behaving different #3898

Closed 06kellyjac closed 2 years ago

06kellyjac commented 3 years ago

Expected Behavior

Expected the task to run, fail, or be rejected as invalid when it was originally applied

While writing this report, it finally failed.

Some detail on why this happened, and where to look to debug it, would be great. I tried looking for the pod and describing the PipelineRun, and I had a quick look at the tekton-pipelines-controller's logs, but gave up quite quickly as they're pretty hard to read in the terminal.

Actual Behavior

The pipeline sat with an incomplete, pending task and no pod for the task; the PipelineRun also didn't show a pod for ages.

Steps to Reproduce the Problem

  1. Create KIND cluster
  2. Deploy operator (I was using 0.22.0-1)
  3. Deploy config all
  4. Apply the task below, then install the catalog tasks: tkn hub install task git-clone golang-build golang-test
  5. Apply pipeline below (modify FILLTHISIN parts)
  6. Apply PVC below
  7. Run pipeline using tkn

Additional Info

λ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"archive", BuildDate:"1980-01-01T00:00:00Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
λ tkn version
Client version: 0.17.2
Pipeline version: v0.21.0
Triggers version: v0.11.2
Dashboard version: v0.14.0

The taskRuns section of the PipelineRun's status showed this for ages:

    sample-pipeline-run-z48h8-hello-dt4v9:
      pipelineTaskName: hello
      status:
        podName: ""

Then it finally updated to:

    sample-pipeline-run-z48h8-hello-dt4v9:
      pipelineTaskName: hello
      status:
        completionTime: "2021-04-27T15:52:31Z"
        conditions:
        - lastTransitionTime: "2021-04-27T15:52:31Z"
          message: |
            "step-hello" exited with code 2 (image: "docker.io/library/busybox@sha256:ae39a6f5c07297d7ab64dbd4f82c77c874cc6a94cea29fdec309d0992574b4f7"); for logs run: kubectl -n default logs sample-pipeline-run-z48h8-hello-dt4v9-pod-zqqqn -c step-hello
          reason: Failed
          status: "False"
          type: Succeeded
        podName: sample-pipeline-run-z48h8-hello-dt4v9-pod-zqqqn
        startTime: "2021-04-27T15:34:55Z"
        steps:
        - container: step-hello
          imageID: docker.io/library/busybox@sha256:ae39a6f5c07297d7ab64dbd4f82c77c874cc6a94cea29fdec309d0992574b4f7
          name: hello
          terminated:
            containerID: containerd://9073b8fea524187e38007c865d4f3bc09a8690b434a4f3d81af87add9760c3bb
            exitCode: 2
            finishedAt: "2021-04-27T15:52:30Z"
            reason: Error
            startedAt: "2021-04-27T15:52:30Z"
        taskSpec:
          description: hello world
          steps:
          - args:
            - echo
            - hi
            image: busybox
            name: hello
            resources: {}

The PipelineRun's TaskRuns:

Name                                    Task         Created         Duration
sample-pipeline-run-z48h8-hello-dt4v9   hello-world  19 minutes ago  17 minutes 36 seconds
hello-world-run-rhtmt                   hello-world  16 minutes ago  14 minutes 34 seconds

Task:

---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: hello-world
  labels:
    app.kubernetes.io/version: "0.2"
  annotations:
    tekton.dev/pipelines.minVersion: "0.17.0"
    tekton.dev/tags: cli
    tekton.dev/displayName: "Hello World"
spec:
  description: >-
    hello world
  steps:
    - name: hello
      image: busybox
      args:
        - echo
        - hi

An equivalent Job that runs fine:

apiVersion: batch/v1
kind: Job
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
        - name: hello
          image: busybox
          args: ["echo", "hello"]
      restartPolicy: Never

λ kubectl logs hello-r99h9
hello

docker run busybox echo hi runs fine
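
A possible explanation for the difference (an assumption on my part, based on the official busybox image defining CMD ["sh"] and no ENTRYPOINT): docker run busybox echo hi replaces the image's CMD, so echo runs directly, while a Tekton step that sets only args keeps the resolved entrypoint sh and effectively runs sh echo hi. That asks sh to execute a script file named echo, which can be reproduced locally:

```shell
# Run sh with "echo" as a script path, mirroring what the step may have done.
# sh can't open a script file named "echo", so it prints an error and exits
# non-zero (busybox ash uses exit code 2 here; bash and dash use 127).
sh echo hi || status=$?
echo "exit status: ${status:-0}"
```

If that's what's happening, setting command: [echo] on the step (as in the fix further down the thread) sidesteps the image's default entrypoint.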

Pipeline:

---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: sample-pipeline
spec:
  workspaces:
    - name: source
      description: The workspace containing the source code which is to be build, pushed and deployed
  tasks:
    - name: clone
      taskRef:
        name: git-clone
      workspaces:
        - name: output
          workspace: source
      params:
        - name: url
          value: FILLTHISIN
        - name: deleteExisting
          value: "true"
    - name: build
      taskRef:
        name: golang-build
      workspaces:
        - name: source
          workspace: source
      params:
        - name: package
          value: FILLTHISIN
      runAfter:
        - clone
    - name: test
      taskRef:
        name: golang-test
      workspaces:
        - name: source
          workspace: source
      params:
        - name: package
          value: FILLTHISIN
      runAfter:
        - build
    - name: hello
      taskRef:
        name: hello-world
      runAfter:
        - build

PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipeline-source-pvc
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

tibcoplord commented 3 years ago

I see the same with https://github.com/tektoncd/catalog/tree/main/task/kaniko/0.3 but not with https://github.com/tektoncd/catalog/tree/main/task/kaniko/0.1

I'm wondering if it's related to workspaces somehow.

ghost commented 3 years ago

It's difficult to know precisely what delayed the TaskRun here, but it could be worth watching the PVCs' and PVs' status as the PipelineRun executes. My first guess would have been that either the volume is taking a very long time to initialize or a Step's image is being downloaded extremely slowly. However, in both of those scenarios the Pod is typically created just fine, I think; it just won't be allowed to move into a Running state until those processes complete.

The Pod seems to have been created (judging from the PipelineRun's status) since it has an exit code for one of the steps. When I'm debugging I'll sometimes set up two windows side-by-side, one with kubectl get --watch taskruns and another with kubectl get --watch pods to see the two resources being created. If it's possible to get the description of the pod (kubectl describe pod sample-pipeline-run-z48h8-hello-dt4v9-pod-zqqqn) I think that might help a lot - hopefully we'll see Events on it that pinpoint the problem.

06kellyjac commented 3 years ago

The pod wasn't created for over 15 minutes. I ran kubectl get po numerous times and it only showed the completed pods from clone, build, and test; no pending or failed pods.

The image was busybox, so it wouldn't have taken long to pull, and as you mentioned, any trouble pulling the image would have been visible on an actual pod resource, but no such resource existed.

As for volumes, there are no workspaces or volumes in the hello task, not even an emptyDir. The tasks using the workspace worked fine.

If there were a pod resource I'd have something to debug, but for over 15 minutes there was no pod resource at all while the PipelineRun was listed as pending.

I'll try to recreate the issue and get you kubectl get po output etc.

06kellyjac commented 3 years ago

get po watch:

λ kubectl get po -w
NAME                              READY   STATUS    RESTARTS   AGE
affinity-assistant-1bbd63b162-0   0/1     Pending   0          0s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Pending   0          0s
affinity-assistant-1bbd63b162-0                   0/1     Pending   0          4s
affinity-assistant-1bbd63b162-0                   0/1     ContainerCreating   0          4s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Pending             0          4s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Init:0/2            0          4s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Init:1/2            0          8s
affinity-assistant-1bbd63b162-0                   1/1     Running             0          10s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     PodInitializing     0          14s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   1/1     Running             0          20s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   1/1     Running             0          20s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Completed           0          23s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Pending             0          0s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Pending             0          0s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Init:0/2            0          0s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Init:1/2            0          1s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     PodInitializing     0          2s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   1/1     Running             0          45s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   1/1     Running             0          45s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Completed           0          55s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Pending             0          0s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Pending             0          0s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Init:0/2            0          0s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Init:1/2            0          0s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     PodInitializing     0          1s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    1/1     Running             0          3s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    1/1     Running             0          3s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Completed           0          17s

get pvc:

λ kubectl get pvc
NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pipeline-source-pvc   Bound    pvc-8b177516-1ef2-44e5-8a1e-af0def2c5dbb   1Gi        RWO            standard       5m9s

kubectl get po over time:

λ kubectl get po
NAME                                              READY   STATUS      RESTARTS   AGE
affinity-assistant-1bbd63b162-0                   1/1     Running     0          2m14s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Completed   0          111s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Completed   0          2m14s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Completed   0          55s
λ date
Thu 29 Apr 12:38:17 BST 2021
λ date
Thu 29 Apr 12:41:59 BST 2021
λ kubectl get po
NAME                                              READY   STATUS      RESTARTS   AGE
affinity-assistant-1bbd63b162-0                   1/1     Running     0          6m42s
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Completed   0          6m19s
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Completed   0          6m42s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Completed   0          5m23s
λ date
Thu 29 Apr 12:48:08 BST 2021
λ kubectl get po
NAME                                              READY   STATUS      RESTARTS   AGE
affinity-assistant-1bbd63b162-0                   1/1     Running     0          12m
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Completed   0          12m
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Completed   0          12m
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Completed   0          11m
λ date
Thu 29 Apr 12:55:08 BST 2021
λ kubectl get po
NAME                                              READY   STATUS      RESTARTS   AGE
sample-pipeline-run-bctlv-build-fnl8d-pod-h46x5   0/1     Completed   0          19m
sample-pipeline-run-bctlv-clone-khmfg-pod-4sq7k   0/1     Completed   0          19m
sample-pipeline-run-bctlv-hello-gpvd9-pod-5rlvf   0/1     Error       0          61s
sample-pipeline-run-bctlv-test-mpgfc-pod-mdxfn    0/1     Completed   0          18m

describe taskrun:

λ kubectl get taskrun
NAME                                    SUCCEEDED   REASON      STARTTIME   COMPLETIONTIME
sample-pipeline-run-bctlv-build-fnl8d   True        Succeeded   14m         13m
sample-pipeline-run-bctlv-clone-khmfg   True        Succeeded   14m         14m
sample-pipeline-run-bctlv-hello-gpvd9
sample-pipeline-run-bctlv-test-mpgfc    True        Succeeded   13m         13m
λ kubectl describe taskrun sample-pipeline-run-bctlv-hello-gpvd9
Name:         sample-pipeline-run-bctlv-hello-gpvd9
Namespace:    default
Labels:       app.kubernetes.io/managed-by=tekton-pipelines
              tekton.dev/pipeline=sample-pipeline
              tekton.dev/pipelineRun=sample-pipeline-run-bctlv
              tekton.dev/pipelineTask=hello
Annotations:  <none>
API Version:  tekton.dev/v1beta1
Kind:         TaskRun
Metadata:
  Creation Timestamp:  2021-04-29T11:36:41Z
  Generation:          1
  Managed Fields:
    API Version:  tekton.dev/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:tekton.dev/pipeline:
          f:tekton.dev/pipelineRun:
          f:tekton.dev/pipelineTask:
        f:ownerReferences:
          .:
          k:{"uid":"b592f4ca-855c-4401-a599-ef6319fac217"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:resources:
        f:serviceAccountName:
        f:taskRef:
          .:
          f:kind:
          f:name:
        f:timeout:
      f:status:
        .:
        f:podName:
    Manager:    controller
    Operation:  Update
    Time:       2021-04-29T11:36:41Z
  Owner References:
    API Version:           tekton.dev/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PipelineRun
    Name:                  sample-pipeline-run-bctlv
    UID:                   b592f4ca-855c-4401-a599-ef6319fac217
  Resource Version:        2936
  UID:                     ea2a3368-7a08-43e8-8137-a9ea3af0808f
Spec:
  Resources:
  Service Account Name:  default
  Task Ref:
    Kind:   Task
    Name:   hello-world
  Timeout:  1h0m0s
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Started  12m   TaskRun  

# later
λ kubectl describe taskrun sample-pipeline-run-bctlv-hello-gpvd9
Name:         sample-pipeline-run-bctlv-hello-gpvd9
Namespace:    default
Labels:       app.kubernetes.io/managed-by=tekton-pipelines
              app.kubernetes.io/version=0.2
              tekton.dev/pipeline=sample-pipeline
              tekton.dev/pipelineRun=sample-pipeline-run-bctlv
              tekton.dev/pipelineTask=hello
              tekton.dev/task=hello-world
Annotations:  pipeline.tekton.dev/release: v0.21.0
              tekton.dev/displayName: Hello World
              tekton.dev/pipelines.minVersion: 0.17.0
              tekton.dev/tags: cli
API Version:  tekton.dev/v1beta1
Kind:         TaskRun
Metadata:
  Creation Timestamp:  2021-04-29T11:36:41Z
  Generation:          1
  Managed Fields:
    API Version:  tekton.dev/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
          f:pipeline.tekton.dev/release:
          f:tekton.dev/displayName:
          f:tekton.dev/pipelines.minVersion:
          f:tekton.dev/tags:
        f:labels:
          .:
          f:app.kubernetes.io/version:
          f:tekton.dev/pipeline:
          f:tekton.dev/pipelineRun:
          f:tekton.dev/pipelineTask:
          f:tekton.dev/task:
        f:ownerReferences:
          .:
          k:{"uid":"b592f4ca-855c-4401-a599-ef6319fac217"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:resources:
        f:serviceAccountName:
        f:taskRef:
          .:
          f:kind:
          f:name:
        f:timeout:
      f:status:
        .:
        f:completionTime:
        f:conditions:
        f:podName:
        f:startTime:
        f:steps:
        f:taskSpec:
          .:
          f:description:
          f:steps:
    Manager:    controller
    Operation:  Update
    Time:       2021-04-29T11:54:19Z
  Owner References:
    API Version:           tekton.dev/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PipelineRun
    Name:                  sample-pipeline-run-bctlv
    UID:                   b592f4ca-855c-4401-a599-ef6319fac217
  Resource Version:        14219
  UID:                     ea2a3368-7a08-43e8-8137-a9ea3af0808f
Spec:
  Resources:
  Service Account Name:  default
  Task Ref:
    Kind:   Task
    Name:   hello-world
  Timeout:  1h0m0s
Status:
  Completion Time:  2021-04-29T11:54:19Z
  Conditions:
    Last Transition Time:  2021-04-29T11:54:19Z
    Message:               "step-hello" exited with code 2 (image: "docker.io/library/busybox@sha256:1ccc0a0ca577e5fb5a0bdf2150a1a9f842f47c8865e861fa0062c5d343eb8cac"); for logs run: kubectl -n default logs sample-pipeline-run-bctlv-hello-gpvd9-pod-5rlvf -c step-hello

    Reason:    Failed
    Status:    False
    Type:      Succeeded
  Pod Name:    sample-pipeline-run-bctlv-hello-gpvd9-pod-5rlvf
  Start Time:  2021-04-29T11:36:41Z
  Steps:
    Container:  step-hello
    Image ID:   docker.io/library/busybox@sha256:1ccc0a0ca577e5fb5a0bdf2150a1a9f842f47c8865e861fa0062c5d343eb8cac
    Name:       hello
    Terminated:
      Container ID:  containerd://24c0c45241069660c2842368e91a6956ced343f6f55e8deddeca4b34a5143234
      Exit Code:     2
      Finished At:   2021-04-29T11:54:18Z
      Reason:        Error
      Started At:    2021-04-29T11:54:18Z
  Task Spec:
    Description:  hello world
    Steps:
      Args:
        echo
        hi
      Image:  busybox
      Name:   hello
      Resources:
Events:
  Type     Reason   Age                From     Message
  ----     ------   ----               ----     -------
  Normal   Started  19m                TaskRun  
  Normal   Pending  99s                TaskRun  Pending
  Normal   Pending  99s (x2 over 99s)  TaskRun  pod status "Initialized":"False"; message: "containers with incomplete status: [place-tools]"
  Normal   Pending  97s                TaskRun  pod status "Ready":"False"; message: "containers with unready status: [step-hello]"
  Normal   Running  93s                TaskRun  Not all Steps in the Task have finished executing
  Warning  Failed   91s                TaskRun  "step-hello" exited with code 2 (image: "docker.io/library/busybox@sha256:1ccc0a0ca577e5fb5a0bdf2150a1a9f842f47c8865e861fa0062c5d343eb8cac"); for logs run: kubectl -n default logs sample-pipeline-run-bctlv-hello-gpvd9-pod-5rlvf -c step-hello

Dashboard:

(Screenshot: Tekton Dashboard view of PipelineRun sample-pipeline-run-bctlv)

ghost commented 3 years ago

Yeah, this does strike me as strange. I'm able to reproduce it running off the latest main commit too. What I'm seeing is that the reconciler schedules the hello TaskRun right after build finishes, but after that there are no real updates. I've attached the raw log from the TaskRun creation below. Nothing in it strikes me as out of the ordinary.

{"level":"info","ts":"2021-04-29T12:54:23.014Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:745","msg":"Creating a new TaskRun object sample-pipeline-run-hello-5wr7t for pipeline task hello","commit":"e7aa17c","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"44429614-7721-4a85-929b-94ff65eccccb","knative.dev/key":"default/sample-pipeline-run"}
{"level":"info","ts":"2021-04-29T12:54:23.021Z","logger":"tekton-pipelines-controller","caller":"taskrun/taskrun.go:373","msg":"Cloud Events: []","commit":"e7aa17c","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"79fce648-c13f-4b9f-8aa1-9ababb8ed14b","knative.dev/key":"default/sample-pipeline-run-test-7m5g8"}
{"level":"info","ts":"2021-04-29T12:54:23.026Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:582","msg":"PipelineRun sample-pipeline-run status is being set to &{Succeeded Unknown  {2021-04-29 12:54:23.026025579 +0000 UTC m=+952.876742716} Running Tasks Completed: 2 (Failed: 0, Cancelled 0), Incomplete: 2, Skipped: 0}","commit":"e7aa17c","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"44429614-7721-4a85-929b-94ff65eccccb","knative.dev/key":"default/sample-pipeline-run"}
{"level":"info","ts":"2021-04-29T12:54:23.028Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:282","msg":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"default\", Name:\"sample-pipeline-run\", UID:\"e88173b1-b360-42c3-9009-5172b0edfdc2\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"93506\", FieldPath:\"\"}): type: 'Normal' reason: 'Running' Tasks Completed: 2 (Failed: 0, Cancelled 0), Incomplete: 2, Skipped: 0","commit":"e7aa17c"}
{"level":"info","ts":"2021-04-29T12:54:23.029Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:282","msg":"Event(v1.ObjectReference{Kind:\"TaskRun\", Namespace:\"default\", Name:\"sample-pipeline-run-hello-5wr7t\", UID:\"1a4b583d-7d31-4425-bd8d-ad7496cbd49c\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"93765\", FieldPath:\"\"}): type: 'Normal' reason: 'Started' ","commit":"e7aa17c"}
{"level":"info","ts":"2021-04-29T12:54:23.033Z","logger":"tekton-pipelines-controller","caller":"taskrun/taskrun.go:373","msg":"Cloud Events: []","commit":"e7aa17c","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"e4e60199-3102-4cab-9b0b-92d93e060dc3","knative.dev/key":"default/sample-pipeline-run-hello-5wr7t"}
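
Side note: those JSON log lines are easier to scan when reduced to timestamp, level, and message. A sketch (the helper name is made up; it assumes python3 and expects kubectl logs output from the controller pod on stdin):

```shell
# Pretty-print Tekton controller logs: keep ts/level/msg, pass through
# anything that isn't JSON.
fmt_tekton_logs() {
  python3 -c '
import json, sys
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        rec = json.loads(line)
    except ValueError:
        print(line)
        continue
    print(rec.get("ts", "-"), rec.get("level", "-"), rec.get("msg", ""))
'
}

# Example:
echo '{"level":"info","ts":"2021-04-29T12:54:23.014Z","msg":"Cloud Events: []"}' | fmt_tekton_logs
# prints: 2021-04-29T12:54:23.014Z info Cloud Events: []
```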

Will update with another log snippet once the TaskRun times out or the pod gets created.

Still waiting after 7 minutes or so. There's not even a status field on the created TaskRun, which is really really odd.
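
For anyone else poking at this, one way to list TaskRuns that are in this state, i.e. created but with no .status at all (the function name is my own; it expects kubectl get taskruns -o json on stdin and assumes python3 is available):

```shell
# Print the names of TaskRuns whose status has never been initialized.
list_statusless_taskruns() {
  python3 -c '
import json, sys
for item in json.load(sys.stdin)["items"]:
    if "status" not in item:
        print(item["metadata"]["name"])
'
}

# Demo on canned data standing in for real cluster output:
echo '{"items":[{"metadata":{"name":"run-a"}},{"metadata":{"name":"run-b"},"status":{"podName":"p"}}]}' \
  | list_statusless_taskruns
# prints: run-a
```

Against a live cluster you would pipe kubectl get taskruns -o json into it instead of the canned JSON.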

After 25 minutes the hello TaskRun created its pod and failed because the args aren't constructed correctly. I went back and edited the Task so that it is now:

steps:
  - command:
      - echo
    args:
      - hi

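
If it helps, here's the whole Task with that change applied (a sketch; same hello-world Task as above, with command added so the step no longer relies on the image's default entrypoint, which is my guess at why the original exited with code 2):

```yaml
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: hello-world
spec:
  description: hello world
  steps:
    - name: hello
      image: busybox
      # Explicit command: the step runs `echo hi` directly rather than
      # handing "echo hi" to the image's default entrypoint.
      command:
        - echo
      args:
        - hi
```
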

ghost commented 3 years ago

OK, I was able to reproduce it once but subsequent PipelineRuns have not manifested the same problem. I'm continuing to debug.

Edit: After reverting any changes, deleting everything except the cluster itself, and re-applying the resources as specified here, I'm still unable to repro the issue a second time.

What strikes me as so odd here is that the TaskRun reconciler didn't even get to the stage where it initializes the status on the TaskRun. That seems to me like it isn't even being handed to our reconciler until those 20-odd minutes have passed.

ghost commented 3 years ago

@06kellyjac I might have missed this but what platform are you running on? E.g. openshift / gke / kind / docker kubernetes / etc...

Edit: for future reference I was running on kind 0.10.0

06kellyjac commented 3 years ago

Steps to Reproduce the Problem

  1. Create KIND cluster

I'm also on kind 0.10.0

The host is NixOS, following the unstable channel.

Edit: thanks for looking into this. Usually I know where to look when debugging and can handle it, or at least do some initial digging, but without a pod to look at I'm stuck :sweat_smile:

ghost commented 3 years ago

I've been digging through the output from kind export logs but I haven't yet stumbled across anything that looks suspicious. I have to switch gears for a bit but will try to come back to this and keep looking. I want to check this against GKE and see if the same problem manifests.

TODO

XanderXAJ commented 3 years ago

I'm seeing the exact same issue with the Kaniko 0.3 task -- the TaskRun will get created but the pod won't be scheduled for a very long time.

I'm running Tekton Pipeline 0.23.0 on Kubernetes 1.20.2 via Kind 0.10.0 on Docker Desktop for Mac 3.3.1 (engine 20.10.5).

$ kind --version
kind version 0.10.0

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

$ tkn version
Client version: 0.18.0
Pipeline version: v0.23.0
Dashboard version: v0.16.1

If you need any more details, let me know and I can try and get them for you.

ghost commented 3 years ago

@XanderXAJ does the TaskRun that's created have any status field or is it totally devoid of any additional data beyond the yaml you submitted?

XanderXAJ commented 3 years ago

I'm just getting to grips with Tekton but, as best as I can tell, it is completely omitted:

$ tkn taskrun describe build-simple-docker-image-pvw9m-build-and-push-image-5rtzz -o yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"tekton.dev/v1beta1","kind":"Pipeline","metadata":{"annotations":{},"name":"build-docker-image","namespace":"default"},"spec":{"description":"build-docker-image will build and push a Docker image built from the specified repo and revision.\n","params":[{"description":"URL of git repository to clone","name":"repo-url","type":"string"},{"default":"","description":"Revision of git repository to clone","name":"repo-revision","type":"string"},{"description":"Name of the image to build and push","name":"image","type":"string"}],"tasks":[{"name":"clone-repo","params":[{"name":"url","value":"$(params.repo-url)"},{"name":"revision","value":"$(params.repo-revision)"}],"taskRef":{"name":"git-clone"},"workspaces":[{"name":"output","workspace":"shared-data"}]},{"name":"build-and-push-image","params":[{"name":"IMAGE","value":"$(params.image)"},{"name":"DOCKERFILE","value":"Dockerfile.tekton.app"},{"name":"CONTEXT","value":"web/build/docker"},{"name":"EXTRA_ARGS","value":"--insecure"}],"runAfter":["clone-repo"],"taskRef":{"name":"kaniko"},"workspaces":[{"name":"source","workspace":"shared-data"}]}],"workspaces":[{"description":"Stores the clone git repo for use in reading the README.","name":"shared-data"}]}}
    pipeline.tekton.dev/affinity-assistant: affinity-assistant-a1ee8f2cfb
  creationTimestamp: "2021-05-07T13:02:09Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: tekton-pipelines
    tekton.dev/pipeline: build-docker-image
    tekton.dev/pipelineRun: build-simple-docker-image-pvw9m
    tekton.dev/pipelineTask: build-and-push-image
  managedFields:
  - apiVersion: tekton.dev/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:pipeline.tekton.dev/affinity-assistant: {}
        f:labels:
          .: {}
          f:tekton.dev/pipeline: {}
          f:tekton.dev/pipelineRun: {}
          f:tekton.dev/pipelineTask: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"b2510bd4-b8b0-4ff6-ac01-1f4faa2ddfce"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        .: {}
        f:params: {}
        f:resources: {}
        f:serviceAccountName: {}
        f:taskRef:
          .: {}
          f:kind: {}
          f:name: {}
        f:timeout: {}
        f:workspaces: {}
      f:status:
        .: {}
        f:podName: {}
    manager: controller
    operation: Update
    time: "2021-05-07T13:02:09Z"
  name: build-simple-docker-image-pvw9m-build-and-push-image-5rtzz
  namespace: default
  ownerReferences:
  - apiVersion: tekton.dev/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: PipelineRun
    name: build-simple-docker-image-pvw9m
    uid: b2510bd4-b8b0-4ff6-ac01-1f4faa2ddfce
  resourceVersion: "3581"
  uid: bfb28d40-a266-468e-8640-aec6d649e4c2
spec:
  params:
  - name: IMAGE
    value: host.docker.internal:5000/platforms/simple-docker-app
  - name: DOCKERFILE
    value: Dockerfile.tekton.app
  - name: CONTEXT
    value: web/build/docker
  - name: EXTRA_ARGS
    value: --insecure
  resources: {}
  serviceAccountName: default
  taskRef:
    kind: Task
    name: kaniko
  timeout: 1h0m0s
  workspaces:
  - name: source
    persistentVolumeClaim:
      claimName: pvc-1ad5430ec5

Give it enough time and eventually the task will go through and get a status field -- in this case, about 25 minutes:

$ tkn taskrun describe build-simple-docker-image-pvw9m-build-and-push-image-5rtzz -o yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"tekton.dev/v1beta1","kind":"Task","metadata":{"annotations":{"tekton.dev/pipelines.minVersion":"0.17.0","tekton.dev/tags":"image-build"},"labels":{"app.kubernetes.io/version":"0.3"},"name":"kaniko","namespace":"default"},"spec":{"description":"This Task builds source into a container image using Google's kaniko tool.\nKaniko doesn't depend on a Docker daemon and executes each command within a Dockerfile completely in userspace. This enables building container images in environments that can't easily or securely run a Docker daemon, such as a standard Kubernetes cluster.","params":[{"description":"Name (reference) of the image to build.","name":"IMAGE"},{"default":"./Dockerfile","description":"Path to the Dockerfile to build.","name":"DOCKERFILE"},{"default":"./","description":"The build context used by Kaniko.","name":"CONTEXT"},{"default":"","name":"EXTRA_ARGS"},{"default":"gcr.io/kaniko-project/executor:v1.5.1@sha256:c6166717f7fe0b7da44908c986137ecfeab21f31ec3992f6e128fff8a94be8a5","description":"The image on which builds will run (default is v1.5.1)","name":"BUILDER_IMAGE"}],"results":[{"description":"Digest of the image just built.","name":"IMAGE-DIGEST"}],"steps":[{"args":["$(params.EXTRA_ARGS)","--dockerfile=$(params.DOCKERFILE)","--context=$(workspaces.source.path)/$(params.CONTEXT)","--destination=$(params.IMAGE)","--oci-layout-path=$(workspaces.source.path)/$(params.CONTEXT)/image-digest"],"image":"$(params.BUILDER_IMAGE)","name":"build-and-push","securityContext":{"runAsUser":0},"workingDir":"$(workspaces.source.path)"},{"args":["-images=[{\"name\":\"$(params.IMAGE)\",\"type\":\"image\",\"url\":\"$(params.IMAGE)\",\"digest\":\"\",\"OutputImageDir\":\"$(workspaces.source.path)/$(params.CONTEXT)/image-digest\"}]","-terminationMessagePath=$(params.CONTEXT)/image-digested"],"command":["/ko-app/imagedigestexporter"],"image":"gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/imagedigestexporter:v0.16.2","name":"write-digest","securityContext":{"runAsUser":0},"workingDir":"$(workspaces.source.path)"},{"image":"docker.io/stedolan/jq@sha256:a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09","name":"digest-to-results","script":"cat $(params.CONTEXT)/image-digested | jq '.[0].value' -rj | tee /tekton/results/IMAGE-DIGEST\n","workingDir":"$(workspaces.source.path)"}],"workspaces":[{"description":"Holds the context and docker file","name":"source"},{"description":"Includes a docker `config.json`","mountPath":"/kaniko/.docker","name":"dockerconfig","optional":true}]}}
    pipeline.tekton.dev/affinity-assistant: affinity-assistant-a1ee8f2cfb
    pipeline.tekton.dev/release: v0.23.0
    tekton.dev/pipelines.minVersion: 0.17.0
    tekton.dev/tags: image-build
  creationTimestamp: "2021-05-07T13:02:09Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: tekton-pipelines
    app.kubernetes.io/version: "0.3"
    tekton.dev/pipeline: build-docker-image
    tekton.dev/pipelineRun: build-simple-docker-image-pvw9m
    tekton.dev/pipelineTask: build-and-push-image
    tekton.dev/task: kaniko
  managedFields:
  - apiVersion: tekton.dev/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:pipeline.tekton.dev/affinity-assistant: {}
          f:pipeline.tekton.dev/release: {}
          f:tekton.dev/pipelines.minVersion: {}
          f:tekton.dev/tags: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/version: {}
          f:tekton.dev/pipeline: {}
          f:tekton.dev/pipelineRun: {}
          f:tekton.dev/pipelineTask: {}
          f:tekton.dev/task: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"b2510bd4-b8b0-4ff6-ac01-1f4faa2ddfce"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        .: {}
        f:params: {}
        f:resources: {}
        f:serviceAccountName: {}
        f:taskRef:
          .: {}
          f:kind: {}
          f:name: {}
        f:timeout: {}
        f:workspaces: {}
      f:status:
        .: {}
        f:completionTime: {}
        f:conditions: {}
        f:podName: {}
        f:startTime: {}
        f:steps: {}
        f:taskSpec:
          .: {}
          f:description: {}
          f:params: {}
          f:results: {}
          f:steps: {}
          f:workspaces: {}
    manager: controller
    operation: Update
    time: "2021-05-07T13:27:57Z"
  name: build-simple-docker-image-pvw9m-build-and-push-image-5rtzz
  namespace: default
  ownerReferences:
  - apiVersion: tekton.dev/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: PipelineRun
    name: build-simple-docker-image-pvw9m
    uid: b2510bd4-b8b0-4ff6-ac01-1f4faa2ddfce
  resourceVersion: "11648"
  uid: bfb28d40-a266-468e-8640-aec6d649e4c2
spec:
  params:
  - name: IMAGE
    value: host.docker.internal:5000/platforms/simple-docker-app
  - name: DOCKERFILE
    value: Dockerfile.tekton.app
  - name: CONTEXT
    value: web/build/docker
  - name: EXTRA_ARGS
    value: --insecure
  resources: {}
  serviceAccountName: default
  taskRef:
    kind: Task
    name: kaniko
  timeout: 1h0m0s
  workspaces:
  - name: source
    persistentVolumeClaim:
      claimName: pvc-1ad5430ec5
status:
  completionTime: "2021-05-07T13:27:57Z"
  conditions:
  - lastTransitionTime: "2021-05-07T13:27:57Z"
    message: |
      "step-build-and-push" exited with code 1 (image: "gcr.io/kaniko-project/executor@sha256:68bb272f681f691254acfbdcef00962f22efe2f0c1e287e6a837b0abe07fb94b"); for logs run: kubectl -n default logs build-simple-docker-image-pvw9m-build-and-push-image-5rtz-lrhpc -c step-build-and-push
    reason: Failed
    status: "False"
    type: Succeeded
  podName: build-simple-docker-image-pvw9m-build-and-push-image-5rtz-lrhpc
  startTime: "2021-05-07T13:02:09Z"
  steps:
  - container: step-build-and-push
    imageID: gcr.io/kaniko-project/executor@sha256:68bb272f681f691254acfbdcef00962f22efe2f0c1e287e6a837b0abe07fb94b
    name: build-and-push
    terminated:
      containerID: containerd://8c5011a5b5461b1206a84cd9f6f9c3d2601e4eb49ef32a56bb95833ff4e6d80f
      exitCode: 1
      finishedAt: "2021-05-07T13:27:55Z"
      reason: Error
      startedAt: "2021-05-07T13:27:55Z"
  - container: step-write-digest
    imageID: gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/imagedigestexporter@sha256:c756458c29ef62cc1d660e6118057f2dae33926eefb6950b6a6a5455cdf35250
    name: write-digest
    terminated:
      containerID: containerd://96c0edaac05cc280dd0b92e2f81864644e884518fddb98e1e3bdb1a16220346b
      exitCode: 1
      finishedAt: "2021-05-07T13:27:56Z"
      reason: Error
      startedAt: "2021-05-07T13:27:56Z"
  - container: step-digest-to-results
    imageID: docker.io/stedolan/jq@sha256:a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09
    name: digest-to-results
    terminated:
      containerID: containerd://e3e42bcf7c724d650386f28443b7cebf82ac1a0efdc72fde33b611c1c43ee9a3
      exitCode: 1
      finishedAt: "2021-05-07T13:27:56Z"
      reason: Error
      startedAt: "2021-05-07T13:27:56Z"
  taskSpec:
    description: |-
      This Task builds source into a container image using Google's kaniko tool.
      Kaniko doesn't depend on a Docker daemon and executes each command within a Dockerfile completely in userspace. This enables building container images in environments that can't easily or securely run a Docker daemon, such as a standard Kubernetes cluster.
    params:
    - description: Name (reference) of the image to build.
      name: IMAGE
      type: string
    - default: ./Dockerfile
      description: Path to the Dockerfile to build.
      name: DOCKERFILE
      type: string
    - default: ./
      description: The build context used by Kaniko.
      name: CONTEXT
      type: string
    - default: ""
      name: EXTRA_ARGS
      type: string
    - default: gcr.io/kaniko-project/executor:v1.5.1@sha256:c6166717f7fe0b7da44908c986137ecfeab21f31ec3992f6e128fff8a94be8a5
      description: The image on which builds will run (default is v1.5.1)
      name: BUILDER_IMAGE
      type: string
    results:
    - description: Digest of the image just built.
      name: IMAGE-DIGEST
    steps:
    - args:
      - $(params.EXTRA_ARGS)
      - --dockerfile=$(params.DOCKERFILE)
      - --context=$(workspaces.source.path)/$(params.CONTEXT)
      - --destination=$(params.IMAGE)
      - --oci-layout-path=$(workspaces.source.path)/$(params.CONTEXT)/image-digest
      image: $(params.BUILDER_IMAGE)
      name: build-and-push
      resources: {}
      securityContext:
        runAsUser: 0
      workingDir: $(workspaces.source.path)
    - args:
      - -images=[{"name":"$(params.IMAGE)","type":"image","url":"$(params.IMAGE)","digest":"","OutputImageDir":"$(workspaces.source.path)/$(params.CONTEXT)/image-digest"}]
      - -terminationMessagePath=$(params.CONTEXT)/image-digested
      command:
      - /ko-app/imagedigestexporter
      image: gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/imagedigestexporter:v0.16.2
      name: write-digest
      resources: {}
      securityContext:
        runAsUser: 0
      workingDir: $(workspaces.source.path)
    - image: docker.io/stedolan/jq@sha256:a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09
      name: digest-to-results
      resources: {}
      script: |
        cat $(params.CONTEXT)/image-digested | jq '.[0].value' -rj | tee /tekton/results/IMAGE-DIGEST
      workingDir: $(workspaces.source.path)
    workspaces:
    - description: Holds the context and docker file
      name: source
    - description: Includes a docker `config.json`
      mountPath: /kaniko/.docker
      name: dockerconfig
      optional: true
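
As an aside on why all three steps report exit code 1 above: once `step-build-and-push` fails, the remaining steps are not run normally, so `write-digest` never produces the `image-digested` file that `digest-to-results` reads. For anyone wondering what that last step expects, here's a quick local sketch of its jq filter against made-up imagedigestexporter output (the file contents and digest value here are fabricated for illustration):

```shell
# Simulated $(params.CONTEXT)/image-digested, roughly what imagedigestexporter
# would write on success: a JSON array whose first entry carries the digest.
# (Contents are made up for illustration.)
cat > /tmp/image-digested <<'EOF'
[{"key":"digest","value":"sha256:feedfacecafebeef","resourceRef":{"name":"image"}}]
EOF

# The same filter the digest-to-results step runs: take the first array
# element's "value" field, raw output (-r) with no trailing newline (-j).
jq '.[0].value' -rj < /tmp/image-digested
# prints: sha256:feedfacecafebeef
```

So if the kaniko step is fixed but `IMAGE-DIGEST` still comes out empty, the thing to check is whether `image-digested` was actually written under the workspace path.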
ghost commented 3 years ago

Yeah, very bizarre. Thanks for the additional data here; I'll try to find some time next week to see whether this is a kind-specific issue.

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

@tekton-robot: Closing this issue.

In response to [this](https://github.com/tektoncd/pipeline/issues/3898#issuecomment-993106999):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen` with a justification.
> Mark the issue as fresh with `/remove-lifecycle rotten` with a justification.
> If this issue should be exempted, mark the issue as frozen with `/lifecycle frozen` with a justification.
>
> /close
>
> Send feedback to [tektoncd/plumbing](https://github.com/tektoncd/plumbing).

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.