This might be related to https://github.com/tektoncd/pipeline/issues/2813 🤔
Is my expectation right that only one TaskRun should be created?
yep!
This might be related to #2813 🤔
hmm in that one, a container within a pod was restarted, im not sure that would be caused by the same thing, since this sounds like the controller is deciding to create more than one TaskRun
@Fabian-K do you happen to have the PipelineRun status information available? it would be interesting to see if it included more information, e.g. did it include just the one TaskRun or both
Also does this happen consistently or just sometimes?
Thanks for the quick reply. This is not the first time I have noticed multiple TaskRuns being created, but it's difficult to say how often this happens, also because this particular pipeline can be executed multiple times without an issue, so it might also happen unnoticed.
Background Info: We also run https://github.com/fluxcd/flux in the cluster. Flux is creating the Pipeline and the Task objects - but should not do anything related to PipelineRuns. Those are created via the Kubernetes API.
I'm currently wondering whether some of the flux-related annotations/labels were added by the flux controller after the initial creation, and whether that modification triggers an unexpected reconciliation which, with bad timing, results in multiple TaskRuns being created 🤔 This is just a guess though; I'm not sure whether the annotations are added after creation or whether they are propagated by Tekton from the Pipeline.
Here is the PipelineRun:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  annotations:
    fluxcd.io/sync-checksum: 61a317e9bf1eba3100f64a57c618d5a346e7c82b
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"tekton.dev/v1beta1","kind":"Pipeline","metadata":{"annotations":{"fluxcd.io/sync-checksum":"61a317e9bf1eba3100f64a57c618d5a346e7c82b"},"labels":{"fluxcd.io/sync-gc-mark":"sha256.D5QWv8-8W1lY3-H4_5LxwilzVXygk0aSWl7S0KuLAcI"},"name":"uiveri5-pipeline-v3","namespace":"site"},"spec":{"params":[{"description":"Secret containing the user for text execution","name":"user-secret","type":"string"}],"resources":[{"name":"source-repo","type":"git"}],"tasks":[{"name":"uiveri5","params":[{"name":"user-secret","value":"$(params.user-secret)"}],"resources":{"inputs":[{"name":"workspace","resource":"source-repo"}]},"taskRef":{"name":"uiveri5-v3"}}]}}
  creationTimestamp: "2020-08-24T08:09:11Z"
  generation: 1
  labels:
    fluxcd.io/sync-gc-mark: sha256.D5QWv8-8W1lY3-H4_5LxwilzVXygk0aSWl7S0KuLAcI
    kind: uiveri5
    project: "5005"
    technical-name: lunchmenu
    tekton.dev/pipeline: uiveri5-pipeline-v3
  name: uiveri5-lunchmenu-1598256551145
  namespace: site
  resourceVersion: "28479873"
  selfLink: /apis/tekton.dev/v1beta1/namespaces/site/pipelineruns/uiveri5-lunchmenu-1598256551145
  uid: 9d70ede0-b3c3-4bf0-9928-505cd905d655
spec:
  params:
  - name: user-secret
    value: uiveri5-e2e-test-user
  pipelineRef:
    name: uiveri5-pipeline-v3
  resources:
  - name: source-repo
    resourceSpec:
      params:
      - name: url
        value: git@github.wdf.sap.corp:catering/lunchmenu-e2e-tests.git
      - name: revision
        value: master
      type: git
  serviceAccountName: build-bot
  timeout: 1h0m0s
status:
  completionTime: "2020-08-24T08:09:57Z"
  conditions:
  - lastTransitionTime: "2020-08-24T08:09:57Z"
    message: 'Tasks Completed: 1 (Failed: 1, Cancelled 0), Skipped: 0'
    reason: Failed
    status: "False"
    type: Succeeded
  pipelineSpec:
    params:
    - description: Secret containing the user for text execution
      name: user-secret
      type: string
    resources:
    - name: source-repo
      type: git
    tasks:
    - name: uiveri5
      params:
      - name: user-secret
        value: $(params.user-secret)
      resources:
        inputs:
        - name: workspace
          resource: source-repo
      taskRef:
        kind: Task
        name: uiveri5-v3
  startTime: "2020-08-24T08:09:11Z"
  taskRuns:
    uiveri5-lunchmenu-1598256551145-uiveri5-vllbc:
      pipelineTaskName: uiveri5
      status:
        completionTime: "2020-08-24T08:09:57Z"
        conditions:
        - lastTransitionTime: "2020-08-24T08:09:57Z"
          message: |
            "step-uiveri5" exited with code 1 (image: "docker-pullable://docker.wdf.sap.corp:51175/platformcoe/beekeeper/uiveri5@sha256:d27c785eeafc0115265eff0da5668819c243d0fd8ee83c7762095749b1ca9a91"); for logs run: kubectl -n site logs uiveri5-lunchmenu-1598256551145-uiveri5-vllbc-pod-hxxx2 -c step-uiveri5
          reason: Failed
          status: "False"
          type: Succeeded
        podName: uiveri5-lunchmenu-1598256551145-uiveri5-vllbc-pod-hxxx2
        startTime: "2020-08-24T08:09:11Z"
        steps:
        - container: step-uiveri5
          imageID: docker-pullable://docker.wdf.sap.corp:51175/platformcoe/beekeeper/uiveri5@sha256:d27c785eeafc0115265eff0da5668819c243d0fd8ee83c7762095749b1ca9a91
          name: uiveri5
          terminated:
            containerID: docker://913276c49c6bbfcdd3d7225abcccd313c876dbfd527bce7398a404c2bbc5a8cf
            exitCode: 1
            finishedAt: "2020-08-24T08:09:56Z"
            reason: Error
            startedAt: "2020-08-24T08:09:18Z"
        - container: step-git-source-workspace-spgfc
          imageID: docker-pullable://gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init@sha256:c6cb2257d718bbd6281f0d028c801c26333221a2c38ce85ae8e23b24bf20e781
          name: git-source-workspace-spgfc
          terminated:
            containerID: docker://1554263ff61d03f6bd3ec32c2ca7154c5c5bac55ec8e8f7ddaed5252ceef9715
            exitCode: 0
            finishedAt: "2020-08-24T08:09:18Z"
            message: '[{"key":"commit","value":"4eed19e75aef56fcff2a89ffc4995d51263b509d","resourceName":"workspace","resourceRef":{"name":"workspace"}}]'
            reason: Completed
            startedAt: "2020-08-24T08:09:18Z"
        taskSpec:
          params:
          - name: user-secret
            type: string
          resources:
            inputs:
            - name: workspace
              type: git
          stepTemplate:
            env:
            - name: DT_CLUSTER_ID
              value: uiveri5-v3
            name: ""
            resources: {}
          steps:
          - args:
            - --seleniumAddress=https://$(SAUCELABS_USER):$(SAUCELABS_PASSWORD)@ondemand.eu-central-1.saucelabs.com/wd/hub
            - --params.user=$(TEST_USER)
            - --params.pass=$(TEST_PASSWORD)
            command:
            - uiveri5
            env:
            - name: TEST_USER
              valueFrom:
                secretKeyRef:
                  key: username
                  name: $(inputs.params.user-secret)
            - name: TEST_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: $(inputs.params.user-secret)
            - name: SAUCELABS_USER
              valueFrom:
                secretKeyRef:
                  key: user
                  name: bot-build-platform-coe-at-saucelabs
            - name: SAUCELABS_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: bot-build-platform-coe-at-saucelabs
            image: docker.wdf.sap.corp:51175/platformcoe/beekeeper/uiveri5:e168823de1b6ef6750e195b8ee0c7456061fec0f
            name: uiveri5
            resources:
              requests:
                cpu: 1m
                memory: 50Mi
            workingDir: /workspace/workspace
Interesting, it looks like uiveri5-lunchmenu-1598256551145-uiveri5-vllbc ended up in the status and not uiveri5-lunchmenu-1598256551145-uiveri5-d7fkh
I wonder if something could happen that interrupts the reconciler from updating the status, so it thinks it needs to create another TaskRun?? very curious!!
The log shows that the pipelinerun reconciler issued a "started" event for the pipelinerun again as well before creating the duplicate TaskRun. This confirms that it is reading "old" PR status. The log does not show any errors updating the status (and genreconciler code would log that).
The pipelinerun reconciler has a function called updatePipelineRunStatusFromInformer which is intended to close holes like this, where TaskRuns aren't in the status due to a lost status update. It lists the TaskRuns that are associated with the PipelineRun (matching via labels) and adds any that are missing back to the status. It is using the lister to find the TaskRuns, and I suppose there's an exposure here that the TaskRun isn't in the lister yet. It seems unlikely in this case because the log shows that the first TaskRun was reconciled and started before the pipelinerun was reconciled again (presumably for that TaskRun status update), so the TaskRun ought to be in the lister (?). It would be costly to go to the API server to find the TaskRuns.
As far as why only one TaskRun ends up in the PR's status, this is because the pipelineState can hold only one TaskRun per pipeline task so one of them wins. The pipelineState is used to completely rewrite the PR's TaskRun statuses at the end of the reconcile.
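For illustration only, here is a minimal Go sketch of that kind of lister-based, label-matching recovery lookup (simplified types, not the actual updatePipelineRunStatusFromInformer code): if a TaskRun hasn't reached the informer cache yet, or doesn't match the selector, it simply isn't recovered into the status.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

// taskRun is a stand-in for Tekton's v1beta1.TaskRun; only the fields
// needed for this illustration are present.
type taskRun struct {
	Name   string
	Labels map[string]string
}

// recoverTaskRuns mimics the idea described above: list the cached TaskRuns
// whose labels match a selector built from the PipelineRun's labels, so lost
// status updates can be reconstructed. Anything missing from the cache (or
// not matching the selector) is invisible here.
func recoverTaskRuns(cache []taskRun, prLabels map[string]string) []string {
	selector := labels.SelectorFromSet(labels.Set(prLabels))
	var names []string
	for _, tr := range cache {
		if selector.Matches(labels.Set(tr.Labels)) {
			names = append(names, tr.Name)
		}
	}
	return names
}

func main() {
	prLabels := map[string]string{"tekton.dev/pipelineRun": "uiveri5-lunchmenu-1598256551145"}

	// TaskRun already observed by the informer: it is recovered.
	cache := []taskRun{{Name: "uiveri5-lunchmenu-1598256551145-uiveri5-vllbc", Labels: prLabels}}
	fmt.Println(recoverTaskRuns(cache, prLabels))

	// Empty (stale) cache: nothing is recovered, and a reconcile working
	// from old status would go on to create a second TaskRun.
	fmt.Println(recoverTaskRuns(nil, prLabels))
}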
cc @afrittoli
I've been trying to reproduce this but am struggling to do so. I've rolled my cluster back to Pipelines 0.15.2 and then written up some resources:
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: test-task
spec:
  params:
  - name: user-secret
    type: string
  resources:
    inputs:
    - name: workspace
      type: git
  steps:
  - name: uiveri5
    image: alpine:3.12.0
    script: |
      echo "$(params.user-secret)"
      cd /workspace/workspace
      ls
      sleep 20
      exit 1
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: test-pipeline
spec:
  params:
  - description: Secret containing the user for test execution
    name: user-secret
    type: string
  resources:
  - name: source-repo
    type: git
  tasks:
  - name: uiveri5
    taskRef:
      name: test-task
    params:
    - name: user-secret
      value: $(params.user-secret)
    resources:
      inputs:
      - name: workspace
        resource: source-repo
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: test-pr
spec:
  pipelineRef:
    name: test-pipeline
  params:
  - name: user-secret
    value: foo
  resources:
  - name: source-repo
    resourceSpec:
      type: git
      params:
      - name: url
        value: https://github.com/tektoncd/pipeline.git
for i in {1..100} ; do kubectl create -f ./test.yaml ; done
kubectl get pipelineruns | wc -l
kubectl get taskruns | wc -l
I've repeated this test maybe 20 times and haven't been able to repro. I'm running on GKE with a 1.16 server so I'm going through the steps now to upgrade to a 1.18 server to match the original post and see if that makes a difference.
@Fabian-K which platform are you running on here? Is it a bare metal cluster or something else?
I've updated to k8s 1.18 and have written a script to try and replicate the annotation/label behaviour of fluxcd described in https://github.com/tektoncd/pipeline/issues/3126#issuecomment-680071108
#!/usr/bin/env bash
set -x

for i in {1..40} ; do
  kubectl create -f ./test.yaml
done

for i in {1..5} ; do
  runs=$(kubectl get taskruns --no-headers | awk '{ print $1 }')
  for run in $runs ; do
    value="$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 8 | head -n 1)"
    kubectl annotate --overwrite taskrun $run foo="$value"
    value="$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 8 | head -n 1)"
    kubectl label --overwrite taskrun $run bar="$value"
  done
  sleep 2
done
I've run this maybe 10 times and haven't been able to trigger a mismatch in taskrun count vs pipelinerun count. So I'm clearly missing something here in the reproduction of the bad behaviour. 😕
thanks a bunch @sbwsg 🙏
I think the best would be hear back from @Fabian-K with the platform details.
Also, if @Fabian-K can provide the exact pipeline with the task and resources used, it might help troubleshoot this further. The reason being that the pipeline in the description has some metadata which I am sure was not sent while creating the PipelineRun.
@Fabian-K it would be great if you can reproduce this on 0.18, since 0.18 has timestamps in the controller logs, which would again be helpful.
I am seeing this behaviour as well as commented in https://github.com/tektoncd/pipeline/issues/3510. I'm running OpenShift Pipelines Technology Preview 1.1, which means:
Thanks for your effort in reproducing the issue! The cluster is a managed cluster by https://gardener.cloud/ with OpenStack as the infrastructure provider. I also tried to reproduce it locally using kind but so far did not manage to do so :(
We are updating our two instances to 0.18 soon. Will definitely look for new occurrences and try to provide controller logs.
Not sure if this is helpful but it seems to me that this tends to happen in namespaces with a lot of PipelineRuns / TaskRuns (in our case around 2000 instances).
thanks @Fabian-K
@psschwei did you notice any such odd behavior (multiple taskruns getting created for a pipeline with a single task) while running the stress test?
Not that I recall, although it wasn't something I was explicitly watching for either.
In my case it is not related to stress at all. Happens from the very first PipelineRun in a namespace.
@michaelsauter Can you provide your pipeline controller log for the period where the duplicate taskruns are created?
I've tried to run some tests and I can't recreate a duplicated TaskRun. This included several thousand runs on OpenShift 4.5 (== Kubernetes 1.18 and Tekton 0.14.3).
Out of curiosity I ran an instrumented PipelineRun controller to see how quickly the informer distributes added/updated *Run objects. I was surprised to see it happens very quickly. The informer consistently notified it of a new TaskRun created by the PipelineRun reconciler before the reconciler exited the reconcile call that created it. In other words it was pseudo-synchronous. If there is a caching issue, this probably explains why I'm not seeing it. I'm observing practically no delay in my cache.
Since I still suspect it's a problem with the informer cache, I started looking around for more details and guidance about dealing with stale objects and found this:
The first piece of advice, "use deterministic names for objects you create", got me to wondering about how the PipelineRun reconciler generates random suffixes for the TaskRuns. It has to generate a suffix because the Pipeline task name might get truncated and no longer be unique. However I don't think the suffix needs to be random -- just unique within the PipelineRun. It could use the index of the pipeline task which would be unique and deterministic.
I believe this should work whether the PipelineRun's name is generated or not (i.e. uses generateName or name). In the former case the random suffix generated for the PipelineRun will distinguish its TaskRuns as well. In the latter case there can be only one PipelineRun with that name, so the only requirement is to distinguish the TaskRuns within that one PipelineRun.
If we did this then as the article suggests the PipelineRun reconciler would get an error when it tried to create a duplicate TaskRun. Theoretically it would just skip creating that TaskRun and go on normally. It would trust that the TaskRun would eventually show up in the cache. (I guess we could debate how much trust it should have in that.)
Food for thought anyway. I am curious if anyone knows why the TaskRuns should have their own random suffix, not just a unique-within-the-pipeline suffix.
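To make the idea concrete, here is a purely illustrative Go sketch (not Tekton's actual naming code; the length limit and helper name are assumptions): derive the suffix from the pipeline task's index so the name is deterministic, and a repeated create attempt for the same task collides instead of duplicating.

package main

import "fmt"

// maxNameLength is what this sketch assumes for a generated name; the real
// reconciler has its own limits and truncation helpers.
const maxNameLength = 63

// deterministicTaskRunName derives the TaskRun name from the PipelineRun name
// and the pipeline task's index, so the same reconcile logic always produces
// the same name and a second create attempt fails instead of duplicating.
func deterministicTaskRunName(pipelineRunName string, taskIndex int) string {
	suffix := fmt.Sprintf("-%d", taskIndex)
	base := pipelineRunName
	if len(base)+len(suffix) > maxNameLength {
		base = base[:maxNameLength-len(suffix)]
	}
	return base + suffix
}

func main() {
	// Same inputs, same name, every time.
	fmt.Println(deterministicTaskRunName("uiveri5-lunchmenu-1598256551145", 0))
	fmt.Println(deterministicTaskRunName("uiveri5-lunchmenu-1598256551145", 0))
}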
Wow, thanks for digging further into this @GregDritschler! Your theory regarding the cache makes sense to me and I think your proposed solution would afford us an opportunity to introduce better logging when the creation happens but fails. This would in turn validate whether the theory is on the money.
It could use the index of the pipeline task which would be unique and deterministic.
We'd need to account for retries of Tasks as well I think? Otherwise Task N might fail, retry, but be blocked from creating the new TaskRun due to name collision? So maybe \
Food for thought anyway. I am curious if anyone knows why the TaskRuns should have their own random suffix, not just a unique-within-the-pipeline suffix.
Other than the retries thing I'm not aware of another reason that we'd require a random suffix in PipelineRun TaskRuns.
Retries reuse the same TaskRun. They just recreate a new pod from the same TaskRun. https://github.com/tektoncd/pipeline/blob/release-v0.17.x/pkg/reconciler/pipelinerun/pipelinerun.go#L592
The new pod will still have its own random suffix. https://github.com/tektoncd/pipeline/blob/release-v0.17.x/pkg/pod/pod.go#L267
This does make me wonder if the TaskRun reconciler has a duplication exposure as well, i.e. it creates a pod, gets re-driven again with stale status that doesn't have the pod name, and creates a duplicate pod. I don't recall seeing any reports of that happening. If it did happen, I guess the TaskRun reconciler could be changed to generate the pod suffix deterministically. It can compute the retry number by looking at the retriesStatus.
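As a hypothetical sketch of that last point (simplified types, not Tekton's real v1beta1 structs): the retry number could be read off the length of the recorded retries and used as a deterministic pod suffix.

package main

import "fmt"

// taskRunStatus is a simplified stand-in for the relevant part of a TaskRun's
// status: one entry is appended to RetriesStatus per finished attempt.
type taskRunStatus struct {
	RetriesStatus []struct{}
}

// deterministicPodName derives the pod name from the TaskRun name plus the
// number of retries already recorded, instead of a random suffix.
func deterministicPodName(taskRunName string, status taskRunStatus) string {
	return fmt.Sprintf("%s-pod-retry%d", taskRunName, len(status.RetriesStatus))
}

func main() {
	st := taskRunStatus{}
	fmt.Println(deterministicPodName("uiveri5-lunchmenu-1598256551145-uiveri5", st)) // first attempt
	st.RetriesStatus = append(st.RetriesStatus, struct{}{})
	fmt.Println(deterministicPodName("uiveri5-lunchmenu-1598256551145-uiveri5", st)) // after one retry
}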
That comment in pod.go indicates the choice to use a random suffix was "to avoid confusion when a build is deleted and re-created with the same name." I suspect that's what drove the use of a random suffix for the TaskRuns too. I'm not sure I buy the argument. When you delete a run, it deletes the owned objects as well. So the only objects that exist at a given time for a given run name are the ones from the latest run. I'm not sure if that's "confusing". Regardless I hope these days everyone uses generateName in their runs anyway.
I thought about this some more and found a case where the random suffix for the TaskRun comes into play. This happens when the name of the PipelineRun itself is the max length of 64 characters. I created this PipelineRun:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: the-itsy-bitsy-spider-climbed-up-the-waterspout-down-came-the-rain-and-washed-the-spider-out
...
This resulted in a PipelineRun name of the-itsy-bitsy-spider-climbed-up-the-waterspout-down-came-x44zw. The PipelineRun created a TaskRun named the-itsy-bitsy-spider-climbed-up-the-waterspout-down-came-9rmdm. The PipelineRun's name was truncated (dropping off its unique suffix) and the random suffix for the TaskRun was appended.
So this puts a monkey wrench in my idea about using predictable TaskRun names. I suppose the PipelineRun name could be truncated in some way to preserve its unique suffix but that makes me worry that it introduces a new exposure that that name might not be unique anymore. (That is, the name created by generateName is unique across PipelineRuns, but a truncated version of that may not be.)
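Sketched purely for discussion, a truncation that keeps the PipelineRun's trailing random suffix could look like this (the length constant and helper are assumptions, and the uniqueness caveat above still applies):

package main

import "fmt"

// maxNameLength is assumed for the sketch; see the caveat above about a
// truncated name no longer being guaranteed unique.
const maxNameLength = 63

// truncateKeepingSuffix shortens name to fit next to suffix while preserving
// its last keep characters (e.g. the random part added by generateName).
func truncateKeepingSuffix(name, suffix string, keep int) string {
	budget := maxNameLength - len(suffix)
	if len(name) <= budget {
		return name + suffix
	}
	head := budget - keep
	if head < 0 {
		head = 0
	}
	return name[:head] + name[len(name)-keep:] + suffix
}

func main() {
	pr := "the-itsy-bitsy-spider-climbed-up-the-waterspout-down-came-x44zw"
	// The "-x44zw" part survives, so TaskRuns from different PipelineRuns that
	// share the same long generateName prefix still get distinct names.
	fmt.Println(truncateKeepingSuffix(pr, "-uiveri5-0", 6))
}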
Finally, here are my Tekton controller logs: https://gist.github.com/michaelsauter/9af328156693e9f615cb1fa699fdd69f.
There are some errors like Failed to delete StatefulSet for PipelineRun build-and-deploy-run-8sxp8: failed to delete StatefulSet affinity-assistant-59d5d42441: statefulsets.apps \"affinity-assistant-59d5d42441\" not found ... could that be causing the behaviour?
The gist https://gist.github.com/Fabian-K/10f46e5b0cd23318985e5d8f3c7b5bc2 contains one occurrence using Tekton 0.18.0
I finally was able to recreate the problem on OpenShift with Tekton 0.18 (with some debugging statements added).
I ran a script to repeatedly create a PipelineRun and then update its labels. It took several hundred runs but I finally got a PR to create a duplicate TR. The log clearly shows the reconciler being driven with stale status, the TR not being found in the lister cache, and the reconciler creating a duplicate TR.
I'm going to bring the problem up in the next API WG.
I'm currently wondering whether some of the flux-related annotations/labels were added by the flux controller after the initial creation, and whether that modification triggers an unexpected reconciliation which, with bad timing, results in multiple TaskRuns being created?
@Fabian-K This is in fact the problem. The reconciler is using all of the PipelineRun's labels (which includes labels propagated from the Pipeline) to look for TaskRuns associated with the PipelineRun. If the labels change in any way, it won't find the TaskRuns that it created prior to that change and it will create duplicate TaskRuns. I have linked a pull request that changes the code to use only the tekton.dev/pipelineRun label for the listing of existing TaskRuns.
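In sketch form (assumed label values, not the actual patch), the difference is which labels feed the selector: the PipelineRun's full, mutable label set versus the single label Tekton itself stamps on every TaskRun it creates.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Labels the TaskRun was created with: a snapshot of the PipelineRun's
	// labels at creation time, including the flux gc-mark of that moment.
	trLabels := labels.Set{
		"tekton.dev/pipelineRun": "uiveri5-lunchmenu-1598256551145",
		"fluxcd.io/sync-gc-mark": "sha256.old-checksum",
	}

	// Selector built from ALL of the PipelineRun's current labels: after
	// flux rewrites the gc-mark, the existing TaskRun no longer matches.
	allCurrent := labels.SelectorFromSet(labels.Set{
		"tekton.dev/pipelineRun": "uiveri5-lunchmenu-1598256551145",
		"fluxcd.io/sync-gc-mark": "sha256.new-checksum",
	})
	fmt.Println(allCurrent.Matches(trLabels)) // false: TaskRun "disappears", a duplicate gets created

	// Selector built only from the Tekton-owned label: still matches.
	ownOnly := labels.SelectorFromSet(labels.Set{
		"tekton.dev/pipelineRun": "uiveri5-lunchmenu-1598256551145",
	})
	fmt.Println(ownOnly.Matches(trLabels)) // true: the existing TaskRun is found
}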
@itewk @michaelsauter Please check if anything is modifying Pipeline or PipelineRun labels while the PipelineRun is running. If not, then we will have to keep digging into your scenarios.
@GregDritschler At least for me, I just followed https://github.com/openshift/pipelines-tutorial. Nothing special, and there is nothing else on that cluster that would cause modifications to labels. I have another cluster available now though, with the same versions, and I can try there again ...
Just as a FYI: I'm now on a OpenShift 4.6 cluster which provides OpenShift Pipelines 1.2 instead of 1.1. On 1.2, I cannot reproduce the problem above.
I am on OCP 4.6 with Pipelines Operator 1.2.2 and definitely still seeing this. I thought this fix was going to be in 1.3 of the operator, based on how the versions are matched up.
I have reproduced this bug on Pipeline version v0.25.0. I fixed it by getting the TaskRun directly from the API server.
Similar issue is reported in #3557
@Rand01ph I have added this item in the API WG to discuss, is there anyway you can provide more details which we can discuss in the WG? It will be great if you can attend the discussion. If not, I will appreciate any more details such as, screen shots, screen recording or anything else 🙏
Yes, I'm very happy to join your discussion. I only have some logs for now, for the recreated TaskRuns.
{"level":"info","ts":"2021-10-18T09:03:51.721Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:282","msg":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"tyw-xxxx-api\", Name:\"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\", UID:\"3c30044d-1b1b-41dc-b9dd-632c8bcb5269\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"114893338\", FieldPath:\"\"}): type: 'Normal' reason: 'Started' ","commit":"584b527"}
{"level":"info","ts":"2021-10-18T09:03:51.729Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:353","msg":"No.1 pipelineMeta is: &ObjectMeta{Name:3b8d0ae68a6fc5c95a91570e9528807c9eb2f164,GenerateName:,Namespace:tyw-xxxx-api,SelfLink:/apis/tekton.dev/v1beta1/namespaces/tyw-xxxx-api/pipelines/3b8d0ae68a6fc5c95a91570e9528807c9eb2f164,UID:29481e0e-19e9-43fb-b036-f2e87a71bc2f,ResourceVersion:114893337,Generation:1,CreationTimestamp:2021-10-18 09:03:51 +0000 UTC,DeletionTimestamp:
{"level":"info","ts":"2021-10-18T09:03:51.738Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:493","msg":"No.2 resolvePipelineState is: &[0xc0009b3ce0]","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57fe491d-3862-41bf-9b53-116cff25ab80","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41"}
{"level":"info","ts":"2021-10-18T09:03:51.757Z","logger":"tekton-pipelines-controller","caller":"volumeclaim/pvchandler.go:67","msg":"Created PersistentVolumeClaim pvc-9b4da94f2e in namespace tyw-xxxx-api","commit":"584b527"}
{"level":"info","ts":"2021-10-18T09:03:51.777Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/affinity_assistant.go:66","msg":"Created StatefulSet affinity-assistant-819ff3551e in namespace tyw-xxxx-api","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57fe491d-3862-41bf-9b53-116cff25ab80","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41"}
{"level":"info","ts":"2021-10-18T09:03:51.777Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:771","msg":"Creating a new TaskRun object 1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41-3d8e52746c-7w4sq for pipeline task 3d8e52746c39bdaf383fc1026d2ebd56816b9ca8","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57fe491d-3862-41bf-9b53-116cff25ab80","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41"}
{"level":"info","ts":"2021-10-18T09:03:51.850Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:601","msg":"PipelineRun 1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41 status is being set to &{Succeeded Unknown {2021-10-18 09:03:51.85019029 +0000 UTC m=+440512.276097640} Running Tasks Completed: 0 (Failed: 0, Cancelled 0), Incomplete: 1, Skipped: 0}","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57fe491d-3862-41bf-9b53-116cff25ab80","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41"}
{"level":"info","ts":"2021-10-18T09:03:51.851Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:282","msg":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"tyw-xxxx-api\", Name:\"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\", UID:\"3c30044d-1b1b-41dc-b9dd-632c8bcb5269\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"114893338\", FieldPath:\"\"}): type: 'Normal' reason: 'Running' Tasks Completed: 0 (Failed: 0, Cancelled 0), Incomplete: 1, Skipped: 0","commit":"584b527"}
{"level":"info","ts":"2021-10-18T09:03:51.921Z","logger":"tekton-pipelines-controller","caller":"controller/controller.go:538","msg":"Reconcile succeeded","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57fe491d-3862-41bf-9b53-116cff25ab80","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41","duration":0.202026368}
{"level":"info","ts":"2021-10-18T09:03:51.921Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:282","msg":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"tyw-xxxx-api\", Name:\"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\", UID:\"3c30044d-1b1b-41dc-b9dd-632c8bcb5269\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"114893338\", FieldPath:\"\"}): type: 'Normal' reason: 'Started' ","commit":"584b527"}
{"level":"info","ts":"2021-10-18T09:03:51.926Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:353","msg":"No.1 pipelineMeta is: &ObjectMeta{Name:3b8d0ae68a6fc5c95a91570e9528807c9eb2f164,GenerateName:,Namespace:tyw-xxxx-api,SelfLink:/apis/tekton.dev/v1beta1/namespaces/tyw-xxxx-api/pipelines/3b8d0ae68a6fc5c95a91570e9528807c9eb2f164,UID:29481e0e-19e9-43fb-b036-f2e87a71bc2f,ResourceVersion:114893337,Generation:1,CreationTimestamp:2021-10-18 09:03:51 +0000 UTC,DeletionTimestamp:
{"level":"info","ts":"2021-10-18T09:03:51.934Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:493","msg":"No.2 resolvePipelineState is: &[0xc001a29200]","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57c1f38d-7b73-4987-bd00-448b305f16d2","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41"}
{"level":"info","ts":"2021-10-18T09:03:51.944Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:771","msg":"Creating a new TaskRun object 1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41-3d8e52746c-wh66c for pipeline task 3d8e52746c39bdaf383fc1026d2ebd56816b9ca8","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57c1f38d-7b73-4987-bd00-448b305f16d2","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41"}
{"level":"info","ts":"2021-10-18T09:03:51.962Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:601","msg":"PipelineRun 1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41 status is being set to &{Succeeded Unknown {2021-10-18 09:03:51.962330786 +0000 UTC m=+440512.388238183} Running Tasks Completed: 0 (Failed: 0, Cancelled 0), Incomplete: 1, Skipped: 0}","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57c1f38d-7b73-4987-bd00-448b305f16d2","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41"}
{"level":"info","ts":"2021-10-18T09:03:51.963Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:282","msg":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"tyw-xxxx-api\", Name:\"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\", UID:\"3c30044d-1b1b-41dc-b9dd-632c8bcb5269\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"114893338\", FieldPath:\"\"}): type: 'Normal' reason: 'Running' Tasks Completed: 0 (Failed: 0, Cancelled 0), Incomplete: 1, Skipped: 0","commit":"584b527"}
{"level":"warn","ts":"2021-10-18T09:03:51.984Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:261","msg":"Failed to update PipelineRun labels/annotations{error 26 0 Operation cannot be fulfilled on pipelineruns.tekton.dev \"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\": the object has been modified; please apply your changes to the latest version and try again}","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57c1f38d-7b73-4987-bd00-448b305f16d2","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41"}
{"level":"info","ts":"2021-10-18T09:03:51.985Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:282","msg":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"tyw-xxxx-api\", Name:\"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\", UID:\"3c30044d-1b1b-41dc-b9dd-632c8bcb5269\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"114893338\", FieldPath:\"\"}): type: 'Warning' reason: 'Error' Operation cannot be fulfilled on pipelineruns.tekton.dev \"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\": the object has been modified; please apply your changes to the latest version and try again","commit":"584b527"}
{"level":"error","ts":"2021-10-18T09:03:52.047Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/reconciler.go:294","msg":"Returned an error","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"57c1f38d-7b73-4987-bd00-448b305f16d2","knative.dev/key":"tyw-xxxx-api/1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41","targetMethod":"ReconcileKind","error":"1 error occurred:\n\t Operation cannot be fulfilled on pipelineruns.tekton.dev \"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\": the object has been modified; please apply your changes to the latest version and try again\n\n","stacktrace":"github.com/tektoncd/pipeline/pkg/client/injection/reconciler/pipeline/v1beta1/pipelinerun.(reconcilerImpl).Reconcile\n\tgithub.com/tektoncd/pipeline/pkg/client/injection/reconciler/pipeline/v1beta1/pipelinerun/reconciler.go:294\nknative.dev/pkg/controller.(Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20210331065221-952fdd90dbb0/controller/controller.go:530\nknative.dev/pkg/controller.(Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20210331065221-952fdd90dbb0/controller/controller.go:468"}
{"level":"error","ts":"2021-10-18T09:03:52.047Z","logger":"tekton-pipelines-controller","caller":"controller/controller.go:548","msg":"Reconcile error","commit":"584b527","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","duration":0.126541491,"error":"1 error occurred:\n\t Operation cannot be fulfilled on pipelineruns.tekton.dev \"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\": the object has been modified; please apply your changes to the latest version and try again\n\n","stacktrace":"knative.dev/pkg/controller.(Impl).handleErr\n\tknative.dev/pkg@v0.0.0-20210331065221-952fdd90dbb0/controller/controller.go:548\nknative.dev/pkg/controller.(Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20210331065221-952fdd90dbb0/controller/controller.go:531\nknative.dev/pkg/controller.(Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20210331065221-952fdd90dbb0/controller/controller.go:468"}
{"level":"info","ts":"2021-10-18T09:03:52.048Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:282","msg":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"tyw-xxxx-api\", Name:\"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\", UID:\"3c30044d-1b1b-41dc-b9dd-632c8bcb5269\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"114893338\", FieldPath:\"\"}): type: 'Warning' reason: 'InternalError' 1 error occurred:\n\t* Operation cannot be fulfilled on pipelineruns.tekton.dev \"1-3-57-b4fa7e32-70fd-475f-b7a4-00c45914ad80-41\": the object has been modified; please apply your changes to the latest version and try again\n\n","commit":"584b527"}
Reopening as this is seen in recent releases 😞
PR #4361 must fix this issue 🤞
This should be fixed by https://github.com/tektoncd/pipeline/pull/4361, which will make any attempt to create a duplicate TaskRun fail.
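Roughly, the pattern that deterministic names enable looks like this (a hedged sketch using the generic Kubernetes API error helpers, not the code from that PR; createTaskRun is a placeholder):

package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// createTaskRun is a placeholder for the real client create call; here it
// always reports that a TaskRun with the deterministic name already exists.
func createTaskRun(name string) error {
	gr := schema.GroupResource{Group: "tekton.dev", Resource: "taskruns"}
	return apierrors.NewAlreadyExists(gr, name)
}

func main() {
	err := createTaskRun("uiveri5-lunchmenu-1598256551145-uiveri5")
	switch {
	case err == nil:
		fmt.Println("created TaskRun")
	case apierrors.IsAlreadyExists(err):
		// With deterministic names, a reconcile driven by stale status
		// lands here instead of silently creating a duplicate TaskRun.
		fmt.Println("TaskRun already exists, skipping creation:", err)
	default:
		fmt.Println("unexpected error:", err)
	}
}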
Hi,
I have a simple pipeline that consists only of a single task. When creating a PipelineRun for it, I expect that one TaskRun is created. Sometimes however, I observe two TaskRuns being created for the same PipelineRun. Here is an example: PipelineRun uiveri5-lunchmenu-1598256551145 resulted in two TaskRuns, uiveri5-lunchmenu-1598256551145-uiveri5-d7fkh and uiveri5-lunchmenu-1598256551145-uiveri5-vllbc.
Is my expectation right that only one TaskRun should be created? Any ideas what could lead to this situation?
Thank you!
tekton-pipelines-controller logs
pipeline
Additional Info
Kubernetes version:
Tekton Pipeline version: