tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0
8.46k stars 1.77k forks source link

Tasks that claim to emit results but don't should fail #3497

Open bobcatfish opened 3 years ago

bobcatfish commented 3 years ago

Expected Behavior

If I create a Task which claims to emit a result, but it doesn't, the TaskRun should fail, i.e. I should be able to rely on the interface a Task claims to provide.

Actual Behavior

This will only be considered a failure if something in the Pipeline tries to use the result; if nothing tries to use the result, there will be no error, but if something does try to use the result you will get an error like:

tkn pr list
NAME                     STARTED          DURATION    STATUS
sum-three-pipeline-run   10 minutes ago   5 seconds   Failed(InvalidTaskResultReference)

Steps to Reproduce the Problem

I took this example pipeline with results and made a few changes:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: add-task
spec:
  params:
    - name: first
      description: the first operand
    - name: second
      description: the second operand
  results:
    - name: sum
      description: the sum of the first and second operand
  steps:
    - name: add
      image: alpine
      env:
        - name: OP1
          value: $(params.first)
        - name: OP2
          value: $(params.second)
      command: ["/bin/sh", "-c"]
      args:
        - echo -n $((${OP1}+${OP2}))
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: sum-three-pipeline
spec:
  params:
    - name: first
      description: the first operand
    - name: second
      description: the second operand
    - name: third
      description: the third operand
  tasks:
    - name: first-add
      taskRef:
        name: add-task
      params:
        - name: first
          value: $(params.first)
        - name: second
          value: $(params.second)
    - name: second-add
      taskRef:
        name: add-task
      params:
        - name: first
          value: "5"
        - name: second
          value: $(params.third)
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: sum-three-pipeline-run
spec:
  pipelineRef:
    name: sum-three-pipeline
  params:
    - name: first
      value: "2"
    - name: second
      value: "10"
    - name: third
      value: "10"

If you apply this, you will see that it runs successfully. If you add any references to the results, the run will fail.

Additional Info

ghost commented 3 years ago

FYI in https://github.com/tektoncd/pipeline/pull/3472 we're removing the behaviour where a PipelineRun is put into a failed state due to a Task dropping a result. Feedback welcome!

Edit for posterity: I was wrong - a PipelineRun was never put into a failed state due to a Task dropping a result. Only if the PipelineRun couldn't fetch the assocaited PipelineSpec while trying to resolve those result references. Ignore this message!

pritidesai commented 3 years ago

thanks @bobcatfish, https://github.com/tektoncd/pipeline/issues/2701 has some discussion as well

afrittoli commented 3 years ago

I feel like this depends on what we define with "producing a result". I think with the current code today is:

That said, it is probably good to continue to force tasks to produce a result like they do today (unless we introduce support for default values, and a task specifies a default), as this would help task authors and users to avoid silent failures from being ignored.

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

ghost commented 3 years ago

@bobcatfish do you still want to pursue this one or should we leave it for the time being?

ghost commented 3 years ago

Note: with the new feature gate / flagging work this would be classified as a backwards incompatible change and would therefore need its own feature flag. It could be made the default for a new apiVersion, such as v1.

bobcatfish commented 3 years ago

Feels like we are still not 100% sure what the desired behavior is here so I'd like to keep discussing

/remove-lifecycle rotten

tekton-robot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

bobcatfish commented 2 years ago

I'd still like to bring this forward at some point and decide how to address this as a community - maybe as part of v1 discussions? but it hasnt been a priority yet :(

/remove-lifecycle stale

jerop commented 2 years ago

In TEP-0075, we are planning to support object results. These results embrace the optional by default behavior of JSON Schema. A task can produce an object result with a missing attribute and it won't fail. However, a subsequent task will fail if it attempts to use the missing attribute from the object result, causing the pipeline to fail. That makes the object results attributes optional if not used, and required if used. That behavior is consistent with the existing behavior reported in this issue as a bug, so wondering if we should document this behavior and close this issue?

bobcatfish commented 2 years ago

I'm still leaning toward failing tasks that claim to emit results but fail and I'll try to explain why.

The Task spec defines the interface of the task, including:

In TEP-0075 we add support for results to declare they are of type object and we start to support defining the schema of that object, using JSON schema.

The crux of the problem is, what does it mean if a Task declares it will produce a result with a JSON schema indicating it will have key "foo" but it does not have that key at runtime? JSON schema allows for keys to be declared as "required" and assumes all keys are optional by default. So if a Task declares that it will produce a result with the key "foo" and it does not, it is still technically meeting the contract it declared in that the schema it defined did say that key was optional 😭

However I think it is different when we are talking about the results themselves; I don't think the same logic we apply to each individual result that a Task declares it produces needs to the same as the application of the schema we apply to validate those results. imo it all comes down to: did the Task fulfill the contract it declared. If it said it would make a result and did not make that result, I believe it did not fulfill its contract.

So in the example of a Task declaring a result with a schema indicating it is an object with (implied optional) key foo, I would think:

All of that being said I can see why this would be confusing 😅 maybe we can find a compromise in TEP-0075 that manages to balance both worlds, I'm going to make a suggestion over there.

jerop commented 2 years ago

@bobcatfish thank you for the detailed discussion above - makes sense to distinguish between producing results and keys in object results - especially given that users will have control to say that they keys are required if they wish

agreed, we can go ahead with making the results themselves required - we'll need to explain this well in the docs though (cc @vinamra28)

pritidesai commented 2 years ago

/assign @vinamra28

tekton-robot commented 2 years ago

@pritidesai: GitHub didn't allow me to assign the following users: vinamra28.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to [this](https://github.com/tektoncd/pipeline/issues/3497#issuecomment-1116306899): >/assign @vinamra28 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
vinamra28 commented 2 years ago

/assign

pritidesai commented 2 years ago

@vinamra28 is implementing TEP-0048 which addresses this issue

dibyom commented 1 year ago

related: https://github.com/tektoncd/pipeline/pull/6817

afrittoli commented 1 year ago

As discussed during the API WG on 17.07, moving this to nice-to-have for v1, in favour of https://github.com/tektoncd/pipeline/issues/6932