tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0

Add support for referencing Tasks in git #2298

Closed bobcatfish closed 1 year ago

bobcatfish commented 4 years ago

Expected Behavior

In #1839 we are adding support for referencing Tasks that are stored in OCI image repos. We should also be able to reference Tasks that are stored in git, e.g.

```yaml
apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: my-task-run
spec:
  taskRef:
    git:
      url: https://github.com/my/repo
      commit: deadbeef
      path: path/to/my/task.yaml
```

Actual Behavior

Via https://github.com/tektoncd/pipeline/issues/1839 we now support referencing versioned Tasks and Pipelines in OCI registries (https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md#tekton-bundles).

Use case

For example, I could make a TriggerTemplate like:

```yaml
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
  name: run-tests
spec:
  params:
  - name: commitish
    description: The commitish to grab the Pipeline from to run
    default: master
  resourcetemplates:
  - apiVersion: tekton.dev/v1beta1
    kind: PipelineRun
    metadata:
      generateName: run-tests-$(uid)-
    spec:
      pipelineRef:
        git:
          url: https://github.com/my/repo
          # This would let me include any changes to the Pipeline in the PR testing
          commit: $(params.commitish)
          path: path/to/my/pipeline.yaml
```
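The `commit: $(params.commitish)` line relies on Triggers substituting the param value into the resource template before the PipelineRun is created. A minimal Python sketch of that kind of substitution, assuming only the `$(params.<name>)` form and a simple fallback to the declared default (`substitute_params` is a hypothetical helper, not the Triggers implementation):

```python
import re

# Matches Tekton-style references such as $(params.commitish).
# Simplification: only the $(params.<name>) form is handled; $(uid) and
# other variables are left as-is.
PARAM_REF = re.compile(r"\$\(params\.([A-Za-z0-9_-]+)\)")


def substitute_params(template, declared, provided):
    """Replace each $(params.NAME) in `template`.

    `declared` maps param name -> default value (or None when there is no
    default), like the TriggerTemplate `params` list; `provided` holds the
    values extracted from the incoming event. A provided value wins over a
    default; references to unknown params are left untouched.
    """
    def lookup(match):
        name = match.group(1)
        if name in provided:
            return provided[name]
        default = declared.get(name)
        return default if default is not None else match.group(0)

    return PARAM_REF.sub(lookup, template)


declared = {"commitish": "master"}
line = "commit: $(params.commitish)"
print(substitute_params(line, declared, {"commitish": "deadbeef"}))  # → commit: deadbeef
print(substitute_params(line, declared, {}))                         # → commit: master
```

In this sketch a PR event carrying `commitish` pins the fetched Pipeline to that commit, while a run without the param falls back to `master`.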

Additional Info

vdemeester commented 4 years ago

/kind feature

pierretasci commented 4 years ago

Just seeing this now. I think this is very doable without much additional overhead. Definitely worth looking into. There is also KPT (https://github.com/GoogleContainerTools/kpt), which promises something similar.

tekton-robot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.

/lifecycle stale

tekton-robot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close.

/lifecycle rotten

tekton-robot commented 4 years ago

@tekton-robot: Closing this issue.

vdemeester commented 4 years ago

/remove-lifecycle rotten /remove-lifecycle stale /reopen

tekton-robot commented 4 years ago

@vdemeester: Reopened this issue.

bobcatfish commented 4 years ago

Added some more detail to the description of this after I accidentally made a duplicate XD

vdemeester commented 4 years ago

One thing that worries me is that it would open the door to a lot of "ways" to reference definitions.

One general thought on this too: the more ways we add to refer to definitions, the less portable definition types (Pipeline, Task) become.

One question I am asking myself is how we can support your use case (which is more or less Pipeline as code) as of today, and/or without this feature (but with other features), for exploration purposes. As of today, there are already a bunch of possibilities:

Without this feature, but with some changes:

In a nutshell, I initially see more problems with having this in pipeline than without it. But as lots of people have said: No is temporary, Yes is forever.

/cc @tektoncd/core-maintainers @chmouel

bobcatfish commented 3 years ago

> For example, why do we support git and not mercurial, or darcs, or other VCS?

That's a good question - I think we'd have to choose which to support.

My current thought is that we start with git support and add others as people ask for them. Usually I've tried to prioritize making things like this pluggable; do we feel like that's important here? I feel like version control is so fundamental to CI/CD use cases that it's reasonable to have some built-in version control support?

> How much complexity does it add to the resolving part for this? Is this complexity worth putting in pipeline, or could this be solved by another component? Especially in terms of auth and secret management, both in implementation complexity and in usage complexity.

Do you see this adding a lot of complexity? We've currently got git-init, and it seems like we do add features to it from time to time, but it seems worth the cost?

Are we talking about engineering effort, interface, or the time it takes to fetch the resources? Fetching would only happen once per reconcile and we could add caching if we wanted. The interface might be interesting to design but we've already had some experience via git-init.

> How do we envision this for the tasks used by this Pipeline then (if the task definition also lives in the git repository)?

Good question - I could see this being up to the author. You could either parameterize the Task refs such that they use the same git commits, or you could refer to them in your cluster, or you could refer to them in an OCI registry.

> How do we make the Pipeline usable locally (without triggers)?

You'd provide params for the location of the git repo + the commit (maybe this gets back to the request in TEP-0018 to have a default bundle; maybe we'd want a default/parametrizable git repo as well in this case, but to start with I'd say folks could use params to specify this).

> the more ways we add to refer to definitions, the less portable definition types (Pipeline, Task) become.

Could you explain more about that? I think referring to Pipelines and Tasks within a cluster might be even worse (e.g. if someone deletes/changes a definition in the cluster)

> Without this feature, but with some changes

Both seem like they would work, but being able to refer to Tasks and Pipelines where they live in version control seems like a simple, elegant solution that would require a lot less to get up and running, and to reason about once it's running.

If I or someone else is able to make a POC at some point, that might help.

vdemeester commented 3 years ago

> My current thought is that we start with git support and add others as people ask for them. Usually I've tried to prioritize making things like this pluggable; do we feel like that's important here? I feel like version control is so fundamental to CI/CD use cases that it's reasonable to have some built-in version control support?

Some questions: Is the use of tektoncd/pipeline limited to version control? The current answer is definitely no. Should it be? I would answer no here too: tektoncd/pipeline should be as unopinionated as it can be in terms of "where data comes from and where it goes".

> Do you see this adding a lot of complexity? We've currently got git-init, and it seems like we do add features to it from time to time, but it seems worth the cost?

> Are we talking about engineering effort, interface, or the time it takes to fetch the resources? Fetching would only happen once per reconcile and we could add caching if we wanted. The interface might be interesting to design but we've already had some experience via git-init.

I am talking about engineering effort and usage/interface (user experience, verbosity, …), not about fetching resources; that's an implementation detail. Even so, fetching should happen once for a specific run, not per reconcile (just as it should for normal fetches and OCI bundles), and after that we should use the object itself to look at the definition, which is less racy.
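The "once per run, then read from the object" point can be illustrated with a small sketch. This is a minimal Python example, assuming a hypothetical `Run` type with a field for the resolved definition and a `fetch` callable standing in for whatever source (git, OCI, cluster) is used; it is not Tekton's controller code, and the ref string format is made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a TaskRun/PipelineRun object."""
    name: str
    ref: str                              # where the definition lives (illustrative format)
    resolved_spec: Optional[dict] = None  # filled in on the first reconcile


def reconcile(run: Run, fetch: Callable[[str], dict]) -> dict:
    """Return the definition for `run`, fetching it at most once per run.

    The first reconcile fetches the definition and stores it on the run
    object; every later reconcile reads the stored copy, so an upstream
    change made while the run is in progress cannot affect it.
    """
    if run.resolved_spec is None:
        run.resolved_spec = fetch(run.ref)
    return run.resolved_spec


fetched = []  # records every remote fetch


def fake_fetch(ref: str) -> dict:
    fetched.append(ref)
    return {"steps": [{"name": "build", "image": "golang"}]}


run = Run(name="my-task-run", ref="https://github.com/my/repo@deadbeef:path/to/my/task.yaml")
for _ in range(3):  # a controller may reconcile the same run many times
    reconcile(run, fake_fetch)
print(len(fetched))  # → 1
```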

> How do we envision this for the tasks used by this Pipeline then (if the task definition also lives in the git repository)?

> Good question - I could see this being up to the author. You could either parameterize the Task refs such that they use the same git commits, or you could refer to them in your cluster, or you could refer to them in an OCI registry.

Well, that's the "problem": it has to be simple for the user. If the user has to adapt their pipeline to be able to parametrize this, it makes the pipeline less shareable (kinda). This is very similar to what the current PipelineResource design does when you use the GitResource, for example: you are stuck with it, and you have to write another Task (in the case of GitResource) to use an input other than a git repository.

> How do we make the Pipeline usable locally (without triggers)?

> You'd provide params for the location of the git repo + the commit (maybe this gets back to the request in TEP-0018 to have a default bundle; maybe we'd want a default/parametrizable git repo as well in this case, but to start with I'd say folks could use params to specify this).

By locally, I mean going "from the source on my laptop" to a running pipeline without making any commit. Right now, this is possible through tooling: I apply my definition, I find a way to populate a workspace with my data (volume, …), and I run my Pipeline with it. If the Pipeline uses a git reference notation to get task definitions, how do I test local changes to my task(s), except by changing the pipeline definition itself?

> the more ways we add to refer to definitions, the less portable definition types (Pipeline, Task) become.

> Could you explain more about that? I think referring to Pipelines and Tasks within a cluster might be even worse (e.g. if someone deletes/changes a definition in the cluster)

The more choices you have for referring to something, the bigger the "matrix" of problems you encounter. The example above is one of them. It's not about where we refer to things from, but how many possible ways we have to refer to things and how this affects the ability to author and share tasks/pipelines/….

Note that in this reflection I am only looking from the tektoncd/pipeline point of view, not from that of a full-fledged CI/CD system (which tektoncd/pipeline is not; it's just a component). "Should tektoncd/pipeline be a full-fledged CI/CD system?" is a question we may want to discuss too. "Should tektoncd provide a full-fledged CI/CD system?" is another question.

Note that, just like with PipelineResource, I am trying to make us think and discuss really hard before implementing new features in tektoncd/pipeline if they are solvable by other components.

bobcatfish commented 3 years ago

> I am trying to make us think and discuss really hard before implementing new features in tektoncd/pipeline if they are solvable by other components

Excellent! I hope we can be as rigorous with all the new features we add :D

Responding to you inline has helped me come up with a different way to present this feature which I think might help!

I want to assert that:

  1. Version control is a key element of continuous delivery (and CI)
  2. (1) is not limited to just your source code, but also your configuration (i.e. "keep absolutely everything in version control")

Both of these recommendations are backed up by every canonical piece of literature in the space I've encountered, so given that our mission is to create components for CI/CD:

  1. I think it's reasonable to assume that most of the time there is version control involved in the activities of or surrounding execution of Pipelines and Tasks
  2. The best practice we should recommend and support is for people to store their Pipeline and Task definitions in version control

We currently only support getting the definitions in (2) from a cluster or from an OCI registry.

This means that (if you agree with the above!) although we know folks will be storing these definitions in version control, we're saying they need to do something with those definitions before they can use them, i.e. apply them to a cluster or upload them to a registry.

The feature I'm proposing here is to recognize that folks will be storing these definitions in version control, and not require that they have to then do some extra thing with them to use them. (And this elegantly solves some other use cases like using changes to the Pipelines and Tasks that are made in the same PR.)

(I also assert that referencing Tasks and Pipelines in cluster actually buys us very little - which we have especially seen with needing to add CRD types like ClusterTask - which even then don't give us the scoping we want)

> if the user has to adapt their pipeline to be able to parametrize this, it makes the pipeline less shareable (kinda)

I'm not sure how this is worse than the current state: you can create Pipelines that refer to Tasks that only exist in your own cluster or your own OCI registry.

> If the Pipeline uses a git reference notation to get task definitions, how do I test local changes to my task(s), except by changing the pipeline definition itself?

Today if you're making changes to a Task, you need to either apply it to your cluster or upload it to the registry, right?

If you apply an updated Task to the cluster (presumably your own private cluster), you have to name it the same as the Pipeline is expecting or edit the Pipeline. If you upload it to the registry, you have to either use the same name/label/version the Pipeline is expecting, or update the Pipeline.

If we had support for referring to Tasks in git, supporting this scenario via requiring a change to the Pipeline doesn't seem much different to me?

To me this points more toward having some kind of "local mode", maybe via the CLI, which is able to look for Task and Pipeline definitions on the filesystem (currently not supported at all). That would probably require being able to override at runtime where Tasks and Pipelines are pulled from, which is something we might want to consider even without version control support.

afrittoli commented 3 years ago

If we consider the location you fetch a task/pipeline from to be a runtime concern, I think supporting multiple sources would not hinder the reusability of tasks and pipelines.

When running a PipelineRun/TaskRun, one has to specify the pipeline/task ref, which could be cluster, OCI, git... and perhaps even a path in a workspace?

For pipeline tasks, the task name is part of the pipeline definition, but where to look for that could again be a runtime concern.
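One way to read "where to look is a runtime concern": the Pipeline keeps only task names, and each run supplies an ordered list of sources to try, which would also cover the local-mode override and failover ideas raised in this thread. A minimal Python sketch, assuming hypothetical in-memory sources (`dict_resolver` and `resolve_task` are illustrative names, not Tekton APIs):

```python
from typing import Callable, Dict, List, Optional

# A resolver returns a task definition for a name, or None if its source
# does not have that task. The sources below are hypothetical in-memory
# stand-ins for "local files", "cluster", "OCI bundle", "git", etc.
Resolver = Callable[[str], Optional[dict]]


def dict_resolver(source: str, store: Dict[str, dict]) -> Resolver:
    def resolve(name: str) -> Optional[dict]:
        spec = store.get(name)
        # Tag the result so callers can see which source answered.
        return None if spec is None else {**spec, "resolvedFrom": source}
    return resolve


def resolve_task(name: str, chain: List[Resolver]) -> dict:
    """Try each source in order; the first one that has `name` wins."""
    for resolver in chain:
        spec = resolver(name)
        if spec is not None:
            return spec
    raise LookupError(f"task {name!r} not found in any configured source")


local = dict_resolver("local", {"build": {"steps": ["go build (local edit)"]}})
cluster = dict_resolver("cluster", {"build": {"steps": ["go build"]},
                                    "test": {"steps": ["go test"]}})

# The Pipeline only names "build" and "test"; the lookup order is a run-time choice.
print(resolve_task("build", [local, cluster])["resolvedFrom"])  # → local
print(resolve_task("test", [local, cluster])["resolvedFrom"])   # → cluster
print(resolve_task("build", [cluster])["resolvedFrom"])         # → cluster
```

The design choice in this sketch is that the authored Pipeline never changes; swapping `[local, cluster]` for `[cluster]` changes only where definitions come from.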

vdemeester commented 3 years ago

> This means that (if you agree with the above!) although we know folks will be storing these definitions in version control, we're saying they need to do something with those definitions before they can use them, i.e. apply them to a cluster or upload them to a registry.

> The feature I'm proposing here is to recognize that folks will be storing these definitions in version control, and not require that they have to then do some extra thing with them to use them. (And this elegantly solves some other use cases like using changes to the Pipelines and Tasks that are made in the same PR.)

> (I also assert that referencing Tasks and Pipelines in cluster actually buys us very little - which we have especially seen with needing to add CRD types like ClusterTask - which even then don't give us the scoping we want)

I agree with the definitions above, but they apply to a system (a CI system). tektoncd/pipeline is a component, not a full CI system, and I see handling this case (definitions living in version control) as a responsibility of the system, not necessarily of the tektoncd/pipeline component.

To try to make my point a bit clearer, I want to draw a small parallel with Pod and Deployment here. A Pod doesn't have to support all the concerns: for example, a Pod (spec) doesn't have anything related to replicas or rolling updates, because that is not its concern; it's the Deployment's concern. I feel some features (like this one) might not belong in the tektoncd/pipeline API and might be better achieved by tooling or higher-level constructs.

Of course, as a user, I expect to use a CI/CD system (or build one) that allows me to store my definitions, etc., in a version control system. But it doesn't mean it has to be supported by tektoncd/pipeline instead of something else (another component of my CI system).

Which brings me back again to "I am only looking from the tektoncd/pipeline point of view, not a full-fledged CI/CD system (which tektoncd/pipeline is not; it's just a component)". "Should tektoncd/pipeline be a full-fledged CI/CD system?" is one question; "Should Tekton (the community, the tektoncd org) provide a full-fledged CI/CD system?" is another one.

> Today if you're making changes to a Task, you need to either apply it to your cluster or upload it to the registry, right?

> If you apply an updated Task to the cluster (presumably your own private cluster), you have to name it the same as the Pipeline is expecting or edit the Pipeline. If you upload it to the registry, you have to either use the same name/label/version the Pipeline is expecting, or update the Pipeline.

> If we had support for referring to Tasks in git, supporting this scenario via requiring a change to the Pipeline doesn't seem much different to me?

It really depends on the tool you use :wink:. If I use a tool that embeds everything into taskSpec and pipelineSpec, I never update any task on my cluster: I edit my YAML and let the tool do its thing. This is what tekton-asa-code does, for example.
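A rough Python sketch of what such a tool does before submitting a run, assuming the Pipeline and Task YAML have already been parsed into dicts (`inline_task_refs` is a hypothetical name; this is not tekton-asa-code's actual code). Every `taskRef` is swapped for an embedded `taskSpec` taken from the local checkout:

```python
import copy
from typing import Dict


def inline_task_refs(pipeline: dict, local_tasks: Dict[str, dict]) -> dict:
    """Return a copy of `pipeline` with each taskRef replaced by a taskSpec.

    `local_tasks` maps a task name to that Task's spec as read from the
    user's working tree, so local edits are picked up on the next run
    without applying any Task to the cluster.
    """
    result = copy.deepcopy(pipeline)
    for pipeline_task in result["spec"]["tasks"]:
        ref = pipeline_task.pop("taskRef", None)
        if ref is None:
            continue  # already embedded (taskSpec), leave it alone
        name = ref["name"]
        if name not in local_tasks:
            raise KeyError(f"no local definition for task {name!r}")
        pipeline_task["taskSpec"] = copy.deepcopy(local_tasks[name])
    return result


pipeline = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "Pipeline",
    "metadata": {"name": "run-tests"},
    "spec": {"tasks": [{"name": "unit-tests", "taskRef": {"name": "go-test"}}]},
}
local_tasks = {
    "go-test": {"steps": [{"name": "test", "image": "golang", "script": "go test ./..."}]},
}
print(inline_task_refs(pipeline, local_tasks)["spec"]["tasks"][0])
```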

> For pipeline tasks, the task name is part of the pipeline definition, but where to look for that could again be a runtime concern.

I also tend to agree with @afrittoli that maybe where to look could be a runtime concern (with failover mechanisms, …).

Does where to look for the definition need to be something a Pipeline/Task definition author has to take care of? Should it even be there? (On this, I am glad bundles are still hidden behind a feature flag.) This is, imo, a critical question to answer, because depending on the answer we may not need the bundle part in Pipeline, for example, and we would need to generalize the approach taken in the TEP-0018 "Allow a Run to Specify a Default Bundle" proposal.

> To me this points more toward having some kind of "local mode", maybe via the CLI, which is able to look for Task and Pipeline definitions on the filesystem (currently not supported at all) - which would probably require being able to override at runtime where Tasks and Pipelines are pulled from - something we might want to consider even without version control support.

This is more like it; this might make "version control support" unnecessary altogether.

In summary, I feel answering the following questions is very important to be able to discuss this (and other tektoncd/pipeline) feature(s):

vdemeester commented 3 years ago
  • Do we consider tektoncd/pipeline to be a full-fledged CI/CD system? Should it be?
  • Should Tekton (the community, the tektoncd org) provide a full-fledged CI/CD system?

Note that those need to be answered and clearly stated on tekton.dev, in our community repository, …, so that users discovering Tekton don't get the wrong impression :upside_down_face:

bobcatfish commented 3 years ago

I agree with your list, @vdemeester! And I like the approach of answering those questions first and then returning to this particular issue later (instead of jumping to the conclusion that folks should be able to reference Tasks and Pipelines in git directly in Runs).

> Does where to look for the definition need to be something a Pipeline/Task definition author has to take care of? Should it even be there?

I think this is really interesting and I think you're totally right that putting bundles behind a feature flag will help us here :D

Maybe we can start working on a TEP that outlines the problem here: the runtime vs. authoring-time concerns around where Tasks and Pipelines are referenced.

> Do we consider tektoncd/pipeline to be a full-fledged CI/CD system? Should it be?

It'd be great if we could start trying to draw lines of individual responsibility around the components in Tekton.

> Should Tekton (the community, the tektoncd org) provide a full-fledged CI/CD system?

Based on some conversations with @ImJasonH yesterday I have a couple ideas that might help with this:

Anyway, it feels like these probably deserve their own separate issues and TEPs, and maybe we can put this particular issue on hold as we dig through them!

michaelsauter commented 3 years ago

I came across this issue as I was looking for a solution to run pipelines defined in a Git repository. Since there seems to be no solution so far, I tried to come up with a simple POC of how to do this.

Some background on the context I'm coming from: the tools in place right now are Bitbucket, Jenkins and OpenShift. For each Bitbucket project (with some repos), there is one OpenShift namespace with a Jenkins instance. Bitbucket sends webhook requests to a custom service in the corresponding OpenShift namespace, which creates/triggers OpenShift BuildConfig resources that are automatically synced to pipelines in Jenkins (via the now-deprecated OpenShift integration). I'm looking to potentially replace Jenkins with OpenShift Pipelines (Tekton).

I got the idea for my POC when I read that parameter substitution can happen anywhere in TriggerTemplate resources. With this in mind, I decided to write a custom interceptor that would:

The response of the interceptor is then turned into params through a TriggerBinding, and the TriggerTemplate uses the pipeline name parameter (as set by the interceptor) in the pipelineRef field. With this, the pipeline run reflects the pipeline in the Git repository, and the OpenShift UI lets you see all the pipeline runs for a given branch (so one can see, e.g., trends in duration/success).

It turned out to be relatively easy to implement, and seems to work well. One issue I've noticed is that the TriggerTemplate needs to provision something to back the workspace, so either one uses a volume claim template (which leads to many volumes being created) or one references a specific existing PVC (which prevents pipelines from running in parallel - this could partially be solved by having one PVC per repo, and pass this down to the TriggerTemplate from the custom interceptor like the pipeline name).

Apart from the issue I mentioned already, is this a direction that generally makes sense? Have you already looked into solving this issue not in "Tekton Pipelines" but in the "Tekton Triggers" area?

FogDong commented 3 years ago

Can we move this discussion forward? Since we already support getting pipelines from OCI repositories, the idea of moving Git support into another repository seems a bit unreasonable to me. I do think that the pipeline repo is a CI module rather than a CI system, but since version control is such a large part of CI, git support in pipeline might be important. Take GitHub Actions as an example: every time users open a pull request, the in-tree pipeline can be run to test the code, and the backend pipeline could also be a Tekton pipeline. The whole CI process should be smooth; if we put this feature into another repo, the experience may not be great.

bobcatfish commented 3 years ago

@FogDong there's been some related movement in this proposal: https://github.com/tektoncd/community/pull/341 (might be similar to what @michaelsauter is proposing also)

jstrachan commented 3 years ago

BTW here's the solution we've been using in the Jenkins X community to work around there being no native support yet for referencing tasks + steps in git and overriding them... https://jenkins-x.io/blog/2021/02/25/gitops-pipelines/

we're using the ko and mink trick of using a custom image URI for now.

it would obviously be better to add this explicitly into the tekton CRDs some day.

The part we've really found useful is being able to just reuse steps from a task (a named step or all the named steps), adding customisations before/after/between the steps and overriding steps too.

So it's basically a (purposely) simple overlay mechanism where we can import steps from tasks referenced in git and override them locally.
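The overlay idea above (reuse named steps from a task stored elsewhere, add steps between them, override fields locally) can be sketched as a small merge function. This is a minimal Python sketch under assumed rules spelled out in the docstring; `overlay_steps` is a hypothetical name, and this is not the Jenkins X implementation, which, as described above, works through a custom image URI.

```python
from typing import Dict, List


def overlay_steps(base: List[dict], local: List[dict]) -> List[dict]:
    """Merge locally declared steps over steps imported from a base task.

    Assumed rules for this sketch:
    - An empty `local` list reuses every base step unchanged.
    - A local step whose name matches a base step is merged over it (local
      fields win): {"name": "build"} alone reuses the base step, while
      {"name": "build", "image": ...} overrides just the image.
    - A local step with a new name is a customisation kept at its position.
    The output follows the order of `local`.
    """
    if not local:
        return [dict(step) for step in base]
    by_name: Dict[str, dict] = {step["name"]: step for step in base}
    return [{**by_name.get(step["name"], {}), **step} for step in local]


base = [
    {"name": "clone", "image": "git-image", "script": "./clone.sh"},
    {"name": "build", "image": "builder:v1", "script": "make"},
]
local = [
    {"name": "clone"},                                           # reuse as-is
    {"name": "lint", "image": "linter", "script": "make lint"},  # new step in between
    {"name": "build", "image": "builder:v2"},                    # override only the image
]
for step in overlay_steps(base, local):
    print(step)
```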

vdemeester commented 3 years ago

> BTW here's the solution we've been using in the Jenkins X community to work around there being no native support yet for referencing tasks + steps in git and overriding them... https://jenkins-x.io/blog/2021/02/25/gitops-pipelines/

> we're using the ko and mink trick of using a custom image URI for now.

> it would obviously be better to add this explicitly into the tekton CRDs some day.

Would it, though? (In the tektoncd/pipeline CRDs, I mean.) We are back to the following questions:

The fact that this is "not" supported in tektoncd/pipeline today (only in-cluster references, OCI references, and embedded specs are supported) allows tools and products using some tektoncd components to experiment with their own solutions and use what works best for them. The more "opinion" we put into the core (tektoncd/pipeline), the less it is a component and the more it is a product.

Overall, I am all for supporting this in Tekton (aka in a project in tektoncd), but I am a bit worried about supporting this in the tektoncd/pipeline component, at least for now.

> The part we've really found useful is being able to just reuse steps from a task (a named step or all the named steps), adding customisations before/after/between the steps and overriding steps too.

> So it's basically a (purposely) simple overlay mechanism where we can import steps from tasks referenced in git and override them locally.

Reading https://jenkins-x.io/blog/2021/02/25/gitops-pipelines/, right now you are abusing image and stepTemplate to be able to pick up all or some of the steps from another Task definition (be it in-cluster, in the catalog, …), am I right? I feel this is/was the authoring part of https://github.com/tektoncd/community/pull/316 (cc @bobcatfish), but I do like that approach; it seems like a very lightweight and "customizable" replacement for PipelineResources (cc @jerop).

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

tekton-robot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

bobcatfish commented 3 years ago

Discussion has been continuing around this topic via the experimental workflows project (and related projects such as pipeline as code and the WG), and also via TEP-0060 remote resolution (and https://github.com/tektoncd/community/pull/493).

/lifecycle frozen /remove-lifecycle rotten

abayer commented 2 years ago

/assign

This will be integrated into Pipeline as part of #4710.

lbernick commented 1 year ago

Closed by #4710