tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0

Allow volume from volumeClaimTemplate to be used in more than one workspace #3440

Open jlpettersson opened 3 years ago

jlpettersson commented 3 years ago

Feature request

Improve usability when multiple workspaces share the same volume (created from volumeClaimTemplate).

Use case

It is good practice for CI/CD pipelines to use two git repos, one for the source code and one for the configuration (ref. the Continuous Delivery book).

This use case is also requested in https://github.com/tektoncd/catalog/issues/444#issuecomment-663712885 and I have designed a few pipelines following this pattern.

However, the usability for this today is not very good.

Current Situation

Proposal

Introduce a field for a PipelineRun volume. In PipelineRun.spec, add a new field, something like:

volumeClaimTemplate:
  metadata:
    name: pipelinerun-vol
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi

And then let multiple workspaces use the same volume with a subPath in the WorkspaceBinding:

workspaces:
- name: source-code-ws
  persistentVolumeClaim:
    claimName: pipelinerun-vol
  subPath: source-code
- name: configuration-ws
  persistentVolumeClaim:
    claimName: pipelinerun-vol
  subPath: configuration

...and when binding those, we would first check whether the PipelineRun has a volumeClaimTemplate with a matching name.

The improvement

The PipelineRun example with different subPaths works ... but is difficult to get right:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-using-different-subpaths
spec:
  workspaces:
    - name: ws
  tasks:
    - name: writer-1
      taskRef:
        name: git-clone
      workspaces:
        - name: task-ws
          workspace: ws
          subPath: source-code-dir
    - name: writer-2
      runAfter:
        - writer-1
      taskRef:
        name: git-clone
      workspaces:
        - name: task-ws
          workspace: ws
          subPath: configuration-dir
    - name: read-all
      runAfter:
        - writer-2
      params:
        - name: directory1
          value: dir-1
        - name: directory2
          value: dir-2
      taskRef:
        name: read-both
      workspaces:
        - name: local-ws
          workspace: ws

If this proposal is implemented, the subPaths are only used in the WorkspaceBinding part, and the two workspaces can have different names:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-using-different-subpaths
spec:
  workspaces:
    - name: source-code-ws
    - name: configuration-ws
  tasks:
    - name: writer-1
      taskRef:
        name: git-clone
      workspaces:
        - name: task-ws
          workspace: source-code-ws
    - name: writer-2
      runAfter:
        - writer-1
      taskRef:
        name: git-clone
      workspaces:
        - name: task-ws
          workspace: configuration-ws
    - name: read-all
      runAfter:
        - writer-2
      params:
        - name: directory1
          value: dir-1
        - name: directory2
          value: dir-2
      taskRef:
        name: read-both
      workspaces:
        - name: ws-1
          workspace: source-code-ws
        - name: ws-2
          workspace: configuration-ws

The PipelineRun for the former:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: pr-different-path-
spec:
  pipelineRef:
    name: pipeline-using-different-subpaths
  workspaces:
    - name: ws
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi

and for the latter:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: pr-different-path-
spec:
  pipelineRef:
    name: pipeline-using-different-subpaths
  workspaces:
    - name: source-code-ws
      persistentVolumeClaim:
        claimName: pipelinerun-vol
      subPath: source-code
    - name: configuration-ws
      persistentVolumeClaim:
        claimName: pipelinerun-vol
      subPath: configuration
  volumeClaimTemplate:
    metadata:
      name: pipelinerun-vol
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

The improvement is even clearer when the workspace is used by many tasks within the Pipeline.

Note

This "design" was discussed when the volumeClaimTemplate was introduced, but at least me did not prefer it - at the time - now I think it was a mistake.

Also see how StatefulSet handles the name of the volumeClaimTemplate and the mapping to the volume name.
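
For comparison, a minimal StatefulSet sketch (the names and image are only illustrative) showing how the volumeClaimTemplate name is the key that the container's volumeMount refers to:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  serviceName: example
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: app
          image: registry.k8s.io/pause:3.9
          volumeMounts:
            - name: data          # refers to the claim template below by name
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data                # this name is the mapping key
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi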

ghost commented 3 years ago

Your final example with the PipelineRun spec volumeClaimTemplate looks very similar to this, which is supported today:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipelinerun-vol
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: pr-different-path-
spec:
  pipelineRef:
    name: pipeline-using-different-subpaths
  workspaces:
    - name: source-code-ws
      persistentVolumeClaim:
        claimName: pipelinerun-vol
      subPath: source-code
    - name: configuration-ws
      persistentVolumeClaim:
        claimName: pipelinerun-vol
      subPath: configuration

I think the one major difference here would be that the PVC created from the volumeClaimTemplate has an owner reference to the PipelineRun and will be deleted with it. But there may be other advantages too?
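
For illustration, the PVC created from a volumeClaimTemplate carries an owner reference roughly like this (the names and UID below are made up):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-abc123                 # generated name, illustrative only
  ownerReferences:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      name: pr-different-path-xyz  # the owning PipelineRun, illustrative
      uid: 11111111-2222-3333-4444-555555555555
      controller: true
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

When the PipelineRun is deleted, garbage collection removes the PVC along with it.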

I have a bit of concern about adding a kubernetes-specific field to the top-level of the PipelineRunSpec type. As much as possible we're trying to keep k8s-specific stuff isolated to specific areas of the API if they have to be added at all.

I'm starting to think that we need to look at the problem of inter-Node Workspaces more 🐙 holistically 🐙 (octopus arms added to try and disarm the heaviness of that word). There is a ton of complexity here, a lot of different user/org preferences, and Tekton's surface area could bloat a lot if we tried to tackle each problem in isolation. This in turn could result in some nightmarish situations for platform implementers who then have to figure out support paths for every permutation of volume configuration that a user or tool might generate.

bobcatfish commented 3 years ago

Would it be accurate to say another way of rewording this feature request is to make it possible to specify a volume claim template at runtime and pass it to more than one workspace?

I tend to agree with @sbwsg that this is addressed today by declaring a PVC separately - @jlpettersson is the missing piece for you that the PVC won't get automatically cleaned up at the end of the PipelineRun in that scenario?

A PipelineRun should not use more than one PersistentVolumeClaim (since it does not work well with how Availability Zones and multiple Nodes work)

I disagree with this specific assertion: I think it should be fine for a PipelineRun to use more than one PVC, if the user wants to, and I think we should allow it, even if it requires some work for the user to make it work (I believe by default, using the affinity assistant, we do not allow this?)
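
(For reference: my understanding is that this restriction comes from the affinity assistant, which can be switched off in the feature-flags ConfigMap - a minimal sketch, assuming the disable-affinity-assistant flag, not something verified in this thread:)

apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  # with the affinity assistant disabled, a PipelineRun may bind more than one
  # PVC-backed workspace, at the cost of its pods no longer being co-scheduled
  disable-affinity-assistant: "true"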

So I 👍 to @sbwsg 's assessment to examine this holistically.

It is good practice for CI/CD pipelines to use two git repos, one for the source code and one for the configuration (ref. the Continuous Delivery book).

@jlpettersson do you happen to have the page / section where the book recommends 2 repos? It definitely recommends putting everything in version control, but I'm having trouble finding where it recommends 2 repos (and I'm very curious about this since personally I find keeping configuration close to the thing being deployed works best - but if you have one set of configuration for services that span multiple repos I can see having a separate repo for config 🤔)

jlpettersson commented 3 years ago

Would it be accurate to say another way of rewording this feature request is to make it possible to specify a volume claim template at runtime and pass it to more than one workspace?

Yes. Even today, volumeClaimTemplate is only specified at runtime - but today it cannot be used in more than one workspace. This feature request makes it easier to use a volume from a volumeClaimTemplate as the base for more than one workspace. (If this is done, the current location for volumeClaimTemplate can potentially be deprecated, and also subPath on the PipelineTask - it was added for this specific use case: https://github.com/tektoncd/pipeline/issues/2490)

Another way to reword this feature request: "Allow volume from volumeClaimTemplate to be used in more than one workspace".

I tend to agree with @sbwsg that this is addressed today by declaring a PVC separately

Yes, but this is about volumeClaimTemplate. Declaring a PVC separately is not what should be used in a production environment (as described in PR https://github.com/tektoncd/pipeline/pull/2326). This feature request is 100% about volumeClaimTemplate - it is not about "existing PVCs".

is the missing piece for you that the PVC won't get automatically cleaned up at the end of the PipelineRun in that scenario?

nope, that is something unrelated.

I disagree with this specific assertion

I disagree - I think a PipelineRun that can sometimes deadlock (as in https://github.com/tektoncd/pipeline/pull/2546) should be considered a bug. I also think that tasks I declare as parallel not being able to run in parallel is a bug (as in https://github.com/tektoncd/pipeline/issues/2586). As a user I expect those problems not to exist in a default configuration.

I believe by default, using the affinity assistant, we do not allow this?

Well, the plan was never to prohibit it. It is more that it does not work with it. If a Pod tries to mount two volumes located in two physically different datacenters, what would we do?

@jlpettersson do you happen to have the page / section where the book recommends 2 repos? it definitely recommends putting everything in version control but I'm having trouble finding where it recommends 2 repos

The classic Continuous Delivery book is very good on making good CI, and there are very few good books available on this topic - it is a somewhat complex area. (I am looking forward to reading your book @bobcatfish.)

The second chapter is about "Configuration Management", a few quotes:

It is often important to keep the actual configuration information specific to each of your application's testing and production environments in a repository separate from your source code

Configuration settings have a lifecycle completely different from that of code

There are also pictures of it, like this one with the repos at the top: https://www.stevesmith.tech/wp-content/uploads/2012/03/pipeline-with-metadata-1024x724.png

Also the 2nd edition of Kubernetes Up & Running has a new chapter about how to organize apps in version control.

Once your application source code and your deployment configuration files are in source control, one of the most common questions that occurs is how these repositories relate to one another. Should you use the same repository for application source code as well as configuration? This can work for small projects, but in larger projects it often makes sense to separate the source code from the configuration to provide separation of concerns.

...then how do you bridge the development of new features in source control with the deployment of those features into a production environment? This is where feature gates and guards play an important role.

I think having everything in source control and using feature gates is very important - and since the lifecycle is so different, it is hard to implement with a single repo without "hacks" (one could e.g. check which part of a commit changed - but that is hard to scale).

I'm very curious about this since personally I find keeping configuration close to the thing being deployed works best

I think this is easy to say, but hard to implement well. E.g. you don't want to re-run your integration tests just because a feature gate or a URL was switched. But this is really under-documented and many, many organizations struggle with it (also reflected in the fact that the Kubernetes book added a chapter about it). The Kubernetes tooling does not really help much here either; kubectl with kustomize is important, but kpt is probably needed as well. The book Software Engineering at Google touches on related topics, but does not explain this part. I am really interested in hearing about a well-working single-repo solution.

I think the comment https://github.com/tektoncd/catalog/issues/444#issuecomment-663712885 shares my view/experience about this.

bobcatfish commented 3 years ago

As I respond to your individual points, I think there is an overall high-level theme, which is that I think we have slightly different points of view on how opinionated Tekton should be.

In my opinion, Tekton should NOT be very opinionated - this leaves users of Tekton, particularly folks building CI/CD tools using Tekton, free to create opinionated tools with Tekton. Every assumption we make about how Tekton will be used will limit what can be done with Tekton.

So on the topic of PVCs, I think that Tekton should allow users to make decisions about how they want to use them, e.g. do they want to create PVCs outside of PipelineRuns and reuse them? Cool, that's up to them. Do they want to make Pipelines that use multiple PVCs which have to be created carefully in order for the resulting pods to be schedulable? That's fine too.

I think part of what @sbwsg is getting at is also that it's not clear how much of this should be Tekton's job: for example, what if someone creates a service that's all about pooling and managing PVCs? That'd be pretty neat, and I think it would be great if Tekton could benefit from that service, vs. Tekton trying to create this. (Or Tekton Pipelines specifically - maybe some kind of PVC management service could make sense as a separate Tekton project?)

All of that being said, I'm not totally against this feature, I think it makes a lot of sense - it's just again, I don't understand why it's not reasonable to ask folks to declare a PVC separately in this case, esp. since you said the problem wasn't the automatic clean up:

is the missing piece for you that the PVC won't get automatically cleaned up at the end of the PipelineRun in that scenario?

nope, that is something unrelated.

Could you elaborate a bit more on why having the PVC declared outside of the pipeline doesn't work for this?

I disagree - I think a PipelineRun that can sometimes deadlock (as in #2546) should be considered a bug. I also think that tasks I declare as parallel not being able to run in parallel is a bug (as in #2586). As a user I expect those problems not to exist in a default configuration.

I have trouble calling this a bug b/c as you call out in https://github.com/tektoncd/pipeline/pull/2546 "this depends so much of the environment e.g. type of cluster and capabilities of the storage"

As I mentioned in a comment on that PR https://github.com/tektoncd/pipeline/pull/2546#issuecomment-624105317 I think we should still support this scenario; just because there are cloud specific gotchas to making this work doesn't feel to me like it means we should discourage people from using it?

If a Pod try to mount two volumes located in two physically different datacenter, what would we do?

It feels like a lot of this comes down to what we expect a user to be able to handle and how much we want to protect them.

(I am looking forward to read your book @bobcatfish ).

haha thanks! :D

It is often important to keep the actual configuration information specific to each of your application's testing and production environments in a repository separate from your source code

Ah, I found it! I apologize if I'm arguing semantics, but it doesn't seem to me like the book is saying 'definitely you should do this'; it feels like it's saying 'you can do this'. The next sentence is "However, if you take this route, you will have to be careful to track which versions of configuration information match which versions of the application."

Anyway I think you make some great points, and I think we should support scenarios where the code and config are together in one repo, and scenarios where they are separate.

jlpettersson commented 3 years ago

I don't understand why it's not reasonable to ask folks to declare a PVC separately in this case, esp. since you said the problem wasn't the automatic clean up

Could you elaborate a bit more on why having the PVC declared outside of the pipeline doesn't work for this?

This was explained in PR https://github.com/tektoncd/pipeline/pull/2326, but I can quote it here:

There are disadvantages to reusing the same PVC for every PipelineRun:

  • You need to clean the PVC at the end of the Pipeline
  • All Tasks using the workspace will be scheduled to the node where the PV is bound
  • Concurrent PipelineRuns may interfere; an artifact or file from one PipelineRun may slip into or be used by another PipelineRun, with very little audit trail.

There are also disadvantages to manually creating a new PVC before each PipelineRun:

  • This cannot (easily) be done declaratively
  • This is hard to do programmatically, because it is hard to know when to delete the PVC. The PipelineRun cannot be set as OwnerReference since the PVC must be created first

This commit adds 'volumeClaimTemplate' as a volume source for workspaces. This has several advantages:

  • The syntax is used in k8s StatefulSet and other k8s projects so it is familiar in the kubernetes ecosystem
  • It is possible to declaratively declare that a PVC should be created for each PipelineRun, e.g. from a TriggerTemplate.
  • The user can choose storageClass (or omit to get the cluster default) to e.g. get a faster SSD volume, or to get a volume compatible with e.g. Windows.
  • The user can adapt the size to the job, e.g. use 5Gi for apps that contains machine learning models, or 1Gi for microservice apps. It can be changed on demand in a configuration that lives in the users namespace e.g. in a TriggerTemplate.
  • The size affects the storage quota that is set on the namespace and it may affect billing and cost depending on the cluster environment.
  • The PipelineRun or TaskRun with the template is created first, and is used as OwnerReference on the PVC. That means that the PVC will have the same lifecycle as the PipelineRun.
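
To illustrate the TriggerTemplate point in the quoted list above, here is a minimal sketch (the resource names are only illustrative, not from this thread) of creating a fresh PVC per PipelineRun declaratively:

apiVersion: triggers.tekton.dev/v1beta1
kind: TriggerTemplate
metadata:
  name: pipeline-template
spec:
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: pr-different-path-
      spec:
        pipelineRef:
          name: pipeline-using-different-subpaths
        workspaces:
          - name: ws
            volumeClaimTemplate:
              spec:
                # storageClassName: fast-ssd   # optional, illustrative
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 1Gi

Every event that fires the trigger then gets its own PVC, owned by (and cleaned up with) the created PipelineRun.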

This is also explained in https://github.com/tektoncd/pipeline/issues/1986#issuecomment-605605058 and https://github.com/tektoncd/pipeline/issues/1986#issuecomment-607436779. We had this discussion before, @bobcatfish, in that thread :D

To put it another way, without volumeClaimTemplate Tekton is only useful for personal projects - if a team in an organization uses Tekton, volumeClaimTemplate is a must, since without it no more than one PipelineRun using a PVC can run concurrently.

In my opinion, Tekton should NOT be very opinionated - this leaves users of Tekton, particularly folks building CI/CD tools using Tekton, free to create opinionated tools with Tekton. Every assumption we make about how Tekton will be used will limit what can be done with Tekton.

So on the topic of PVCs, I think that Tekton should allow users to make decisions about how they want to use them, e.g. do they want to create PVCs outside of PipelineRuns and reuse them? Cool, that's up to them. Do they want to make Pipelines that use multiple PVCs which have to be created carefully in order for the resulting pods to be schedulable? That's fine too.

💯

I think part of what @sbwsg is getting at is also that it's not clear how much of this should be Tekton's job

I think Tekton at least should solve fundamental real-world problems, so that pipelines can run within Kubernetes - not only for personal projects (ref volumeClaimTemplate).

for example, what if someone creates a service that's all about pooling and managing PVCs? That'd be pretty neat, and I think it would be great if Tekton could benefit from that service, v.s. Tekton trying to create this. (Or Tekton Pipelines specifically - maybe some kind of PVC management service could make sense as a separate Tekton project?)

I don't really get this one. PVs are already a kind of pooled storage (a PVC is a claim on an abstract instance), so it would be pools of pools. I agree with https://github.com/tektoncd/pipeline/issues/3417#issuecomment-714694193

In general, CI/CD is getting more and more important for organizations, and more and more workload is run in CI/CD - now also including "Infrastructure as Code" for cloud infrastructure. In that sense, it is becoming very important that the CI/CD system is configured for high availability - it is no longer ok to go with a single-node Jenkins that the whole organization depends on. In this context Tekton looks like a promising solution - but it must be possible to run it in a regional Kubernetes cluster, and it must provide solutions for real-world problems rather than be the thing people struggle with.

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

jlpettersson commented 3 years ago

/remove-lifecycle stale

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

bobcatfish commented 2 years ago

After a brief discussion with pipeline owners today we're generally in favor of this feature so:

/remove-lifecycle rotten
/lifecycle frozen

lbernick commented 11 months ago

@jlpettersson do you think the new "coscheduling" feature that allows a workspace to be bound to multiple taskruns (i.e. per-pipelinerun affinity assistant) would address this use case? More details here; looking for feedback on https://github.com/tektoncd/pipeline/issues/6990.
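
(For anyone finding this later, a minimal sketch of opting into that per-PipelineRun behaviour - the coschedule flag name and value here are my reading of TEP-0135 and may differ between releases:)

apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  # assumed flag from TEP-0135: schedule all TaskRuns of a PipelineRun to the
  # same node, so one volumeClaimTemplate volume can back several workspaces
  coschedule: "pipelineruns"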