That's an interesting idea. The default could be true,
which would be a similar experience to other CI systems.
We could use the configuration we currently use for PipelineResources (or similar): https://github.com/tektoncd/pipeline/blob/master/docs/install.md#how-are-resources-shared-between-tasks
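For reference, this is roughly what that installation-wide configuration looks like for PipelineResources today (a sketch only; the values shown are illustrative):
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-artifact-pvc
  namespace: tekton-pipelines
data:
  size: 5Gi                    # size of the PVC created for sharing artifacts between Tasks
  storageClassName: standard   # storage class used for that PVC (illustrative value)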
Discussed in the beta working group: it doesn't feel like we need this for beta at least, so we want to wait until a user indicates they definitely need this functionality.
My 2-cents... Any feature that assumes the ability to automatically create a PVC (PVs, attaching volumes, or anything to do with underlying storage resources of any kind) is a no-can-do for us. Any resource like that must be pre-allocated and pre-created and its reference injected into a PipelineRun/TaskRun.
This is why Tekton Pipelines simply wasn't usable for us here at eBay until Workspaces were implemented. As such, I see no need for such a feature, as our underlying infrastructure would be unable to support it in the first place.
My 2-cents... Any feature that assumes the ability to automatically create a PVC (PVs, attaching volumes, or anything to do with underlying storage resources of any kind) is a no-can-do for us. Any resource like that must be pre-allocated and pre-created and its reference injected into a PipelineRun/TaskRun.
It could be a configuration switch (that is on or off by default)
In https://github.com/tektoncd/pipeline/issues/2174 @tstmrk brought up the example of volumeClaimTemplates, which are a feature of StatefulSets. They provide a similar kind of system to what we're discussing here, I think, although they don't clean up the volumes after the Pod is destroyed. In our case we might want to do that when a Task or Pipeline has completed.
Volume claim templates were mentioned again in #2218 as a possible implementation for this feature. Useful to know: they are used in Argo Workflows for a similar use case. Quoting @cameronbraid:
I would like to be able to have workspace PVC creation and deletion fully managed by Tekton using a volume claim template.
Currently this is supported in Argo Workflows and is a very useful feature (https://github.com/argoproj/argo/blob/master/examples/README.md)
volumeClaimTemplates:            # define volume, same syntax as k8s Pod spec
  - metadata:
      name: workdir              # name of volume claim
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi           # Gi => 1024 * 1024 * 1024
The PVC would be created when the pipeline is executed, and deleted afterwards
This would conceptually be comparable to an emptyDir volume, however with the ability to specify a storage class for the PVC, and without relying on a folder on the host as emptyDir does.
The current drawback of the workspace PersistentVolumeClaim is that issues would arise if two pipelines that use the same PVC run concurrently.
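For illustration, building on the Argo-style example above, a claim template that pins a storage class might look like this (a sketch; the class name fast-ssd is made up):
volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd   # a choice emptyDir cannot offer
      resources:
        requests:
          storage: 1Gi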
@sbwsg +1 for claim templates. It would be useful to be able to configure a default claim template for the cluster as well, in addition to per pipeline.
For others who are currently creating PVCs dynamically as the result of triggers firing, so that each PipelineRun gets a unique one: can you please share how you are doing this now?
Currently, if a PipelineRun doesn't provide any workspace bindings, the PipelineRun fails immediately. Would it make sense for the controller to bind those workspaces to emptyDir if no binding is provided, so that most pipelines can execute nonetheless?
This is in addition to the other paths discussed, like claim templates, etc.
Yeah, that sounds useful! The behaviour of the fallback should probably be configurable somehow?
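For context, this is roughly what an explicit emptyDir workspace binding looks like in a PipelineRun today; the fallback discussed above would amount to the controller applying something like this when no binding is given (a sketch; names are illustrative):
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: run-with-emptydir
spec:
  pipelineRef:
    name: simple-pipeline
  workspaces:
    - name: ws       # workspace declared by the Pipeline
      emptyDir: {}   # ephemeral; each TaskRun pod gets its own copy, so data is not shared between Tasks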
If I have a pipeline for a project, I expect that PipelineRuns can run concurrently and independently without interfering.
Use case for a CI pipeline:
PipelineRun pr-master-xyz is building the last merge to master
PipelineRun pr-user-a-abc is building a pull request from User A
PipelineRun pr-user-b-efg is building a pull request from User B
I also expect the client starting those PipelineRuns not to have to be too smart.
Currently, the client starting those PipelineRuns must know some things, e.g. it must create PersistentVolumeClaims with unique names for each PipelineRun it wants to initiate.
Otherwise, this scenario happens:
Example Pipeline:
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
name: sequencial
spec:
tasks:
- name: sleepy
taskSpec:
steps:
- name: write
image: ubuntu
script: echo $(workspaces.task-ws.volume) > $(workspaces.task-ws.path)/foo
- name: sleep
image: ubuntu
script: sleep 120
workspaces:
- name: task-ws
workspaces:
- name: task-ws
workspace: pipeline-ws
workspaces:
- name: pipeline-ws
and a PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pipeline-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 1Gi
and a number of PipelineRuns pr1, pr2 ... pr6:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
name: pr1
spec:
pipelineRef:
name: sequencial
workspaces:
- name: pipeline-ws
persistentVolumeClaim:
claimName: pipeline-pvc
Starting all those PipelineRuns concurrently with kubectl apply -f <directory>/,
I expected (with the current design) those PipelineRuns to be started at the same time but executed sequentially, one after the other (since they all use the same PVC). But something more unexpected (to me) is happening.
$ tkn pipelinerun list
NAME   STARTED          DURATION   STATUS
pr1    28 seconds ago   ---        Running
pr2    28 seconds ago   ---        Running
pr3    28 seconds ago   ---        Running
pr4    28 seconds ago   ---        Running
pr5    28 seconds ago   ---        Running
pr6    28 seconds ago   ---        Running
Some of the pods are Running, but some are stuck in the Init state. Update: this was because I had limited resources in my cluster.
$ kubectl get po
NAME                         READY   STATUS     RESTARTS   AGE
pr1-sleepy-n5msg-pod-mdmsg   0/2     Init:0/2   0          46s
pr2-sleepy-rmmk7-pod-4w5xk   1/2     Running    0          46s
pr3-sleepy-wtrs2-pod-ld2dw   1/2     Running    0          46s
pr4-sleepy-2dttb-pod-8cxw7   1/2     Running    0          45s
pr5-sleepy-rx2hx-pod-jd9rr   0/2     Init:0/2   0          44s
pr6-sleepy-w7d5f-pod-mmw9f   0/2     Init:0/2   0          43s
I can see that the PVC is mounted by all the pods:
$ kubectl describe pvc pipeline-pvc
...
Mounted By: pr1-sleepy-n5msg-pod-mdmsg
pr2-sleepy-rmmk7-pod-4w5xk
pr3-sleepy-wtrs2-pod-ld2dw
pr4-sleepy-2dttb-pod-8cxw7
pr5-sleepy-rx2hx-pod-jd9rr
pr6-sleepy-w7d5f-pod-mmw9f
I think this is related to https://github.com/kubernetes/kubernetes/issues/60903#issuecomment-377074209
Add volumeClaimTemplates to PipelineRun, similar to StatefulSet. From the StatefulSet doc:
// volumeClaimTemplates is a list of claims that pods are allowed to reference.
// The StatefulSet controller is responsible for mapping network identities to
// claims in a way that maintains the identity of a pod.
A PipelineRun could have this in its manifest (from the StatefulSet example in the k8s docs), possibly for each workspace in the "workspaceBinding":
# in a pipelinerun
spec:
workspaces:
- name: code-workspace
volumeClaimTemplate:
metadata:
name: shared
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
And then the PipelineRun could make the claim name unique by appending the generateName, e.g. shared-code-workspace-pr-xyzab,
similar to how pods get their names from a TaskRun.
Types: PipelineRunSpec and TaskRunSpec both use WorkspaceBinding. But I expect TaskRun to stay as-is, since all Tasks for a PipelineRun should use the same PVC?
Hey @jlpettersson ! Thanks for the detailed problem statement and proposal! I'm wondering, if the PVCs were created for you by the controller, and/or the controller managed a pool of PVCs such that the PipelineRuns just needed to indicate that they needed a PVC and the controller took care of the rest, would that work for you?
I think ~we are~ I am leaning toward a model in the long run where we want the folks operating the Tekton installation to be in control of details around PVC creation / management (or any other underlying storage) and folks/tools creating Runs don't need to worry about it. What do you think?
Basically I'm wondering if you'd rather configure this kind of thing:
@bobcatfish I see your point. As I see it, PV resources belong to the folks operating the Tekton installation. But a PVC is a resource that is created in the user's namespace, and by declaring the storageClassName
you get the kind of storage you want, dynamically, and the same PipelineRun works across cloud providers. https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/
The Persistent Storage design document describes the responsibilities of PVs and PVCs: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/persistent-storage.md
The PVCs created in a namespace also count against the user's quota: https://kubernetes.io/docs/concepts/policy/resource-quotas/#storage-resource-quota so I think it would be great if the user could control size and type, and there is also a default storageClass.
In the bigger picture, I think the PipelineRun or TaskRun is declared once, in the TriggerTemplate. I feel that this fits better with the declarative way of the k8s ecosystem.
There is also this issue https://github.com/tektoncd/triggers/issues/476 but I feel that the PVC belongs to the PipelineRun and TaskRun, because the PipelineRun controller should also control the lifecycle of the PVC, e.g. removing it when the PipelineRun is done.
so I think it would be great if the user could control size and type, and there is also a default storageClass.
To make sure I'm totally clear @jlpettersson, you'd like the user creating the PipelineRun to be able to control this? For example, in the way that PipelineResources create PVCs, you can configure the storage class across the installation: https://github.com/tektoncd/pipeline/blob/master/docs/install.md#configuring-a-persistent-volume
because the PipelineRun controller should also control the lifecycle for the PVC, e.g. removing when the PipelineRun is done.
Interesting point! Would you want the PipelineRun controller to always delete the PVC when the PipelineRun finishes, or would you want to be able to configure that?
@bobcatfish yes, that is correct.
Managing one PVC per PipelineRun can currently only be done programmatically, and it is very hard to get correct from the outside (hard to know when to delete). This is the biggest obstacle for me in implementing Tekton for a larger enterprise at the moment.
I want to declaratively do this (not using any params here, for clarity), e.g.
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
name: pipeline-template
spec:
resourcetemplates:
- apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
generateName: simple-pipeline-run-
spec:
pipelineRef:
name: simple-pipeline
workspaces:
- name: ws
volumeClaimTemplate:
metadata:
name: ws-pvc
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 1Gi
Then I can have my declarative Tekton manifest files stored in e.g. git and practice GitOps, as with most Kubernetes resources.
The strength of using volumeClaimTemplate is that it is the same as for StatefulSet; Argo Workflows also uses it. So it is familiar syntax for users of the Kubernetes ecosystem.
Consider a cluster with namespaces for multiple teams:
The machine learning team may use huge machine learning models in their apps, so they need large workspace volumes, e.g. 10Gi, while the Windows team needs a custom storageClass, and the Java team can use the default storageClass but with smaller volumes, e.g. 1Gi.
Preferably, the PVC created by the PipelineRun (from the user-provided template) should be deleted when the PipelineRun completes successfully. Then you only pay for it for a short time and your namespace storage quota does not fill up. Meanwhile, we can avoid deleting PVCs for failed PipelineRuns, for investigation. Yes, this can be configurable.
E.g. the namespace hackathon-playground gets its storage quota filled up fast, since its flaky tests make many PipelineRuns fail. But since there is a quota, they will not allocate terabytes of storage; they have to clean it up and manage the resources, and storage is kept within budget.
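As an illustration of how an operator could keep such a namespace within budget, here is a sketch of a Kubernetes storage ResourceQuota (names and sizes are made up):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: hackathon-playground
spec:
  hard:
    persistentvolumeclaims: "20"   # at most 20 PVCs in the namespace
    requests.storage: 50Gi         # total storage that all PVCs together may request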
The alternative to all of the above, offered with Tekton currently, is to reuse the same workspace PVC. This has many disadvantages. First, you need to add an extra Step to clean up the PVC at the end of the PipelineRun (a sketch of such a cleanup task follows below). And then there are concurrent builds, e.g.
PipelineRun pr-master-xyz is building the last merge to master
PipelineRun pr-user-a-abc is building a pull request from User A
PipelineRun pr-user-b-efg is building a pull request from User B
The problem here is that all PipelineRuns use the same PVC, and they may interfere. In a Pipeline Compile to binary --> Run Unit Tests --> Build Image, the binary for pr-user-a-abc may slip into "Build Image" for pr-master-xyz, so code not intended for production may be released. Also, an unprivileged user from another team can send an unapproved pull request with a CI build, skip the unit tests, and get their "planted" binary into an ongoing release build: a security issue, and one that leaves almost no audit trail of what happened, since a shared volume and timing were the cause.
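For completeness, the extra cleanup step mentioned above could look roughly like this (a sketch only; the task and workspace names are made up):
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: cleanup-workspace
spec:
  workspaces:
    - name: ws   # the shared PVC-backed workspace to wipe
  steps:
    - name: clean
      image: ubuntu
      script: |
        # remove everything the previous tasks left on the shared volume
        rm -rf $(workspaces.ws.path)/*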
I love the Tekton project so far; it looks very promising and we are trying to adopt it. But this is almost a showstopper. I would love to contribute and implement this, and I am currently investigating it, since I may need to use a branch for it anyway.
@bobcatfish @sbwsg I went ahead and submitted an implementation similar to what I proposed above. It is possible to try it out.
The implementation is inspired by StatefulSet. Nothing is removed; this is purely an additional volume source option when using workspaces.
For lifecycle management, I set the OwnerReference on the PVC pointing to the PipelineRun that provided the template. When deleting a PipelineRun, as before, the created TaskRuns are deleted, but now, if a PVC was created from a template, that will also be deleted.
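Roughly, a PVC created from the template would then carry an owner reference like this (a sketch; the names and uid are illustrative), so that Kubernetes garbage collection removes it together with the PipelineRun:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ws-pvc-simple-pipeline-run-abc12   # derived from the template name and the PipelineRun name
  ownerReferences:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      name: simple-pipeline-run-abc12
      uid: 11111111-2222-3333-4444-555555555555   # uid of the owning PipelineRun
      controller: true
      blockOwnerDeletion: true
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi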
OK, I'm going to split this issue into two. I'll leave this one for discussion of the volumeClaimTemplate proposal and will cut a new one for the tekton artifact configmap.
@jlpettersson thanks for taking this on! Would you be open to bringing this PR up in our weekly working group on Wednesday? It would be good to share this with the group, perhaps demo it, and get feedback on the design as implemented. If you're unable to make it, would you mind if I presented it to the WG?
Edit: Here are the details of the Working Group. This week's call will be 12-1 ET.
@sbwsg yes, I can have a demo on Wednesday :)
Expected Behavior
This issue started out as a general discussion on a mechanism for dynamically allocating storage for TaskRuns at runtime but has become focused on volume claim templates. So now this issue is for discussion of Volume Claim Templates to define PVCs in workspaces at runtime, similarly to the way StatefulSets and Argo handle storage (see comments in this PR for links to each of these).
See this issue about the artifact configmap for discussion of Workspaces supporting the config-artifact-pvc approach that we use with PipelineResources and to revisit the conversation around "Auto Workspaces".
Actual Behavior
Users have to explicitly define the PersistentVolumeClaim configuration every time they bind a workspace that spans multiple tasks.