Open VeereshAradhya opened 4 years ago
/cc @jlpettersson @sbwsg
I'm not sure that I agree that a PipelineRun should fail if a PVC doesn't exist the moment a Pod runs. Isn't it possible that the volume is being provisioned by some other service and will appear some time after the Pod launches? Or will a PVC always exist prior to a Pod launching?
Yes, this is the way Kubernetes works. It is hard to say if we should do it differently.
Kubernetes is an eventually consistent system. E.g. you may create a PVC and a Pod/TaskRun in the same "kubectl create" command, but they are handled by different controllers with different responsibilities, independent of each other.
If the taskrun-controller were responsible for this, it would need to look up PVCs, and there could be a race within the same "kubectl create" command: a missing PVC might be created just a few milliseconds after the taskrun-controller does its PVC lookup.
I agree that the UX might be better if this were shown earlier. But that is contrary to the loosely coupled Kubernetes architecture with eventual consistency, which makes it very scalable and gives controllers clearly bounded responsibilities.
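To make the race described above concrete, here is a minimal sketch in Python. It uses an in-memory stand-in for the cluster, not a real Kubernetes client; `FakeCluster`, `one_shot_check`, `level_triggered_check` and the PVC name `workspace-pvc` are all hypothetical names invented for illustration.

```python
class FakeCluster:
    """In-memory stand-in for the API server (hypothetical, not a real client)."""

    def __init__(self):
        self.pvcs = set()

    def pvc_exists(self, name):
        return name in self.pvcs


def one_shot_check(cluster, pvc_name):
    """Fail-fast policy: look once and give up if the PVC is absent."""
    return "Running" if cluster.pvc_exists(pvc_name) else "Failed"


def level_triggered_check(cluster, pvc_name, events):
    """Eventual-consistency policy: re-check after every cluster event."""
    if cluster.pvc_exists(pvc_name):
        return "Running"
    for apply_event in events:
        apply_event(cluster)
        if cluster.pvc_exists(pvc_name):
            return "Running"
    return "Pending"


def create_pvc(name):
    """Return an event that creates the named PVC when applied."""
    return lambda cluster: cluster.pvcs.add(name)


cluster = FakeCluster()
# The lookup happens a moment before the PVC from the same command is created.
early = one_shot_check(cluster, "workspace-pvc")
late = level_triggered_check(cluster, "workspace-pvc", [create_pvc("workspace-pvc")])
print(early, late)
```

With the one-shot lookup the run is marked failed even though the PVC shows up immediately afterwards, while the re-checking variant simply proceeds once the PVC exists.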
@jlpettersson @sbwsg maybe we should at least update the Reason message to "say" that we are waiting for a PVC to be provisioned.
I agree that this is Kubernetes behavior and I'd be cautious about changing it. This is just one manifestation of a non-existent resource. You can get the same behavior for example when referencing a configmap that doesn't yet exist.
I did look into improving the display status for issue #2268 but didn't get a lot of feedback. Part of the problem is that if there are multiple pipeline branches active there can be multiple things going "wrong". It's hard to concisely summarize what's going on.
@jlpettersson @sbwsg maybe we should at least update the Reason message to "say" that we are waiting for a PVC to be provisioned.
This is an interesting idea. I wonder if we can detect from Pod status / events whether it is waiting on a PVC.
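One way this detection might look, as a rough sketch: inspect the Pod's `status.conditions` for a `PodScheduled` condition with status `False` and check its message. The code works on a plain dict shaped like Pod JSON. The message pattern `persistentvolumeclaim "<name>" not found` is an assumption about what the scheduler reports and may differ between Kubernetes versions; `missing_pvc` and `reason_message` are hypothetical helpers, not Tekton functions.

```python
import re

# Assumed scheduler message format; verify against the cluster version in use.
_PVC_NOT_FOUND = re.compile(r'persistentvolumeclaim "([^"]+)" not found')


def missing_pvc(pod):
    """Return the name of the PVC the Pod appears to be waiting on, or None."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") != "PodScheduled" or cond.get("status") != "False":
            continue
        match = _PVC_NOT_FOUND.search(cond.get("message", ""))
        if match:
            return match.group(1)
    return None


def reason_message(pod):
    """Build a more specific status message when a missing PVC is detected."""
    name = missing_pvc(pod)
    if name is None:
        return "Pending"
    return f'Waiting for PVC "{name}" to be provisioned'


example_pod = {
    "status": {
        "phase": "Pending",
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": 'persistentvolumeclaim "workspace-pvc" not found',
            }
        ],
    }
}
print(reason_message(example_pod))
```

The same approach would not cover every case raised in this thread (for example a missing configmap surfaces differently), so it is only a starting point for the messaging question.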
You can get the same behavior for example when referencing a configmap that doesn't yet exist.
Ah, good point, I hadn't considered the non-PVC scenarios too.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
/remove-lifecycle rotten
This seems similar to https://github.com/tektoncd/pipeline/issues/3563 in that it's a question of whether or not we should add PVC-specific behavior that differs from the k8s paradigm.
Since this issue went rotten and the discussion was pointing out that this is the standard k8s behavior (and the same question applies to other types like configmaps as @GregDritschler pointed out), I'm inclined to close this issue.
This seems similar to #3563 in that it's a question of whether or not we should add PVC-specific behavior that differs from the k8s paradigm.
I am not sure I fully agree on this. This issue is about what we should do when the PVC is not ready at the time we try to create the TaskRun/PipelineRun. Right now, we say we are pending because it might become available at some point in the future. But we never reconcile again (i.e. once it is available), so we always end up timing out (after 1h or more).
From a user perspective, I would either assume we are waiting because it will be reconciled at some point or it should fail directly. I think we need to decide on this.
But we never reconcile again (i.e. once it is available), so we always end up timing out (after 1h or more).
OH. That sounds like a bug, I 100% agree. I thought we'd only been discussing the messages shown when a PVC isn't available; I definitely did not understand that this issue was also describing a bug where the reconcile doesn't occur again!
From a user perspective, I would either assume we are waiting because it will be reconciled at some point or it should fail directly. I think we need to decide on this.
I don't think we should fail directly, as this is asynchronous (i.e. how Kubernetes works). This problem is very similar to https://github.com/tektoncd/pipeline/issues/3378
Ah, OK, thanks for explaining that extra detail @vdemeester. I agree this is a bug.
I don't think we should fail directly, as this is asynchronous (i.e. how Kubernetes works).
I agree, I think we might want to subscribe to updates on the PVC if we can - does anyone know which part of the tekton code requires the PVC to exist?
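As a rough illustration of what "subscribing to updates on the PVC" could mean, the sketch below keeps an index from PVC name to the TaskRuns waiting on it and re-enqueues those TaskRuns when an event for that PVC arrives. It is a plain in-memory model, not Tekton or Knative code; `PvcIndex`, `register`, `on_pvc_event` and the TaskRun names are hypothetical.

```python
from collections import defaultdict, deque


class PvcIndex:
    """Hypothetical index of TaskRuns waiting on PVCs, plus a work queue."""

    def __init__(self):
        self._waiting = defaultdict(set)
        self.queue = deque()

    def register(self, taskrun, pvc_names):
        """Record that a TaskRun references the given PVCs."""
        for pvc in pvc_names:
            self._waiting[pvc].add(taskrun)

    def on_pvc_event(self, pvc_name):
        """Re-enqueue every TaskRun that was waiting on this PVC."""
        for taskrun in sorted(self._waiting.pop(pvc_name, ())):
            self.queue.append(taskrun)


index = PvcIndex()
index.register("taskrun-a", ["workspace-pvc"])
index.register("taskrun-b", ["workspace-pvc"])

index.on_pvc_event("unrelated-pvc")   # nothing is waiting on this one
index.on_pvc_event("workspace-pvc")   # both TaskRuns get reconciled again
print(list(index.queue))
```

In a real controller the event would come from a watch on PVC resources and the queue would be the controller's work queue; both are simplified away here.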
But we never reconcile again (i.e. once it is available), so we always end up timing out (after 1h or more).
OH. That sounds like a bug, I 100% agree. I thought we'd only been discussing the messages shown when a PVC isn't available; I definitely did not understand that this issue was also describing a bug where the reconcile doesn't occur again!
This is news to me as well. I just tried it again and when the missing PVC is created and bound to a PV, the pod runs and the TaskRun completes exactly as one would expect.
It is true that if the PVC is never created, the pipelinerun will time out. Again, that is what one would expect.
What is the scenario where the pipelinerun does not continue when the PVC is created and bound? I have not seen that reported before.
does anyone know which part of the tekton code requires the PVC to exist?
A TaskRun could hold off on creating a Pod until the PVC exists, perhaps? When creating a Pod from a TaskRun we know whether a PVC is passed or not. But wouldn't this prevent concurrent work? E.g. a node could pull the image concurrently while it waits for the PVC to be created.
We could perhaps have a shorter timeout if the Pod never starts to run? 1h is a bit long for that case.
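The shorter-timeout idea could be sketched as a separate deadline that only applies while the Pod is still in the `Pending` phase. This is an illustration only: `should_fail_pending` and the 5 minute default are hypothetical and do not correspond to an existing Tekton setting.

```python
from datetime import datetime, timedelta, timezone


def should_fail_pending(created_at, pod_phase, now,
                        pending_timeout=timedelta(minutes=5)):
    """Return True if the Pod has stayed Pending longer than pending_timeout.

    The 5 minute default is an arbitrary illustrative value.
    """
    if pod_phase != "Pending":
        return False
    return now - created_at > pending_timeout


created = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)

# Still within the pending window: keep waiting for the PVC.
print(should_fail_pending(created, "Pending", created + timedelta(minutes=2)))
# Pending for too long: fail well before the 1h run timeout.
print(should_fail_pending(created, "Pending", created + timedelta(minutes=10)))
# Once the Pod is running, only the normal run timeout applies.
print(should_fail_pending(created, "Running", created + timedelta(minutes=10)))
```

A check like this keeps the asynchronous behavior (the Pod is still created right away, so image pulls can overlap with provisioning) while bounding how long a run sits waiting for a resource that never appears.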
What is the scenario where the pipelinerun does not continue when the PVC is created and bound? I have not seen that reported before.
Yes, there are many scenarios here, if we are to handle them all. If we watch PVCs, Tekton can do more, e.g. if the PV is located in a different AZ the Pod cannot be scheduled either, and there are more reasons why a Pod may not be schedulable.
I'm a bit confused. I feel like there are two different conversations happening at once here:
1. There is apparently a bug in Tekton where a reconcile does not happen even when a PVC is created and bound successfully after the TaskRun is created. I cannot reproduce this using a simple TaskRun + PVC combo in Minikube. We definitely need more input here on specifically how to repro, @vdemeester.
2. There is a broader question about how TaskRuns should report that they are waiting on PVCs.
I am much more immediately interested in fixing the possible bug described in (1) (from comment) than the messaging part of this. That sounds like a serious bug that needs to be fixed.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue.
Re-opening this issue as I am not sure this has been fixed.
Expected Behavior
The pipelinerun should check whether the given PVC exists, and the pipelinerun should fail if the PVC is not present.
Actual Behavior
The pipelinerun will be in Running state and taskruns will be in Running(Pending) state.
Steps to Reproduce the Problem
tkn commands to run the pipeline
Additional Info
Kubernetes version:
Output of kubectl version:
Tekton Pipeline version:
Command logs
Running using kubectl
task:
pipeline:
pipelinerun: