mborsz closed this issue 2 years ago.
/area prow
/cc @chaodaiG @alvaroaleman
Chao, Alvaro, what do you think about this feature request? Is it feasible?
Would like to understand the requirements better:
While for most CI jobs we use periodic ProwJobs which is fine, for manual runs we use spreadsheets to book gcp projects, which is cumbersome.
When you mentioned periodic prowjobs, did you mean periodic prowjobs that execute all of A, B, and C in a single prowjob?
Not sure about manual runs; how would manual runs relate to prow?
Sorry, maybe I wasn't clear on our use case.
Our team (sig scalability/GKE scalability) maintains dozens of periodic prowjobs (for sig scalability: https://github.com/kubernetes/test-infra/tree/master/config/jobs/kubernetes/sig-scalability).
The workflow we are using for testing new kubernetes changes is:
This way we can easily do A/B testing (compare test results from the run with our change against the baseline from the CI job).
This is what I meant by a manual run. This is a common way we test new kubernetes changes, infrastructure changes, or test changes. The problem we are trying to solve is project management -- we have only a few projects with enough capacity to run 10k-core jobs, and with multiple people running such manual jobs, manual project management is getting hard.
I don't want to focus on periodic jobs, as project management for them is currently solved by preparing an appropriate cron schedule.
Hypothetically we could do something with labels (max concurrency for jobs with label foo). Would that solve your issues?
Disclaimer, I won't be implementing that (as I don't have the problem) and it would need acking from the other maintainers as to whether we are ok with the added complexity.
Hypothetically we could do something with labels (max concurrency for jobs with label foo). Would that solve your issues?
Do you mean adding labels like the following?

```yaml
labels:
  prow.k8s.io/queue: foo
  prow.k8s.io/queue-max-concurrency: "5"
```
If so, I think it solves our problem, but it seems to be slightly more complex than the solution I proposed in the first comment (where I simply reuse the existing max concurrency mechanism). Is there any particular reason why using labels is better than adding a new field to ProwJob's spec?
Disclaimer, I won't be implementing that (as I don't have the problem) and it would need acking from the other maintainers as to whether we are ok with the added complexity.
We can make necessary code changes, once we get approval from maintainers for the approach we take.
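For illustration only, here is a minimal sketch (in Go, since Prow is written in Go) of how a scheduler could derive a queue key and its limit from such labels, falling back to today's per-job-name maxConcurrency behaviour when the label is absent. The label keys come from the snippet above; the job struct, the queueKeyAndLimit helper, and the example values are hypothetical and not existing Prow code:

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical label keys taken from the comment above; not an existing Prow API.
const (
	queueLabel          = "prow.k8s.io/queue"
	queueMaxConcurrency = "prow.k8s.io/queue-max-concurrency"
)

// job is a stripped-down stand-in for a ProwJob, just for this sketch.
type job struct {
	Name           string
	MaxConcurrency int
	Labels         map[string]string
}

// queueKeyAndLimit returns the key that jobs are grouped by for concurrency
// accounting and the maximum number of jobs allowed in that group. Without
// the queue label it falls back to today's per-job-name behaviour.
func queueKeyAndLimit(j job) (string, int) {
	if q := j.Labels[queueLabel]; q != "" {
		limit := j.MaxConcurrency
		if raw := j.Labels[queueMaxConcurrency]; raw != "" {
			if n, err := strconv.Atoi(raw); err == nil && n > 0 {
				limit = n
			}
		}
		return "queue/" + q, limit
	}
	return "job/" + j.Name, j.MaxConcurrency
}

func main() {
	j := job{
		Name:           "manual-load-test",
		MaxConcurrency: 1,
		Labels: map[string]string{
			queueLabel:          "foo",
			queueMaxConcurrency: "5",
		},
	}
	key, limit := queueKeyAndLimit(j)
	fmt.Printf("queue key %q, limit %d\n", key, limit) // queue key "queue/foo", limit 5
}
```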
Thank you @mborsz, now I understand it better. Please correct me if I'm still getting it wrong:
kubectl create -f prepared-configuration.yaml
I assume this is a prowjob CR, otherwise max_concurrency would not work. Is that correct? (If the answer is yes, then the manual tester needs to have admin power on the prow cluster.)
If so, I think it solves our problem, but it seems to be slightly more complex than the solution I proposed in the first comment (where I simply reuse the existing max concurrency mechanism).
Two reasons:
scaletesting)
@chaodaiG you are right, prepared-configuration.yaml is a prowjob object (like e.g. https://prow.k8s.io/rerun?prowjob=40e18bcd-a77a-11eb-a9fd-4a69d306464d). And yes, testers do have admin power on the prow cluster.
If you can create prowjobs anyway, why not just write a custom controller that knows everything it needs to know and creates prowjobs? We do that in a couple of places in OpenShift for things that cannot reasonably be re-used upstream, because they are very downstream-specific. That would give you more flexibility and would not require getting an ack from us for every change.
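As a rough illustration of that suggestion, the sketch below shows the core loop such a downstream controller could run: keep a FIFO of prepared ProwJob manifests per queue and only create the next one while fewer than N are still running. The queueSubmitter interface, the pump function, and the fake implementation are hypothetical stand-ins; a real controller would back them with a Kubernetes client for the prow.k8s.io/v1 ProwJob CRD and react to job completion instead of polling:

```go
package main

import (
	"fmt"
	"time"
)

// queueSubmitter is what a downstream controller would implement on top of a
// Kubernetes client for the ProwJob CRD; it is only an interface here so the
// sketch stays self-contained.
type queueSubmitter interface {
	// Running reports how many ProwJobs tagged with this queue have not finished yet.
	Running(queue string) (int, error)
	// Submit creates the next queued ProwJob on the cluster.
	Submit(queue, manifest string) error
}

// pump drains a FIFO of prepared ProwJob manifests for a single queue, never
// letting more than maxConcurrency of them run at once.
func pump(s queueSubmitter, queue string, maxConcurrency int, manifests []string, poll time.Duration) error {
	for len(manifests) > 0 {
		running, err := s.Running(queue)
		if err != nil {
			return err
		}
		for running < maxConcurrency && len(manifests) > 0 {
			if err := s.Submit(queue, manifests[0]); err != nil {
				return err
			}
			manifests = manifests[1:]
			running++
		}
		time.Sleep(poll) // a real controller would watch ProwJob status instead of polling
	}
	return nil
}

// fake is an in-memory stand-in so the sketch runs on its own; it pretends
// every previously submitted job has already finished.
type fake struct{}

func (fake) Running(string) (int, error) { return 0, nil }
func (fake) Submit(queue, manifest string) error {
	fmt.Printf("would create ProwJob from %q in queue %q\n", manifest, queue)
	return nil
}

func main() {
	jobs := []string{"job-a.yaml", "job-b.yaml", "job-c.yaml"}
	_ = pump(fake{}, "scalability-project", 1, jobs, 0)
}
```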
The integration with tekton pipeline was supposed to solve this problem. Would a ProwJob that leverages that system be acceptable here?
testers do have admin power on prow cluster.
This worries me a little bit; I didn't realize before that anyone other than oncall has admin control on the prow service cluster, which doesn't seem to be very secure. + @cjwagner here for awareness or opinions
testers do have admin power on prow cluster.
This worries me a little bit
The mentioned Prow cluster is not part of the k8s.io infrastructure.
On a side note: I believe the proposed solution solves your problem, and I think it would be a good change. We can discuss further whether a new field or a label is the better fit.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
/sig testing
/sig scalability
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle stale
Mark this issue as rotten with /lifecycle rotten
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I would still like to do this eventually.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle stale
Mark this issue as rotten with /lifecycle rotten
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
I'd like to add this feature if we reach a consensus on how it should be implemented.
What would you like to be added: I would like to be able to queue ProwJob runs to run one after another, e.g. I create 3 ProwJobs:
I would like B to start when A finishes, and then C.
Why is this needed:
As the scalability team, we manage a lot of long-running jobs that share the same gcp-projects (or boskos pools). We want to be able to run them effectively and in an easy way.
While for most CI jobs we use periodic ProwJobs which is fine, for manual runs we use spreadsheets to book gcp projects, which is cumbersome.
I would like to be able to create a queue for each gcp-project (boskos pool) and say that only up to N jobs can run concurrently.
Currently I'm aware of two ways of achieving this:
Suggested solution: Basically extending maxConcurrency to use a custom queue key, i.e. I propose the following changes:
This way, we can easily set the queueName field for jobs that we want to queue, set maxConcurrency on them, and they will wait for each other, as expected. Additionally, this approach works both for jobs using a single gcp-project (by setting maxConcurrency=1) and for jobs using boskos (by setting maxConcurrency=pool size).
Technically this change seems to be pretty straightforward: it requires changing canExecuteConcurrently and a few other places around it, but it is still a very simple change.
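For concreteness, here is a rough sketch of the gating logic this proposal implies, assuming a new queueName field that replaces the job name as the grouping key for maxConcurrency accounting. The ProwJob struct, the state strings, and the canExecuteConcurrently signature below are simplified stand-ins rather than Prow's actual plank implementation:

```go
package main

import "fmt"

// ProwJob is a stripped-down stand-in for the real CRD, extended with the
// hypothetical queueName field proposed above.
type ProwJob struct {
	Name           string
	JobName        string // spec.job
	QueueName      string // proposed new spec field
	MaxConcurrency int
	State          string // loosely mirrors prowjob states: "triggered", "pending", "success", ...
}

// concurrencyKey is what running jobs are grouped by: the proposed queueName
// when set, otherwise the job name (today's behaviour).
func concurrencyKey(pj ProwJob) string {
	if pj.QueueName != "" {
		return "queue:" + pj.QueueName
	}
	return "job:" + pj.JobName
}

// canExecuteConcurrently reports whether pj may start now, given every ProwJob
// currently known to the controller. This is a simplified take on the real check.
func canExecuteConcurrently(pj ProwJob, all []ProwJob) bool {
	if pj.MaxConcurrency <= 0 {
		return true // as today, 0 means no limit
	}
	key := concurrencyKey(pj)
	running := 0
	for _, other := range all {
		if other.Name != pj.Name && other.State == "pending" && concurrencyKey(other) == key {
			running++
		}
	}
	return running < pj.MaxConcurrency
}

func main() {
	a := ProwJob{Name: "a", JobName: "load-test-baseline", QueueName: "scale-project", MaxConcurrency: 1, State: "pending"}
	b := ProwJob{Name: "b", JobName: "load-test-with-change", QueueName: "scale-project", MaxConcurrency: 1, State: "triggered"}
	all := []ProwJob{a, b}
	fmt.Println(canExecuteConcurrently(b, all)) // false: a still occupies the queue, so b waits

	all[0].State = "success"
	fmt.Println(canExecuteConcurrently(b, all)) // true: a finished, b may start
}
```

Under this scheme, the A/B/C example from the description becomes three ProwJobs sharing the same queueName with maxConcurrency set to 1, while a boskos-backed queue would use the pool size instead.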