vinaydhegde opened this issue 1 year ago
This is a really nice feature; is there anything I can do to help?
Volcano currently does not support scaling task replicas up and down through the Scale subresource.
A single Volcano job contains multiple tasks, and the `specReplicasPath` in the CRD cannot be used to distinguish which task's replicas should be scaled. In addition, the task status is a map structure whose keys are the task names, so `statusReplicasPath` cannot be configured in the CRD either.
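To illustrate the map structure mentioned above, a Volcano job status looks roughly like the following sketch (task names and counts are made up); there is no single scalar field per task that a `statusReplicasPath` JSON path could point at:

```yaml
status:
  running: 3
  taskStatusCount:        # map keyed by task name, not an array
    master:
      phase:
        Running: 1
    worker:
      phase:
        Running: 2
```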
For job-level replica scaling, the Volcano job does not have a replicas attribute, but it does have minAvailable, which indicates the minimum number of replicas that must be met for the job. This can support the Scale subresource with the following configuration:
```yaml
subresources:
  scale:
    specReplicasPath: .spec.minAvailable
    statusReplicasPath: .status.running
```
Verified with `kubectl scale --replicas=8 vcjob/job-xxx`: minAvailable can be modified this way, but scaling minAvailable up and down seems meaningless in practical applications.
The Volcano job itself already provides elastic scaling of replicas by combining each task's replicas and minAvailable with the job's minAvailable. When the job and its tasks meet their minAvailable resource requirements, the job can run; if the cluster has more free resources, resources continue to be allocated to the job until the full replica count is reached.
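As an illustrative sketch of the behavior described above (the job name, task name, image, and counts are all made up), a job like this can start once 2 worker pods are schedulable, and grows toward 4 replicas as free resources appear:

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: elastic-job          # hypothetical name
spec:
  schedulerName: volcano
  minAvailable: 2            # job can run once 2 pods are schedulable
  tasks:
    - name: worker
      replicas: 4            # desired upper bound of pods for this task
      minAvailable: 2        # per-task minimum
      template:
        spec:
          containers:
            - name: worker
              image: busybox # placeholder image
          restartPolicy: Never
```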
By the way, in what scenarios is elastic scaling of the number of task replicas mainly used? Could you please share some details?
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗
/reopen
@lowang-bh: Reopened this issue.
Currently, can a Volcano job dynamically increase or decrease the number of task Pods based on requested resources and free cluster resources?
What would you like to be added:
I would like to add a scaling feature to the CRD `jobs.batch.volcano.sh`: `kubectl scale --replicas= jobs.batch.volcano.sh` should scale the replicas of the worker task.
Why is this needed:
This is needed to scale the jobs (scale the number of Pods up/down) based on CPU load.
What I tried so far:
I added a subresources block to the CRD `jobs.batch.volcano.sh`, but running `kubectl scale` doesn't do anything (it says the resource is scaled, but the replicas are not getting updated). Below is the block I added to the CRD (via `kubectl edit customresourcedefinition jobs.batch.volcano.sh`):

```yaml
subresources:
  scale:
    specReplicasPath: .spec.tasks[1].replicas
    statusReplicasPath: .status.tasks[1].replicas
  status: {}
```
The `status:` block of this CRD doesn't have a `replicas:` field. Since `statusReplicasPath:` cannot be left empty, I set it to `.status.tasks[1].replicas`.
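This likely fails because, per the Kubernetes CRD documentation, `specReplicasPath` must be a simple JSON path under `.spec` without array notation, so a path like `.spec.tasks[1].replicas` is not resolved. As a stopgap that bypasses the Scale subresource entirely (a command sketch; the job name `job-xxx` and the task index are assumptions), a specific task's replicas can be changed with a JSON patch:

```shell
# Patch the replicas of the task at index 1 directly on the job object.
# Assumes the `vcjob` shortname for jobs.batch.volcano.sh is registered.
kubectl patch vcjob job-xxx --type=json \
  -p='[{"op": "replace", "path": "/spec/tasks/1/replicas", "value": 8}]'
```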
FYI: I referred Kubernetes Doc to try this scaling option