microsoft / DLWorkspace

Deep Learning Workspace

support preemptable inference jobs #1219

Closed leigaoms closed 4 years ago

leigaoms commented 4 years ago
  1. RestAPI:
    • SubmitJob(): parse "mingpu" and "maxgpu" in jobParams, while staying compatible with the legacy "resourcegpu" and "gpulimit" fields (see the parsing sketch after this list).
    • ScaleJob(): change "resourcegpu" to "mingpu" and "maxgpu". No compatibility issue here.
  2. JobManager:
    • Add "job_preemptable_resource" and "allowed_resource" to job_info. If an inference job is scheduling/running, subtract the "mingpu"-related resource and keep the job around for preemptable GPU allocation later.
    • Scheduling logic, in priority order:
      1) Mark non-preemptable training jobs: job status is "queued".
      2) Mark the non-preemptable part of inference jobs: job status is "queued". Allocate only if all "mingpu"-related resource can be satisfied.
      3) Mark preemptable training jobs: job status is "queued/scheduling/running".
      4) Mark the preemptable part of inference jobs: job status is "queued/scheduling/running". Allocate partial resource if not all "maxgpu"-related resource can be satisfied. Assume CPU/memory are more plentiful than GPU, so allocate GPU first and scale CPU/memory in proportion to the allocated GPU (see the allocation sketch after this list).
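
A minimal sketch of the SubmitJob() compatibility parsing, in Python; the helper name parse_gpu_range() and the exact defaulting rules are assumptions, not the actual DLWorkspace code:

```python
def parse_gpu_range(job_params):
    """Return (min_gpu, max_gpu) from jobParams, accepting both the new
    "mingpu"/"maxgpu" fields and the legacy "resourcegpu"/"gpulimit" ones."""
    if "mingpu" in job_params or "maxgpu" in job_params:
        min_gpu = int(job_params.get("mingpu", 0))
        max_gpu = int(job_params.get("maxgpu", min_gpu))
    else:
        # Legacy fields: a fixed-size job has min == max unless a limit is set.
        min_gpu = int(job_params.get("resourcegpu", 0))
        max_gpu = int(job_params.get("gpulimit", min_gpu))
    if max_gpu < min_gpu:
        raise ValueError("maxgpu must be >= mingpu")
    return min_gpu, max_gpu
```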
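And a minimal sketch of the step-4 rule (allocate GPU first, CPU/memory in proportion to the granted GPU); the field names and the allocate_preemptable() helper are illustrative assumptions:

```python
def allocate_preemptable(job, free_gpu):
    """Grant up to the job's preemptable GPU demand (maxgpu - mingpu) and
    scale the preemptable CPU/memory request by the fraction granted."""
    requested_gpu = job["maxgpu"] - job["mingpu"]
    if requested_gpu == 0:
        return {"gpu": 0, "cpu": 0, "memory": 0}
    granted_gpu = min(requested_gpu, free_gpu)
    ratio = granted_gpu / requested_gpu
    return {
        "gpu": granted_gpu,
        # CPU and memory follow the allocated GPU in proportion.
        "cpu": job["preemptable_cpu"] * ratio,
        "memory": job["preemptable_memory"] * ratio,
    }
```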

Refinement:

  1. Fair sharing: in step 4), share the remaining GPUs among the different inference jobs proportionally or evenly (see the sketch below).
  2. The CPU/memory assumption above does not apply to CPU clusters or CPU inference jobs.
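
A minimal sketch of the proportional fair-sharing refinement; splitting by each job's preemptable demand (maxgpu - mingpu) is an assumption about what "in proportion" means here:

```python
def fair_share(jobs, free_gpu):
    """Split free_gpu among inference jobs in proportion to each job's
    preemptable demand, rounding down and then distributing the remainder."""
    demands = {j["id"]: j["maxgpu"] - j["mingpu"] for j in jobs}
    total = sum(demands.values())
    if total == 0 or total <= free_gpu:
        return demands  # every job's preemptable demand can be fully met
    # Floor of the proportional share, so the sum never exceeds free_gpu.
    grants = {jid: free_gpu * d // total for jid, d in demands.items()}
    leftover = free_gpu - sum(grants.values())
    # Hand out leftover GPUs one at a time, largest unmet demand first.
    for jid in sorted(grants, key=lambda j: demands[j] - grants[j], reverse=True):
        if leftover == 0:
            break
        if grants[jid] < demands[jid]:
            grants[jid] += 1
            leftover -= 1
    return grants
```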
coveralls commented 4 years ago

Pull Request Test Coverage Report for Build 3582


Totals Coverage Status:
  • Change from base Build 3578: 0.0%
  • Covered Lines: 827
  • Relevant Lines: 874

💛 - Coveralls
xudifsd commented 4 years ago

Have you tested these, or is this just the behaviour you expect:

leigaoms commented 4 years ago

Have you tested these, or is this just the behaviour you expect: