radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Priority for scheduling the tasks to available resources #3213

Closed okilic1 closed 3 weeks ago

okilic1 commented 3 months ago

As part of the AdaptiveExecution work, we need to submit an oversubscribed bag of low-priority tasks while also submitting higher-priority tasks. When a resource becomes available, we want the higher-priority tasks to be scheduled first; if no higher-priority task is waiting, a low-priority task can be scheduled.

Example: the ddmd3 workflow. DFT calculations produce the data used for Force Field Training (FFT): FFT takes the DFT output as input, and more training data improves the FFT. We can run an unlimited number of DFT tasks, but we only ever run 4 FFT tasks at a time. Assume 1 node with 4 GPUs and 48 CPUs, where a DFT task uses 1 CPU and an FFT task uses 1 CPU + 1 GPU. The 4 FFT tasks only start after enough DFT calculations have run, so when we submit the FFT tasks there will still be DFT tasks in the queue, but we want the FFT tasks to start as soon as enough resources are free. Instead of killing every DFT task, we want to keep running new ones on the remaining 44 CPUs to generate more training data.
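
To make the request concrete, here is a minimal, standalone sketch of the desired behavior using the numbers from the example (1 node, 48 CPUs, 4 GPUs). It is not RADICAL-Pilot code and does not use the RP API: the `Task` class, its `priority` field, and the `schedule` function are hypothetical names invented for this illustration.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Task:
    # Hypothetical task record, NOT an RP TaskDescription.
    priority: int                      # lower value is scheduled first
    seq: int                           # tie-breaker: submission order
    name: str = field(compare=False)
    cpus: int = field(compare=False)
    gpus: int = field(compare=False)


def schedule(pending, free_cpus, free_gpus):
    """Pop tasks in priority order and place every task that fits."""
    placed, skipped = [], []
    while pending:
        task = heapq.heappop(pending)
        if task.cpus <= free_cpus and task.gpus <= free_gpus:
            free_cpus -= task.cpus
            free_gpus -= task.gpus
            placed.append(task)
        else:
            skipped.append(task)       # stays queued for the next pass
    for task in skipped:
        heapq.heappush(pending, task)
    return placed, free_cpus, free_gpus


seq = count()
pending = []

# Oversubscribed bag of low-priority DFT tasks (1 CPU each).
for i in range(100):
    heapq.heappush(pending, Task(1, next(seq), f"dft.{i}", cpus=1, gpus=0))

# Four high-priority FFT tasks (1 CPU + 1 GPU each), submitted later.
for i in range(4):
    heapq.heappush(pending, Task(0, next(seq), f"fft.{i}", cpus=1, gpus=1))

placed, free_cpus, free_gpus = schedule(pending, free_cpus=48, free_gpus=4)
print([t.name for t in placed[:4]], len(placed), len(pending))
```

In this sketch the four FFT tasks are placed first even though they were submitted last, the remaining 44 CPUs are backfilled with DFT tasks, and the other 56 DFT tasks stay queued. It only covers the case where both kinds of task are waiting when resources become free.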

andre-merzky commented 3 months ago

RP does not guarantee order of tasks - so what is expected to happen if the low priority tasks arrive first and get scheduled on the free resources, and the high priority tasks arrive after?

andre-merzky commented 3 months ago

Maybe introduce a killable flag? Also, on high priority tasks, maybe add a policy kill-for-resources / wait-for-resources.
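
One possible reading of this proposal, as a standalone sketch rather than RP code: the `killable` attribute and the `policy` values `kill-for-resources` / `wait-for-resources` are taken from the comment above, but they are not existing RP options, and `Task` and `try_place` are hypothetical names. The scenario is the one raised earlier: low-priority tasks arrived first and occupy all 48 CPUs when a high-priority task shows up.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record; `killable` and `policy` are proposals only.
    name: str
    cpus: int
    gpus: int = 0
    killable: bool = False
    policy: str = "wait-for-resources"


def try_place(task, running, free_cpus, free_gpus):
    """Return (placed, evicted, free_cpus, free_gpus)."""
    evicted = []
    if task.policy == "kill-for-resources":
        killable = [t for t in running if t.killable]
        max_cpus = free_cpus + sum(t.cpus for t in killable)
        max_gpus = free_gpus + sum(t.gpus for t in killable)
        # Evict only if doing so can actually make room for the task.
        if task.cpus <= max_cpus and task.gpus <= max_gpus:
            for victim in reversed(killable):   # most recently started first
                need_cpu = task.cpus > free_cpus
                need_gpu = task.gpus > free_gpus
                if not (need_cpu or need_gpu):
                    break
                if (need_cpu and victim.cpus) or (need_gpu and victim.gpus):
                    running.remove(victim)
                    free_cpus += victim.cpus
                    free_gpus += victim.gpus
                    evicted.append(victim)
    if task.cpus <= free_cpus and task.gpus <= free_gpus:
        running.append(task)
        return True, evicted, free_cpus - task.cpus, free_gpus - task.gpus
    return False, evicted, free_cpus, free_gpus


# 48 killable DFT tasks already fill every CPU; the 4 GPUs are idle.
running = [Task(f"dft.{i}", cpus=1, killable=True) for i in range(48)]

waiting = Task("fft.0", cpus=1, gpus=1)                    # default: wait
w_placed, w_evicted, cpus, gpus = try_place(waiting, running, 0, 4)

urgent = Task("fft.1", cpus=1, gpus=1, policy="kill-for-resources")
u_placed, u_evicted, cpus, gpus = try_place(urgent, running, cpus, gpus)
print(w_placed, u_placed, [t.name for t in u_evicted], cpus, gpus)
```

With `wait-for-resources` the FFT task stays pending until a DFT task finishes; with `kill-for-resources` a single killable DFT task is evicted and the FFT task starts immediately. Choosing the most recently started task as the victim is an arbitrary choice made for this sketch.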