wlandau / crew.cluster

crew launcher plugins for traditional high-performance computing clusters
https://wlandau.github.io/crew.cluster

Spread the load across two mirrored remote SLURM systems #30

Closed stemangiola closed 1 year ago

stemangiola commented 1 year ago

Proposal

I am not sure if this is already possible, but let's say I have two remote SLURM clusters that have mirrored file systems (same directories and file locations). Would it be possible to spread the pipeline load across those two URLs?

Thanks a lot.

wlandau commented 1 year ago

You could set up a controller for each cluster, then combine them in a controller group: https://wlandau.github.io/crew/articles/groups.html. You will still need to decide manually which controller to send each task to, but the interface is a bit more seamless. If you want to make that decision on the fly, you can use the saturated() method to check whether a controller's workers are all busy: https://wlandau.github.io/crew/reference/crew_class_controller_group.html#method-crew_class_controller_group-saturated
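A rough sketch of that suggestion, in case it helps. The controller names, worker counts, and the helpers `pick_controller()` and `submit_example()` are hypothetical, not part of crew or crew.cluster. The sketch also assumes that `saturated()` accepts a `controller` argument naming one member of the group, and it leaves out whatever settings each controller would need to reach its remote scheduler.

```r
# Hypothetical helper (not part of crew): return the name of the first
# controller in `names` that is not saturated. If every controller is
# saturated, fall back to the first name so the task is still queued.
pick_controller <- function(group, names) {
  for (name in names) {
    # Assumption: saturated() takes the name of a single controller.
    if (!isTRUE(group$saturated(controller = name))) {
      return(name)
    }
  }
  names[[1L]]
}

# Hypothetical end-to-end example. Nothing is submitted until this function
# is called, and it needs crew, crew.cluster, and access to both clusters.
submit_example <- function() {
  # One controller per cluster. Names and worker counts are placeholders.
  cluster_a <- crew.cluster::crew_controller_slurm(name = "cluster_a", workers = 4)
  cluster_b <- crew.cluster::crew_controller_slurm(name = "cluster_b", workers = 4)
  group <- crew::crew_controller_group(cluster_a, cluster_b)
  group$start()
  on.exit(group$terminate())

  # Choose a controller at submission time, then push the task to it.
  target <- pick_controller(group, c("cluster_a", "cluster_b"))
  group$push(
    command = Sys.info()[["nodename"]],
    name = "task_1",
    controller = target
  )
  group$wait(mode = "all")
  group$pop()
}
```

Trying the controllers in a fixed order fills the first cluster before spilling over to the second. Rotating the order of `names` between pushes would spread tasks more evenly, if that matters for your pipeline.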

wlandau commented 1 year ago

Converting this issue to a discussion.