UmmulkiramR opened 10 months ago
How will the scheduler prioritize which directory to choose? Will it try to completely use up the resources of one directory before moving to the next, or will it try to spread the load evenly? How will it decide between two directories that both have capacity? Should we be able to set a priority order for using up directories, or are there other configurations needed for this strategy?
Also, our workflow requests right now contain information on the cluster and/or directory we want them to run in. After this feature is implemented, how will the workflow requests change? Can we still specify the cluster we want a workflow to run in? Do we need to include/exclude specific settings to get the scheduler to decide where to run the workflow?
Lastly, how will the RDPC admin be able to see which cluster a workflow is running in? Is this information visible in the workflow UI and/or workflow API (through the RDPC Gateway)? Do we currently have a view of how many workflows are running in each cluster or is that another task that we should investigate in the future?
Instead of combining directory path and cluster name into a single string, is there a way to split these into separate lists in the config:
```yaml
directories:
  - cluster1:
      - "nfs-local/nfs-1"
      - "nfs-local/nfs-2"
  - cluster2:
      - "nfs-external/nfs-1"
      - "nfs-external/nfs-2"
```
How does the Scheduler calculate available capacity: The Scheduler allocates a work directory to a new workflow based on a cost calculation. Every workflow has a cost and every directory has a maximum cost. For example, if the maximum cost per directory is set to 4 (MaxCostPerDir = 4) and the cost of one DNA Seq workflow is 2, then one directory can accommodate 2 running DNA Seq jobs.
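As a rough sketch of that arithmetic (the constant, method, and cost values below are illustrative, not the Scheduler's actual code), the capacity check for a single directory could look like this:

```java
// Hypothetical sketch of the cost-based capacity check described above.
public class CapacityCheckExample {

    static final int MAX_COST_PER_DIR = 4; // MaxCostPerDir from the example above

    // A directory can take a new workflow if the cost already running there
    // plus the new workflow's cost stays within the directory's max cost.
    static boolean hasCapacity(int runningCostInDir, int newWorkflowCost) {
        return runningCostInDir + newWorkflowCost <= MAX_COST_PER_DIR;
    }

    public static void main(String[] args) {
        int dnaSeqCost = 2;
        System.out.println(hasCapacity(0, dnaSeqCost));              // true: 0 + 2 <= 4
        System.out.println(hasCapacity(dnaSeqCost, dnaSeqCost));     // true: 2 + 2 <= 4
        System.out.println(hasCapacity(2 * dnaSeqCost, dnaSeqCost)); // false: 4 + 2 > 4
    }
}
```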
How to get the Scheduler to auto-trigger jobs in an external cluster:
The Scheduler currently maintains a list of work directories in its configuration. We can extend this configuration to enable auto-triggering of workflows in an external cluster by adding all the directories present in the external cluster, along with the cluster information for each directory. The cluster info is separated from the directory path by a colon; 'default' means the local cluster and cluster2 is the external cluster.
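For illustration only, the combined entries described above might look like the following once loaded, together with a small helper that splits each entry back into its cluster and directory parts. The exact entry values and class names here are assumptions, not the real configuration:

```java
import java.util.List;

// Hypothetical illustration of the colon-separated entries described above.
// "default" denotes the local cluster; "cluster2" is the external cluster.
public class DirectoryEntriesExample {

    // Assumed sample values; the real directory and cluster names may differ.
    static final List<String> DIRECTORIES = List.of(
            "default:nfs-local/nfs-1",
            "default:nfs-local/nfs-2",
            "cluster2:nfs-external/nfs-1",
            "cluster2:nfs-external/nfs-2");

    // Split "cluster:directory" back into its two parts.
    record ClusterDir(String cluster, String directory) {
        static ClusterDir parse(String entry) {
            String[] parts = entry.split(":", 2);
            return new ClusterDir(parts[0], parts[1]);
        }
    }

    public static void main(String[] args) {
        DIRECTORIES.stream()
                .map(ClusterDir::parse)
                .forEach(cd -> System.out.println(cd.cluster() + " -> " + cd.directory()));
    }
}
```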
Now, at decision time, the Scheduler will check the available capacity (using the cost calculation) for each directory in the list above and allocate one to the new workflow. Whichever directory the Scheduler picks, it will add the corresponding cluster info as a parameter to the list of workflow params, to signal workflow management about which cluster it needs to fire the jobs in.
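Putting the capacity check and the cluster-aware entries together, a minimal sketch of the decision step might look like this. The selection order (first directory with spare capacity), the class, and the parameter names work_dir and cluster are all assumptions for illustration; the actual prioritization strategy is one of the open questions above:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: pick a directory with spare capacity and record
// the chosen cluster as a workflow parameter.
public class DirectorySelectionExample {

    record ClusterDir(String cluster, String directory) {}

    static final int MAX_COST_PER_DIR = 4;

    static Optional<ClusterDir> selectDirectory(List<ClusterDir> dirs,
                                                Map<String, Integer> runningCostByDir,
                                                int newWorkflowCost) {
        // Assumed strategy for this sketch: first directory that still fits the new cost.
        return dirs.stream()
                .filter(d -> runningCostByDir.getOrDefault(d.directory(), 0)
                        + newWorkflowCost <= MAX_COST_PER_DIR)
                .findFirst();
    }

    public static void main(String[] args) {
        List<ClusterDir> dirs = List.of(
                new ClusterDir("default", "nfs-local/nfs-1"),
                new ClusterDir("cluster2", "nfs-external/nfs-1"));

        // Assume the local directory is already full (cost 4 of 4 used).
        Map<String, Integer> runningCost = Map.of("nfs-local/nfs-1", 4);

        Map<String, Object> workflowParams = new HashMap<>();
        selectDirectory(dirs, runningCost, 2).ifPresent(choice -> {
            workflowParams.put("work_dir", choice.directory()); // assumed param name
            workflowParams.put("cluster", choice.cluster());    // assumed param name
        });
        // The external directory is chosen, and its cluster is passed along as a param.
        System.out.println(workflowParams);
    }
}
```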
This will eliminate the need for users to provide a cluster name to fire the jobs in, leaving that decision to the Scheduler. However, users can still override this behaviour and provide a cluster in which to run the workflows (in the workflow params), but they will then need to provide the directory names too.