Right now we have multiple clusters, and we schedule jobs to all of them in round-robin fashion, but we do not check whether a cluster is overloaded. Sometimes we keep scheduling jobs to a cluster that is already overloaded, which causes scheduling problems because the jobs keep piling up on that cluster.
We need to implement some kind of load balancing that checks whether a cluster is overloaded (pending jobs > 2) and, if it is, does not schedule to it.
I suggest the following: we go over the clusters one by one and check whether each one is overloaded. If a cluster is not overloaded, we schedule the job to it. If it is, we move on to the next cluster.
If we reach the end of the list and all clusters are overloaded, we schedule the job to the one with the fewest pending jobs.
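
The selection logic above could look roughly like the sketch below. The `Cluster` class, its `pending_jobs` field, and the `pick_cluster` function are hypothetical names used only for illustration, not existing code. The `start` parameter is also an assumption: it lets the caller keep the current round-robin rotation by passing the index where the scan should begin.

```python
from dataclasses import dataclass
from typing import Sequence

# Threshold from the proposal: more than 2 pending jobs means overloaded.
MAX_PENDING_JOBS = 2


@dataclass
class Cluster:
    """Hypothetical stand-in for however clusters are represented today."""
    name: str
    pending_jobs: int


def is_overloaded(cluster: Cluster) -> bool:
    return cluster.pending_jobs > MAX_PENDING_JOBS


def pick_cluster(clusters: Sequence[Cluster], start: int = 0) -> Cluster:
    """Return the first non-overloaded cluster, scanning from index `start`.

    If every cluster is overloaded, fall back to the one with the fewest
    pending jobs.
    """
    if not clusters:
        raise ValueError("no clusters to schedule to")

    count = len(clusters)
    for offset in range(count):
        # Wrap around so every cluster is checked exactly once.
        candidate = clusters[(start + offset) % count]
        if not is_overloaded(candidate):
            return candidate

    # All clusters are overloaded: pick the least loaded one.
    return min(clusters, key=lambda c: c.pending_jobs)
```

With this shape, the scheduler would call `pick_cluster(clusters, start=next_index)` for each job and advance `next_index` afterwards, so an overloaded cluster is skipped without changing the rest of the rotation.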