Closed abdulrahmanazab closed 7 months ago
Thanks @abdulrahmanazab !
I would be grateful if @sanjaysrikakulam and @pauldg could have a look.
This is very simple logic, "no over-engineering": the queue size and the distance are given the same weight. This can be sufficient for the milestone. Other considerations:
```
waiting_time = EXECUTE_TIME - SUBMIT_TIME
destination['avg_waiting_time'] = (destination['avg_waiting_time'] + waiting_time) / 2
```
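To make the idea concrete, here is a minimal sketch of how the equal-weight scoring and the running-average update could look. The function and key names (`queue_size`, `distance`, `avg_waiting_time`) are my assumptions for illustration, not Galaxy's actual API:

```python
def update_avg_waiting_time(destination, execute_time, submit_time):
    # Running average as in the formula above: the newest sample
    # gets the same weight as the entire accumulated history.
    waiting_time = execute_time - submit_time
    destination['avg_waiting_time'] = (destination['avg_waiting_time'] + waiting_time) / 2
    return destination

def rank_destinations(destinations):
    # Normalise queue size and distance to [0, 1], then sum them with
    # equal weight; the lowest combined score ranks first.
    max_queue = max(d['queue_size'] for d in destinations) or 1
    max_dist = max(d['distance'] for d in destinations) or 1

    def score(d):
        return d['queue_size'] / max_queue + d['distance'] / max_dist

    return sorted(destinations, key=score)

destinations = [
    {'name': 'a', 'queue_size': 10, 'distance': 100, 'avg_waiting_time': 30.0},
    {'name': 'b', 'queue_size': 2, 'distance': 400, 'avg_waiting_time': 60.0},
]
best = rank_destinations(destinations)[0]
```

Note that the running average is effectively an exponential moving average with a fixed weight of 1/2 on the newest sample, so old history decays quickly.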
Just a dumb question most probably - but was not the point to use the logic of the metascheduler algorithms you tested and showed at the Galaxy Days last year @abdulrahmanazab - and in this case just feed in two parameters instead of the many it uses?
Also, would it be possible to add test data to check input and desired output?
> Just a dumb question most probably - but was not the point to use the logic of the metascheduler algorithms you tested and showed at the Galaxy Days last year @abdulrahmanazab - and in this case just feed in two parameters instead of the many it uses?
Well, I'd say that this is an "initial" step on the way. With the data that we have, i.e. the two configuration parameters (location and queue size), this is a very simple algorithm. My algorithms are dependent on historical info of workloads and destinations. I agreed with @pauldg to start with the information that is already stored in Galaxy so that we don't need to establish yet another layer to collect additional information from the destinations. For this, there will be a simple database to "log" the historical data of Galaxy jobs and destinations, which will be used in the new algorithms.
But for this milestone, we go with the simple algorithm, then the other ones can be reported with the deliverable
> Also, would it be possible to add test data to check input and desired output?
Yes, I can add some example data to the api docs.
FYI, let's keep further discussion in specific issues, since this PR is now merged