retz / retz

A batch job queuing and execution service (Mesos framework)
http://retz.github.io/
Apache License 2.0

only oldest jobs are passed to task planner #157

Open tgpfeiffer opened 7 years ago

tgpfeiffer commented 7 years ago

In my custom planner I want to sort jobs by priority, but it seems that not all of the jobs in the queue are passed to public Plan plan(Map<String, Offer> offers, List<Job> jobs). I only get the first k, where k is (maybe) equal to offers.size(). Even if I have 10 jobs in state QUEUED, plan() never receives all of them, only the one or two oldest, so I cannot do any meaningful sorting. Is this intended?

kuenishi commented 7 years ago

Yes. This is simply to avoid, for example, selecting millions of queued jobs from the database only to match them against 3 resource offers. As you can see in the code around here, k is the largest number that satisfies the following constraint: total CPU and RAM of first k jobs in the queue < total CPU and RAM of all resource offers.

I see you need more scheduling flexibility. One option would be to add a coefficient C that changes the constraint as follows: total CPU and RAM of first k jobs in the queue < C * (total CPU and RAM of all resource offers). The default would be 1.0, and in your use case it might be 2.0, 16.0 or even more.
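To make the constraint concrete, here is a minimal sketch of how such a window over the queue could be computed. It is not the actual Retz code: the QueueWindow and JobReq names are hypothetical, and it assumes that CPU and RAM must each stay strictly below C times the offered total.

```java
import java.util.ArrayList;
import java.util.List;

class QueueWindow {
    // Hypothetical stand-in for a queued job's resource request (not Retz's Job class).
    static final class JobReq {
        final String id;
        final int cpu;
        final int memMB;

        JobReq(String id, int cpu, int memMB) {
            this.id = id;
            this.cpu = cpu;
            this.memMB = memMB;
        }
    }

    /**
     * Returns the longest prefix of the queue (oldest first) whose summed CPU and
     * summed memory both stay strictly below c * the offered totals.
     * Assumption: both dimensions must satisfy the bound.
     */
    static List<JobReq> select(List<JobReq> queue, int offeredCpu, int offeredMemMB, double c) {
        List<JobReq> picked = new ArrayList<>();
        long cpu = 0;
        long mem = 0;
        for (JobReq j : queue) {
            // Stop at the first job that would reach or exceed either budget.
            if (cpu + j.cpu >= c * offeredCpu || mem + j.memMB >= c * offeredMemMB) {
                break;
            }
            cpu += j.cpu;
            mem += j.memMB;
            picked.add(j);
        }
        return picked;
    }
}
```

With C = 1.0 a planner only ever sees roughly as many jobs as the current offers can hold. A larger C widens the window, but the window still starts at the oldest job.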

tgpfeiffer commented 7 years ago

I am not sure that the coefficient would solve the problem. Say I have one user with a special low-latency constraint; that user's jobs should always be executed as soon as possible, no matter what the queue looks like. That is not possible with whatever coefficient I choose.

Could we either do something like "if the coefficient is < 0, return all remaining jobs" (and it becomes the administrator's responsibility to watch the queue state) or somehow change the database query so that some ordering can already happen there?
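Both ideas can be sketched in a few lines of Java. This is only an illustration under several assumptions: PriorityWindow and QueuedJob are hypothetical names, a larger priority value is taken to mean "more urgent", a lower id means "older", only CPU is budgeted for brevity, and the ordering is done in memory rather than in the database query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PriorityWindow {
    // Hypothetical queued job; the fields are illustrative, not Retz's Job class.
    static final class QueuedJob {
        final int id;        // assumed: lower id = submitted earlier
        final int priority;  // assumed: larger value = more urgent
        final int cpu;

        QueuedJob(int id, int priority, int cpu) {
            this.id = id;
            this.priority = priority;
            this.cpu = cpu;
        }
    }

    /**
     * Orders the queue by priority (then age) before applying the resource window.
     * A negative coefficient disables the window and returns every queued job.
     */
    static List<QueuedJob> candidates(List<QueuedJob> queue, int offeredCpu, double c) {
        List<QueuedJob> ordered = new ArrayList<>(queue);
        ordered.sort(Comparator
                .comparingInt((QueuedJob j) -> -j.priority)
                .thenComparingInt(j -> j.id));
        if (c < 0) {
            return ordered; // no limit: watching the queue size is the administrator's job
        }
        List<QueuedJob> picked = new ArrayList<>();
        long cpu = 0;
        for (QueuedJob j : ordered) {
            if (cpu + j.cpu >= c * offeredCpu) {
                break;
            }
            cpu += j.cpu;
            picked.add(j);
        }
        return picked;
    }
}
```

Because the ordering happens before the cut-off, an urgent job submitted last still lands inside the window, which an age-ordered prefix cannot guarantee for any finite coefficient.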

kuenishi commented 7 years ago

Ah, I missed your response, sorry. I see, and I agree that the current Planner interface is not flexible enough for some use cases that need all queued jobs. But for your example case, "Say I have one user with a special low-latency constraint; that user's jobs should always be executed as soon as possible", isn't a super-high priority the answer? I still don't think it's worth allowing planners to run arbitrary queries on queued jobs, so I need some other good compromise.

kuenishi commented 7 years ago

I'm not against improving the planner SPI, but I'd want to remove the naive and priority planners (essentially the old internal planner interface) and clean up the code before starting this work. This is because the old interface includes the new planner SPI, so the new planner SPI can't do anything that goes beyond the old interface. The old interface, though, is tightly coupled with Mesos protos objects, like spaghetti...