sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Multi-Queue Fairness #50642

Closed Piszmog closed 1 year ago

Piszmog commented 1 year ago

With #50613 complete, Jobs will be provided in first-in-first-out order. This is not sustainable: some Jobs may take longer than others, or there may be so many Jobs of a single type that other Jobs cannot be run in a reasonable amount of time.

We should implement "fairness" that could take into account the following:

Done

Technical Direction

Requires

sanderginn commented 1 year ago

The fairness algorithm is an interesting challenge to solve. Some things that come to mind:

It's probably easy to over-engineer a solution. Do we have access to job metrics from large scale customers? I can imagine it'd be helpful in establishing some edge/corner cases that we need to take into consideration.

Piszmog commented 1 year ago

Should batch and codeintel jobs get a 50/50 share of execution time?

Maybe to start with? When an instance starts fresh, I think it will have to default to this behavior. I also wonder if this execution time percentage should be configurable.

We are also talking about adding another queue (packages). So this share will decrease for all queues.
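To make the "share of execution time" idea concrete, here is a rough sketch of one way it could work. It is not taken from the codebase: `FairScheduler`, the queue names, and the weights are all hypothetical. Each queue has a configurable weight, equal weights give the 50/50 default, and the next job comes from the pending queue that has used the least execution time relative to its weight.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FairScheduler:
    # Hypothetical configurable share per queue; equal weights model the 50/50 default.
    weights: Dict[str, float]
    # Execution time in seconds charged to each queue so far.
    used: Dict[str, float] = field(default_factory=dict)

    def next_queue(self, pending: Set[str]) -> Optional[str]:
        """Return the pending queue that is furthest below its share."""
        candidates = [q for q in self.weights if q in pending]
        if not candidates:
            return None
        # Lowest weighted usage wins; ties go to the queue configured first.
        return min(candidates, key=lambda q: self.used.get(q, 0.0) / self.weights[q])

    def record(self, queue: str, seconds: float) -> None:
        """Charge a finished job's execution time to its queue."""
        self.used[queue] = self.used.get(queue, 0.0) + seconds


scheduler = FairScheduler(weights={"batches": 1.0, "codeintel": 1.0})
scheduler.record("batches", 120.0)
print(scheduler.next_queue({"batches", "codeintel"}))  # codeintel
```

Adding a third entry such as `"packages": 1.0` to `weights` would shrink every queue's share to a third without any other change, which matches the point above about the share decreasing for all queues.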

Do we want to allow users to push batch jobs to the front of the queue?

I think this is a good idea. But maybe just Site Admins would have this ability.
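A minimal sketch of what "push to the front, but only for Site Admins" might look like, assuming a simple in-memory queue. `JobQueue`, `is_site_admin`, and `to_front` are made-up names for illustration, not existing APIs.

```python
import heapq
import itertools
from typing import List, Tuple


class JobQueue:
    def __init__(self) -> None:
        # Entries are (priority, sequence number, job id); lower sorts first.
        self._heap: List[Tuple[int, int, str]] = []
        # The sequence number preserves FIFO order within the same priority.
        self._counter = itertools.count()

    def push(self, job_id: str, is_site_admin: bool = False, to_front: bool = False) -> None:
        # Only site admins may jump the queue; other requests are queued normally.
        priority = 0 if (to_front and is_site_admin) else 1
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def pop(self) -> str:
        """Remove and return the next job id (raises IndexError when empty)."""
        return heapq.heappop(self._heap)[2]
```

Jobs pushed to the front still run in FIFO order among themselves, so one admin cannot starve another admin's prioritized job.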

Is it feasible to estimate job execution time accurately enough (perhaps based on historical data)?

This is one of the squishy things, to me. Historical data will be key, but also seems like such a cloudy thing. A random batch spec script could shoot execution time up substantially. Part of me wonders if an AI would be helpful here (lol).
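One simple way to use historical data, sketched here under the assumption that we record each job's duration per queue, is an exponential moving average. `DurationEstimator`, `alpha`, and `default_seconds` are hypothetical names and values, not anything we collect today.

```python
from typing import Dict


class DurationEstimator:
    def __init__(self, alpha: float = 0.2, default_seconds: float = 60.0) -> None:
        # alpha controls how strongly the newest observation moves the estimate.
        self.alpha = alpha
        # Fallback used for queues with no history, e.g. on a fresh instance.
        self.default_seconds = default_seconds
        self._estimates: Dict[str, float] = {}

    def observe(self, queue: str, seconds: float) -> None:
        """Fold a finished job's duration into the queue's running estimate."""
        previous = self._estimates.get(queue)
        if previous is None:
            self._estimates[queue] = seconds
        else:
            self._estimates[queue] = self.alpha * seconds + (1 - self.alpha) * previous

    def estimate(self, queue: str) -> float:
        """Return the expected duration of the next job in this queue."""
        return self._estimates.get(queue, self.default_seconds)
```

With a small `alpha`, a single runaway batch spec script only nudges the estimate instead of dominating it, though the estimate will still be wrong for that particular job.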

It's probably easy to over-engineer a solution. Do we have access to job metrics from large scale customers?

Oh yes, super easy to over-engineer. I do not think we are collecting any metrics (like execution time) today. So this work may set us up to collect this information.

sanderginn commented 1 year ago

Proposed solution (some arrows refuse to render properly)

[Image: diagram of the proposed solution]