Merge identical queries in the scheduling queue

bboreham commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently, if we receive two or more identical queries, we do all the same work for each of them. This might sound rare, but it becomes more likely as more people in a company look at the same dashboard.

Describe the solution you'd like

If we detect two identical queries going into the scheduling queue, we could merge them and do the work only once.
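A minimal Go sketch of the merging idea follows. It assumes each request can be reduced to a string key (for example tenant, query, range and step) and that each caller waiting for a result is identified by a frontend address and query ID. The `mergingQueue`, `queuedQuery` and `waiter` names are hypothetical. This is not Mimir's actual query-scheduler queue, which also handles per-tenant fairness. A request is merged only while its twin is still queued. Once a querier dequeues the entry, a new identical request starts a fresh one.

```go
package main

import (
	"fmt"
	"sync"
)

// waiter identifies one caller waiting for a query result (hypothetical type).
type waiter struct {
	FrontendAddress string
	QueryID         uint64
}

// queuedQuery is one queue entry; identical requests share it.
type queuedQuery struct {
	Key     string
	Waiters []waiter
}

// mergingQueue is a FIFO queue that merges requests with the same key while
// they are still waiting to be picked up by a querier.
type mergingQueue struct {
	mu      sync.Mutex
	order   []*queuedQuery          // FIFO order of distinct entries
	pending map[string]*queuedQuery // entries not yet dequeued, by key
}

func newMergingQueue() *mergingQueue {
	return &mergingQueue{pending: map[string]*queuedQuery{}}
}

// Enqueue adds w under key and reports whether it was merged into an entry
// that was already queued.
func (q *mergingQueue) Enqueue(key string, w waiter) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if e, ok := q.pending[key]; ok {
		e.Waiters = append(e.Waiters, w)
		return true
	}
	e := &queuedQuery{Key: key, Waiters: []waiter{w}}
	q.pending[key] = e
	q.order = append(q.order, e)
	return false
}

// Dequeue removes the oldest entry. After this, a new identical request
// starts a fresh entry instead of being merged.
func (q *mergingQueue) Dequeue() (*queuedQuery, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return nil, false
	}
	e := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, e.Key)
	return e, true
}

func main() {
	q := newMergingQueue()
	key := `tenant-1|sum(rate(http_requests_total[5m]))`
	q.Enqueue(key, waiter{"frontend-1:9095", 1})
	merged := q.Enqueue(key, waiter{"frontend-2:9095", 7})
	e, _ := q.Dequeue()
	fmt.Println(merged, len(e.Waiters)) // true 2
}
```

Here the second `Enqueue` is merged, so the querier dequeues one entry that carries both waiters. Keying the map only on pending entries keeps the sketch simple. The cost is that it misses identical requests that arrive after the first copy has already started executing.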

It's possible that most of the result can be fetched from cache, but many requests are not cached. We also don't cache data newer than 10 minutes, so queries that run up to "now" will always involve some work.

(Also applies to series requests, labels, label values, etc.)

Describe alternatives you've considered

Leave it as-is.

Additional context

We have something like this in store-gateway with the expandedPostingsPromise.

Credit @pracucci who mentioned this idea to me yesterday.

pstibrany commented 1 year ago

I like the idea; just adding a few notes:

"if we receive two or more identical queries" -- do you mean identical start/end times too? I guess that would lower chances of finding identical queries.
The request sent to query-scheduler (FrontendToScheduler) has a frontendAddress and a queryID. The querier uses these to send the result back to the frontend. If we merge multiple requests, the querier will need to send results to multiple frontends (with a different queryID for each frontend).
The results cache is consulted before the request is passed to query-scheduler. Queriers don't use the results cache today (but of course that can be changed).

pracucci commented 1 year ago

"if we receive two or more identical queries" -- do you mean identical start/end times too? I guess that would lower chances of finding identical queries.

Range queries are aligned by Grafana (to make query results cacheable too). I think this idea could still be effective for the case where many users keep auto-refreshing the same dashboard.
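Step alignment is what makes auto-refreshing dashboards produce identical requests. Below is a Go sketch that assumes timestamps in Unix milliseconds and a merge key made of tenant, expression, aligned start/end and step. `rangeQuery`, `alignToStep` and `dedupKey` are hypothetical names. Grafana's exact alignment rule may differ in detail.

```go
package main

import (
	"fmt"
	"time"
)

// rangeQuery is a hypothetical representation of a PromQL range query;
// times are Unix milliseconds.
type rangeQuery struct {
	Tenant string
	Query  string
	Start  int64
	End    int64
	Step   int64
}

// alignToStep rounds start and end down to a multiple of step, similar in
// spirit to the alignment Grafana applies (assumes non-negative timestamps).
func alignToStep(q rangeQuery) rangeQuery {
	q.Start -= q.Start % q.Step
	q.End -= q.End % q.Step
	return q
}

// dedupKey identifies requests that could share one execution: same tenant,
// same expression, same aligned range and step.
func dedupKey(q rangeQuery) string {
	return fmt.Sprintf("%s|%s|%d|%d|%d", q.Tenant, q.Query, q.Start, q.End, q.Step)
}

func main() {
	step := (30 * time.Second).Milliseconds()
	base := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	// Two dashboard refreshes 15 seconds apart, each asking for the last hour.
	for _, now := range []time.Time{base.Add(5 * time.Second), base.Add(20 * time.Second)} {
		q := rangeQuery{"tenant-1", `sum(rate(http_requests_total[5m]))`,
			now.Add(-time.Hour).UnixMilli(), now.UnixMilli(), step}
		fmt.Println(dedupKey(alignToStep(q)))
	}
}
```

With a 30s step, refreshes at 12:00:05 and 12:00:20 both align to the same range and print the same key. A refresh that crosses a step boundary produces a new key.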

dimitarvdimitrov commented 1 month ago

This duplicate issue has some details on caching and consistent routing of queries to schedulers: https://github.com/grafana/mimir/issues/6642
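Merging only works if identical queries reach the same query-scheduler. One way to route them consistently is rendezvous (highest random weight) hashing, sketched below in Go with FNV-1a from the standard library. This technique is an assumption for illustration, not necessarily what the linked issue proposes. `pickScheduler` and the scheduler addresses are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickScheduler chooses a scheduler for key using rendezvous (highest random
// weight) hashing: every scheduler gets a score for the key and the highest
// score wins. Identical keys therefore reach the same scheduler, and removing
// a scheduler only remaps the keys that were routed to it.
// Returns "" if schedulers is empty.
func pickScheduler(key string, schedulers []string) string {
	var best string
	var bestScore uint64
	for i, s := range schedulers {
		h := fnv.New64a()
		h.Write([]byte(s))
		h.Write([]byte{0}) // separator so scheduler and key bytes can't run together
		h.Write([]byte(key))
		if score := h.Sum64(); i == 0 || score > bestScore {
			best, bestScore = s, score
		}
	}
	return best
}

func main() {
	schedulers := []string{"query-scheduler-0:9095", "query-scheduler-1:9095", "query-scheduler-2:9095"}
	key := "tenant-1|sum(rate(http_requests_total[5m]))|1704106800000|1704110400000|30000"
	fmt.Println(pickScheduler(key, schedulers) == pickScheduler(key, schedulers)) // true
	fmt.Println("routed to", pickScheduler(key, schedulers))
}
```

Compared with `hash % len(schedulers)`, rendezvous hashing avoids reshuffling almost every key when a scheduler replica is added or removed. This matters because in-flight merges are lost when a key moves.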

grafana / mimir