Shard query splitting: use dynamic grouping

matyax commented 1 month ago

Data is going to be distributed exponentially across shards, but there is no heuristic that can suggest, based on the shard identifier, what the approximate volume is, so there is no safe way to group shards that prevents querying a huge amount of data. In addition, once the groups have been established, there is no way of changing the original group definition other than retrying, and the query ends up taking a lot of time or failing.

To improve this, a new shard grouping approach has been implemented, where the groups are dynamically generated based on the query execution time that is returned as metadata, with the following method:

Initial group size: Math.sqrt(shards.length). The square root stays small for both low and high numbers of shards, without being 1 (a semi-optimistic start).

Then, when data is returned:

If it's an empty response, increase the group by 1
If the execution time is less than 1 second, double the group size
If the execution time is less than 6 seconds, increase by 2
If the execution time is less than 15 seconds, increase by 1
If the execution time is more than 15 seconds and less than 20 seconds, decrease by 1
If the execution time is more than 20 seconds and less than 30 seconds (timeout), halve the group size
If the query timed out, set groupSize = Math.floor(Math.sqrt(groupSize))
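
The rules above can be sketched roughly as follows. This is an illustration, not the code in this PR: the names `ShardResponse`, `initialGroupSize` and `nextGroupSize` are hypothetical, and flooring the initial square root, rounding down when halving, clamping to a minimum of 1, checking the timeout before the empty response, and treating the exact 15 and 20 second boundaries as the slower case are all assumptions.

```typescript
// Hypothetical shape of what the caller knows after a group of shards was queried.
interface ShardResponse {
  executionTimeMs: number; // execution time reported in the response metadata
  isEmpty: boolean;        // the response contained no data
  timedOut: boolean;       // the query hit the 30 second timeout
}

// Semi-optimistic start: square root of the shard count.
// Flooring and the minimum of 1 are assumptions of this sketch.
function initialGroupSize(shardCount: number): number {
  return Math.max(1, Math.floor(Math.sqrt(shardCount)));
}

// Applies the adjustment rules to the current group size.
function nextGroupSize(current: number, res: ShardResponse): number {
  let next: number;
  if (res.timedOut) {
    // Timeout: fall back sharply to the square root of the current size.
    next = Math.floor(Math.sqrt(current));
  } else if (res.isEmpty) {
    next = current + 1;
  } else if (res.executionTimeMs < 1_000) {
    next = current * 2;
  } else if (res.executionTimeMs < 6_000) {
    next = current + 2;
  } else if (res.executionTimeMs < 15_000) {
    next = current + 1;
  } else if (res.executionTimeMs < 20_000) {
    next = current - 1;
  } else {
    // Between 20 seconds and the timeout: halve (rounding down is an assumption).
    next = Math.floor(current / 2);
  }
  // Never go below one shard per group (assumption).
  return Math.max(1, next);
}
```

For example, with 16 shards the first group would hold 4 shards; a response in 500 ms would grow the next group to 8, while a response in 25 seconds would shrink it to 2.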

When locally testing, more debug data is available by setting localStorage.setItem('grafana-lokiexplore-app.sharding_debug_enabled', '1').
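
As a rough idea of how such a flag might be read (the helper name `isShardingDebugEnabled` and the `KeyValueStore` interface are hypothetical, and the actual check in this PR may differ), assuming the value `'1'` is what enables it:

```typescript
// Key taken from the description above.
const SHARDING_DEBUG_KEY = 'grafana-lokiexplore-app.sharding_debug_enabled';

// Minimal subset of the browser Storage API, so the helper can be tested
// without a real localStorage (hypothetical interface for this sketch).
interface KeyValueStore {
  getItem(key: string): string | null;
}

// Returns true only when the flag is stored as '1'.
function isShardingDebugEnabled(store: KeyValueStore): boolean {
  return store.getItem(SHARDING_DEBUG_KEY) === '1';
}
```

In the browser, `window.localStorage` could be passed as the store.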

gtk-grafana commented 1 month ago

Haven't had a chance to take a close look yet, but this looks very promising. My only suggestion off the bat would be to use a local storage variable for debugging instead of a const.

matyax commented 1 month ago

Great idea. You can enable it by running localStorage.setItem('grafana-lokiexplore-app.sharding_debug_enabled', '1').

matyax commented 1 month ago

Should be safe to merge, and I really look forward to people trying this. @gtk-grafana If you want to take a post-merge look and you find anything, just let me know and I'll follow up.

grafana / explore-logs

Shard query splitting: use dynamic grouping #814