grafana / falconlogscale-datasource

Falcon LogScale data source for Grafana
Apache License 2.0

Dashboard auto-refresh leads to heavy query load on the backend #316

Open fyang13 opened 2 months ago

fyang13 commented 2 months ago

What happened: A heavy load of query requests was sent to the backend when auto-refresh was enabled on a dashboard. (This was originally submitted as a discussion in https://github.com/grafana/falconlogscale-datasource/discussions/309.)

Currently, queries going through the Falcon LogScale data source plugin are submitted as static queries. In a real environment, users often set a time range on a dashboard and enable auto-refresh. This can put a very high load on the LogScale backend, because each auto-refresh resubmits the entire query with a different time range (e.g., a 30s refresh shifts both the start and end times by 30s), and LogScale treats that as a brand-new query. We are seeing backend query jobs grow exponentially when just a small set of users open Grafana dashboards that contain LogScale panels with auto-refresh enabled.
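To put a number on it, here is a minimal sketch (helper names are hypothetical, times are plain seconds relative to "now") of how a relative range such as "last 1 hour" moves on each 30s auto-refresh. Consecutive windows share about 99% of their time span, yet each one is submitted to LogScale as an independent static query.

```python
def refresh_windows(now, range_s, interval_s, refreshes):
    # A relative range ("last 1h") is re-evaluated on every refresh,
    # so both start and end move forward by the refresh interval.
    return [(now + i * interval_s - range_s, now + i * interval_s)
            for i in range(refreshes)]


def overlap_fraction(prev, cur):
    # Share of the new window that the previous query already covered.
    overlap = max(0, min(prev[1], cur[1]) - max(prev[0], cur[0]))
    return overlap / (cur[1] - cur[0])


windows = refresh_windows(now=0, range_s=3600, interval_s=30, refreshes=3)
for prev, cur in zip(windows, windows[1:]):
    print(cur, round(overlap_fraction(prev, cur), 4))
```

Only 30s of each 3600s window is actually new data; the other 3570s is rescanned from scratch.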

Falcon LogScale does support live queries. It does this by splitting a query into a historical portion and a live streaming portion: once the historical part completes, the query only needs to pick up new events from the ingest queue as they arrive. Ideally it would be much more efficient to take advantage of that when auto-refresh is enabled (at least when a small interval is set). I assume supporting that might be a complicated change that could take time to implement, so alternatively, it would be nice to simply add an option in the data source config to ignore the dashboard auto-refresh setting, with an indicator in the panel so users are aware that auto-refresh is disabled for panels using that specific data source.
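A rough, idealized model of why a live query should be cheaper than repeated static queries. It assumes a constant ingest rate and ignores caching and query-coordinator internals, which LogScale may handle quite differently. The function names and numbers are hypothetical.

```python
def events_scanned_polling(rate_per_s, range_s, interval_s, refreshes):
    # Every refresh is a new static query that rescans the full window.
    # (interval_s is unused here; kept so both models share a signature.)
    return refreshes * rate_per_s * range_s


def events_scanned_live(rate_per_s, range_s, interval_s, refreshes):
    # A live query scans the historical window once, then only consumes
    # events ingested since it started: one interval's worth per refresh.
    return rate_per_s * range_s + (refreshes - 1) * rate_per_s * interval_s


# One hour of 30s refreshes on a "last 1h" panel at 100 events/s.
polling = events_scanned_polling(100, 3600, 30, 120)
live = events_scanned_live(100, 3600, 30, 120)
print(polling, live, polling / live)
```

Under these assumptions the polling approach scans roughly 60 times as many events over the hour, and the gap widens as the time range grows relative to the refresh interval.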

What you expected to happen: Handle repeated queries more efficiently, or allow the data source to be configured to ignore the dashboard auto-refresh setting.

How to reproduce it (as minimally and precisely as possible):

Screenshots

Anything else we need to know?:

Environment:

aangelisc commented 1 month ago

Hey @fyang13,

Just wanted to follow up on this. Atm there is no way for a data source to ignore a query that has been triggered due to the auto-refresh property of dashboards.

Have you experimented with increasing the auto-refresh interval beyond 30s to reduce the load? Also, is auto-refresh something you require on your dashboards?

I'm not sure the live query functionality will alleviate this problem, as my understanding is that live queries cover data currently being ingested, which is already part of how LogScale handles querying.

Let us know what you think!

fyang13 commented 1 month ago

@aangelisc Thanks for following up. Setting auto-refresh beyond 30s would help, but the problem is that users normally create dashboards with panels from different data sources, and they can set any auto-refresh interval; we can't control that. Even with a 30s interval the load can still be very heavy, since each query sent to the data source on auto-refresh just shifts its start/end time by 30s, and LogScale considers that an entirely new query. In an environment with many users and dashboards, this compounds and can quickly make the number of queries sent to the backend prohibitively high.
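A back-of-the-envelope sketch of that compounding effect. The user and panel counts are made up for illustration; the point is that the query rate is the product of open panels and refreshes per hour, so raising the interval only divides it, while every new user or panel multiplies it.

```python
def queries_per_hour(users, logscale_panels_per_user, interval_s):
    # Each open LogScale panel resubmits its query once per refresh interval.
    return users * logscale_panels_per_user * 3600 // interval_s


# Hypothetical deployment: 50 users, each with 4 LogScale panels open.
for interval_s in (10, 30, 60, 300):
    print(f"{interval_s:>4}s refresh -> {queries_per_hour(50, 4, interval_s)} queries/hour")
```

Each of those queries is a full static query job on the LogScale side.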

If auto-refresh could somehow be switched to use LogScale live queries, it should help. While live queries are handled by the LogScale backend, they require the API client (the plugin) to handle them differently: the request needs "isLive" set to true, and new data is streamed to the client instead of the client polling query jobs. As far as I can tell, the current plugin version sends only static queries to the backend. This would of course be a bit more difficult to implement, and users would also need to be made aware that, when dashboard auto-refresh is enabled, data in this panel streams in live while other data sources may be polling at a fixed interval. Some queries can't be run as live, and users should be made aware of that as well. Perhaps that could be covered in the enhancement request here?
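A sketch of how the request body for a static query job and a live one might differ. Only the "isLive" flag comes from this thread; the other field names ("queryString", "start", "end") and the endpoint shown in the comment are assumptions based on the LogScale query jobs API and should be checked against the LogScale documentation. The query string is a hypothetical example, and the choice to omit "end" for live queries is also an assumption.

```python
import json


def build_query_job(query_string, start, end=None, live=False):
    # Assumed LogScale query-job request fields; "isLive" is the flag
    # discussed in this thread.
    body = {"queryString": query_string, "start": start, "isLive": live}
    if end is not None and not live:
        # Assumption: a live query keeps running up to "now",
        # so an explicit end time is only sent for static queries.
        body["end"] = end
    return body


static_job = build_query_job("#type=accesslog | count()", start="1h", end="now")
live_job = build_query_job("#type=accesslog | count()", start="1h", live=True)

# Hypothetical request, for illustration only:
#   POST /api/v1/repositories/<repo>/queryjobs
print(json.dumps(static_job))
print(json.dumps(live_job))
```

With a static job the plugin must submit a fresh request on every refresh; with a live job it would submit once and keep reading new results, which is the behaviour the streaming approach relies on.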

Alternatively, if a data source can't be made to ignore queries triggered by auto-refresh, could an option be added to the panel UI that lets a panel ignore the dashboard's auto-refresh setting?

saheemg commented 1 month ago

@aangelisc can you please reply to Fred's last update?

aangelisc commented 1 month ago

Hi @fyang13, @saheemg,

Apologies for the delayed response. There isn't much to update here at the moment. Unfortunately, data sources aren't aware of whether a query was triggered by auto-refresh, so it's not currently possible to ignore auto-refresh queries at the data source level.

Additionally, it's not possible to mark a specific panel as excluded from auto-refresh.

Is there no configuration in LogScale to rate limit the queries being issued?

We are investigating other potential solutions and will update the issue when we have more information 😊

To help reduce the impact of the issue for now, it is possible to set a minimum refresh interval in your Grafana config (min_refresh_interval in the [dashboards] section) that all dashboards have to adhere to. If you check this documentation, you will find details of this functionality.

fyang13 commented 2 weeks ago

@aangelisc Any update on the potential solutions you have been looking into? Regarding your question on query rate limits: yes, we can rate limit queries in LogScale (and we have), but auto-refresh essentially drops the query that was just executed (still held in query coordinator memory) and reissues another one over an almost identical data set, which is a very inefficient way to consume LogScale resources. The rate limit often ends up throttling users' queries when it shouldn't have to.

bossinc commented 2 weeks ago

@fyang13 I looked into using Grafana Live as a replacement for auto-refreshing queries. This should work with CS's Live Search Request.

fyang13 commented 2 weeks ago

@bossinc That sounds good! I assume it would require some work on the plugin to make it a streaming data source, as well as an option to enable live queries to the backend?

bossinc commented 2 weeks ago

@fyang13 It does require work on the plugin. Once we implement plugin streaming, you can choose to run the query normally or start streaming the results.