raintank / crate-datasource


crate internal 10k limit cuts off graph #11

Open joemoe opened 8 years ago

joemoe commented 8 years ago

At the moment we hit the crate-internal 10k limit when we want to visualize a table, grouping by a field at second resolution over a longer time span like an hour or so.

The graph then just stops at some point. Do you have an idea what we can do about this? E.g.:

(Screenshot, 2016-08-04 17:38:55: the graph is cut off partway through the time range.)
alexanderzobnin commented 8 years ago

The main problem is that Grafana doesn't know how many points will be returned, so we can only check it after handling the request.

joemoe commented 8 years ago

I was thinking about the resolution. E.g. if we know we have a timespan from 11:00 to 17:30, which is 6 hours 30 minutes or ~23,000 seconds, it wouldn't make sense to fetch second-level results, and therefore it could switch to minute resolution, as sketched below.
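For example (a hypothetical query with made-up table and column names, just to show the difference in row counts), compare a second-level grouping against a minute-level one over that 6.5 hour window:

```sql
-- ~23,400 one-second buckets: hits the 10k limit
SELECT date_trunc('second', time) AS t, avg(cpu_usage) AS cpu
FROM doc.serverstat
WHERE time >= '2016-08-04T11:00:00' AND time < '2016-08-04T17:30:00'
GROUP BY date_trunc('second', time)
ORDER BY t;

-- ~390 one-minute buckets: well under the limit
SELECT date_trunc('minute', time) AS t, avg(cpu_usage) AS cpu
FROM doc.serverstat
WHERE time >= '2016-08-04T11:00:00' AND time < '2016-08-04T17:30:00'
GROUP BY date_trunc('minute', time)
ORDER BY t;
```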

might this be a way to go?

alexanderzobnin commented 8 years ago

You can set the Auto time interval, which works similarly.

Node0 commented 8 years ago

I too will be testing this update. In the meantime, I've gathered some thoughts from our experience with production data generated by our custom in-house analytics platform and stored in crate.

We've run into the 'R10K' limitation (rather severely) in our autoscaling scenarios. The problem has been so disruptive that we've temporarily switched to the ES datasource, which we cannot sustain going forward, as we'll need to use partitioned tables due to the sheer amount of data we generate. The crate datasource handles partitioned tables seamlessly, whereas the ES datasource cannot handle partitioned tables (from crate), even though it handles ES's own autogenerated (by date) indexes just fine (similar backend mechanism, different datasource semantics, so the ES datasource chokes on crate partitioned tables). We generate a LOT of data.

I too thought time resolution switching might be a good idea, but then we underwent a launch event and AWS autoscaled to 30 www nodes. At a zoom 'window' of 7 days (to see trends), at 1,440 minutes per day (7 * 1440 == 10,080 rows) for just one www node, time resolution switching would fail to overcome the R10K problem, since even a week's view at 1 min resolution would exceed 10k rows. Switching to 1 hour resolution to overcome this is suboptimal and would produce a graph with poor event detail.

Even 1 hour resolution (at a zoom of 1 week) would be defeated if an autoscale scenario occurred where www nodes went from, say, 3 to 60 (7 * 24 * 60 == 10,080). Admittedly that's a very high load scenario, but what if we wanted to zoom out to a month? The next unit of time is a day, and that's only 30 units of definition per instance graph, which is quite poor.

Here is an example of the kinds of queries I use:

SELECT time, cpu_usage, instance_id FROM "doc"."serverstat" WHERE server_type = 'www' AND region = 'us-west-2' ORDER BY time ASC

The trouble with attempting to solve the problem solely via time resolution switching is AWS autoscaling, or any situation where bursts of relevant data can suddenly flow in; also, the 1 sec, 1 min, 1 hour, 1 day, etc. units of time are not useful for a smooth zoom-out progression from 1 week to 2 weeks to 1 month and so on.

What would be ideal is the ability to collapse, i.e. date_trunc(), the timescale not by units of 1 minute or 1 hour or 1 day, but by N * 1 minutes, e.g. date_trunc('3 * minute', time). Thus, upon reaching a 1 week zoom-out, the timeseries could be collapsed into 10 minute 'units', which would reduce the row count by an order of magnitude while still providing perceptually detailed data to the user.
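A sketch of one way this kind of arbitrary bucketing is sometimes expressed in plain SQL, assuming the timestamp can be treated as epoch milliseconds (whether crate accepts this form is exactly what I'm asking below; table and column names are again just illustrative):

```sql
-- Hypothetical 3-minute buckets: 3 min = 180,000 ms.
-- Integer arithmetic on the epoch-millisecond value stands in for the
-- date_trunc('3 * minute', ...) that does not exist.
-- Depending on the crate version, `time` may need an explicit cast to long.
SELECT floor(time / 180000) * 180000 AS time_bucket,
       avg(cpu_usage) AS cpu_usage,
       instance_id
FROM doc.serverstat
WHERE server_type = 'www' AND region = 'us-west-2'
GROUP BY floor(time / 180000) * 180000, instance_id
ORDER BY time_bucket ASC;
```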

Joemoe, do you know if crate is capable of supporting this way of performing a date_trunc()? I'm asking for 'magic', I know; it would solve many problems without throwing the original data away.

If that's not possible then the 1 minute and 1 hour units might become the most common units as soon as time windows greater than a few hours are graphed.

A hybrid approach incorporating adaptive time unit switching plus a count() aggregation query (to fetch the total number of rows for a given query in a graph) seems the most robust: with knowledge of how many rows will be returned, the datasource can split any query that would return > 10k rows and repeat the query to crate with N offsets, each offset advancing by 10k rows.
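As a concrete sketch of that first pass (again with the illustrative table from my query above and a hypothetical time range; nothing the plugin does today):

```sql
-- Pass 1: same WHERE clause as the panel query, but only count the rows,
-- so the datasource knows how many 10k pages it will need to fetch.
SELECT count(*) AS total_rows
FROM doc.serverstat
WHERE server_type = 'www' AND region = 'us-west-2'
  AND time >= '2016-08-01T00:00:00' AND time < '2016-08-08T00:00:00';
```

ceil(total_rows / 10000) then gives the number of offset queries needed.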

Yes, this means that for every crate query in a graph panel, a count() agg query would first be required. And yes, a little bit of latency (due to a few extra queries) is much better than 'coarse grained' graphs or, even worse, the 'R10K' problem of graph-data loss.

The hybrid approach would handle even extreme situations where AWS autoscaled a www node cluster to 100 nodes. The graph would be dense with data, but the datasource would still be able to deliver it and leave rendering duties to Grafana.

I was thinking that automatic offset-based query retrieval would have to be 'always on' no matter what, but the time unit switching might adapt to a user-configurable parameter in the graph settings for the datasource, i.e. 'always display at least N data points (where a row is a data point) per item in a graph'. So if the configured graph point density is 1,000, then upon exceeding 30 minutes at 1 second units, the datasource would do a count() agg query, compare the total rows to the configured data point parameter, and then truncate time units by N * 1 sec to bring the row count back down to just below the configured limit. This would keep the display's apparent data point density consistent, though I do not know if Crate is capable of arbitrary date truncation. Failing that, the time unit could be switched to 1 hour for a zoom window exceeding 30 days or 1 month, since 1k data points per line per graph seems best for preserving detail, and upon zooming out to a 6 month timespan the time unit could switch to 1 day (we're not yet capable of storing 6 months of data in crate, but with the switch to partitioned tables we'll certainly try).
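To put rough numbers on that (my own back-of-the-envelope arithmetic, not plugin behaviour): a 7-day window is 7 * 24 * 3600 = 604,800 seconds, so a target of 1,000 points per series implies buckets of about 605 seconds, i.e. roughly the 10 minute units mentioned above; a 30-day window at the same target works out to roughly 43-minute buckets, and a 6-month window to roughly 4.4-hour buckets.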

This single bug fix, i.e. eliminating the 'R10K' problem robustly with little to no data-definition loss, would stabilize the crate-datasource for real-world production use (and this is coming from a real-world production deployment).

Thoughts?

jodok commented 8 years ago

@Node0 we're discussing arbitrary group-by intervals. As an intermediate workaround, if the grafana query overrides the default limit of 10k by specifying LIMIT 20000, would this help?
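For illustration, using the example query from earlier in the thread (just showing where the override would land, not an option that exists yet):

```sql
-- Hypothetical: override the implicit 10k result limit with an explicit one.
SELECT time, cpu_usage, instance_id
FROM "doc"."serverstat"
WHERE server_type = 'www' AND region = 'us-west-2'
ORDER BY time ASC
LIMIT 20000;
```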

alexanderzobnin commented 8 years ago

This issue appears because the database can return different numbers of series. For example, you request CPU load for all servers in the group database for the last two hours at minute resolution. This group can contain, for instance, 5 servers or 100 servers. In the first case the database returns 60 * 2 * 5 = 600 points, and in the second case 60 * 2 * 100 = 12,000 points. And we cannot predict the number of series in order to choose a corresponding resolution.

Node0 commented 8 years ago

Hi @jodok! As an intermediate workaround, I'd suggest providing a field in the datasource plugin configuration view where the user can configure that limit, e.g. 10k (the default if no value is entered), 20k, etc.

@alexanderzobnin, I was thinking of a count(*) aggregation statement executed before the main statement series (I say 'statement series' because the count aggregation result provides the number of rows needed to determine how many iterations of the original query, each with a different offset, to run).

Without that count aggregation query, there's no way to know how many rows will need to be fetched. So it becomes essential to first run a modified version of the original query whereby the select clause is replaced by the count(*) clause.

So it's a 'multi-pass' solution: first a count aggregation, which provides the row count needed to then loop through and request N versions of the original query, each with an offset incremented by 10k rows.
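Roughly, using the count from the hypothetical first pass above, the follow-up passes would step the offset like this (a sketch of the idea, not existing plugin code):

```sql
-- Suppose the count(*) pass reported 23,400 matching rows:
-- ceil(23400 / 10000) = 3 page queries, offset stepping by 10,000.
SELECT time, cpu_usage, instance_id FROM doc.serverstat
WHERE server_type = 'www' AND region = 'us-west-2'
ORDER BY time ASC LIMIT 10000 OFFSET 0;

SELECT time, cpu_usage, instance_id FROM doc.serverstat
WHERE server_type = 'www' AND region = 'us-west-2'
ORDER BY time ASC LIMIT 10000 OFFSET 10000;

SELECT time, cpu_usage, instance_id FROM doc.serverstat
WHERE server_type = 'www' AND region = 'us-west-2'
ORDER BY time ASC LIMIT 10000 OFFSET 20000;
```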

This solution will handle any number of rows. It's not the fastest, but it is stable even when dealing with huge datasets.

To be honest, speed in crate is a function of distributed compute, so the speed of the cluster can be increased by scaling. From a data acquisition perspective (the Grafana side), it seems appropriate to adopt the most reliable way to get ALL the data for a given analytics query and leave the question of response time to the configuration of the crate installation (cluster size, replica count, shard count, etc.).

Does this approach make sense @alexanderzobnin ?

alexanderzobnin commented 8 years ago

@Node0 yes, I think it's a stable solution, so we can try it this way.

jodok commented 8 years ago

@Node0 I agree, that's a really nice workaround for now. @alexanderzobnin let's do it that way and get the beta version of the datasource out on grafana.net as soon as possible.

joemoe commented 8 years ago

@alexanderzobnin did you work on this issue?

alexanderzobnin commented 8 years ago

@joemoe I've added a workaround for this issue: you can set Group By Time Interval to Auto in the query editor and check the result.

Node0 commented 8 years ago

Excellent, I'll attempt to test this.