Intelligent Rollups and Querying of Aggregated Data

jsternberg commented 8 years ago

Feature Request

The database should support more intelligent rollups and querying of aggregated data. Currently, the only way to rollup data is through manually setting up continuous queries and then manually modifying the select statements to query that data which requires the user to know which retention policies exist rather than it being discovered automatically.

Proposal:

It should be simple for an administrator to setup rollups for an entire database and users should not need knowledge of the rollups for them to automatically start using them. Using rollups should be automatic and performant.

Current behavior:

Rollups require an administrator to create a retention policy and a continuous query like this:

CREATE RETENTION POLICY "5m" ON mydb DURATION 0s REPLICATION 1;
CREATE CONTINUOUS QUERY mycq ON mydb BEGIN
    SELECT mean(*) INTO mydb."5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m)
END;

It then requires the user to query the mean of a measurement like this:

SELECT mean_value FROM mydb."5m".cpu

If the server is not running when an interval should be calculated, that interval will never be run and the user needs to run that query manually. There is no way to automatically reuse the information in continuous queries to backfill data either.

Also, if data is written after the last calculation, it will never enter the aggregation.

It is possible to obtain partial data, but this involves telling the continuous query to resample more frequently than the default time.

CREATE CONTINUOUS QUERY mycq ON mydb RESAMPLE EVERY 1m BEGIN
    SELECT mean(*) INTO mydb."5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m)
END;

This will obtain partial data every minute, but it will not be an active result of all of the data that is available.

Desired behavior:

Administrators should have easier commands to create rollups (optional since the commands above are fairly easy to write).

Users should not need to care about retention policies when trying to get the data they want. The above query the user should write is:

SELECT mean(value) FROM cpu GROUP BY time(5m)

This should use the rollup automatically if one is available and would return the same value as querying the raw data.

Along with using the rollup automatically, we would also include syntax to automatically select the appropriate precision interval to be used based on the time range or number of points requested. So if we have raw data that is retained for 1 week, 1 minute aggregated data for 2 weeks, and 5 minute aggregated data for 4 weeks and we said this:

SELECT mean(value) FROM cpu WHERE time >= now() - 1w GROUP BY time(auto)

This would automatically select the 1 minute precision interval because that is the lowest precision. If we scaled this query for the past 3 weeks, we would return the 5 minute precision level.

Use case:

Downsampling long term data into new retention policies and greater performance by precalculating certain aggregates for certain intervals. This was the original use case for continuous queries, but the current continuous queries are too cumbersome for this currently.

Documentation

Requirements Doc

pauldix commented 8 years ago

In your desired behavior section, it's a bit off. If the user specifies GROUP BY time(...) then they're asking for a specific rollup interval, which if we have is great, but that's not generally why the user is specifying the group by time.

More often what they want is this: they have a graph of a certain width in pixels and they want to draw a line. They have a start time and end time for the graph and they want the right number of data points to come back (in whatever rollup time precision makes sense) to draw the graph as quickly as possible.

This feature should introduce new syntax into the query language to let the user specify that they want the database to select whatever precision makes sense based on the time range of the query and the desired number of data points to draw a line on a graph.

jsternberg commented 8 years ago

That sounds like a completely different issue and I think we should make a new issue for it. New syntax for grouping points doesn't appear to have anything to do with intelligent rollups. We can hash out the syntax now and do that right now without any major improvements. I can describe my point more at length, but should I make another issue so we can discuss that separately so this issue doesn't get off track? I want to keep this issue as bare-bones as I can otherwise we will never manage to ship it.

The reason why I put that as the desired behavior is because I was specifically referring to the desired behavior of intelligent rollups and my problem statement doesn't mention anything about graphs and the width in pixels.

pauldix commented 8 years ago

That one part of desired behavior I mention is the entire reason for this feature. I don't think the two can or should be separated. That's the high level use case. If the user specifies a specific time aggregate, that's something else and something we already support.

If you don't feel comfortable talking about syntax then the GROUP BY time part should be eliminated from that section too.

The point is, the reason to create this feature is for that high level use case. It should absolutely be included. The syntax can be figured out later.

pauldix commented 8 years ago

My concern is that without thinking about that high level use case, the design of this thing could be done in such a way that creating that functionality would be difficult to impossible. It's important to note the real high level goals behind why we're doing intelligent rollups at all, just to make sure that whatever implementation we put out is in service of those user facing needs.

jsternberg commented 8 years ago

Ok, I updated the above issue to include a very broad overview of what has been mentioned and to focus on the Graphite use-case. We're going to include this more thoroughly in the requirements and design documents.

jsternberg commented 8 years ago

@pauldix a question about choosing a retention policy based on the number of desired points. Imagine I have two rollups. One is at 1m and one is 5m. I tell the query that I want the last 3d of data and say that I want a graph with 1,000 points.

With these rollups, 1m for 3d gives 4320 points and 5m for 3d gives 864 points. The second one is obviously closer to the number I requested, but this can get murkier if I chose something like 2,000 points. How should ties be broken?

pauldix commented 8 years ago

@jsternberg I think it makes sense to have the requested number of points be a minimum or a maximum and we chose based on that.

jimmys commented 8 years ago

How would tags work with the intelligent rollups being proposed? We make heavy use of tags in our InfluxDB deployment to monitor our microservices. All of our Grafana dashboards depend on using tag data to work.

For example, we have a view of HTTP request counts and a tag on the measurement for service name lets us visualize those. Another case is a count of users by platform - the measurement is user logins, say, and the tags give the platform. This lets us show a total as well as per-platform breakdowns.

We would need the rollups to handle this case or else the visualizations are not going to be useful for us.

pkittenis commented 7 years ago

Couple points to consider from the Graphite POV.

Example real world use case:

InfluxDB as back-end datastore to Graphite API
Retention period 5m with down sampled data from CQs with group by time(5m)
Storage finder has configuration to dynamically assign a group by interval of 5min for queries ranging >= 7d
InfluxDB storage finder for graphite api has configuration to use retention period 5m for group by intervals of 5min and above

User queries 7 days worth of data. Storage finder calculates a group by interval of 5min for the query duration which points it to retention policy 5m. Data is queried as select <..> from "5m".<measurement> where <..> GROUP BY time(5m)

This is real world use case with existing graphite API and influxdb storage finder projects.

Grafana with the InfluxDB API does the same group by interval calculation though it cannot dynamically choose a retention policy based on that interval, user has to select one.

In both cases a group by interval from range of query is what is calculated to get a desired max number of data points.

Data points returned by all 7d queries is constant and graphs are drawn quickly. Both Grafana and the Graphite API storage finder will keep increasing group by interval as date/time range of query increases.

From this real world use case, would it not make sense to have the DB automatically select a retention policy/intelligent roll-up data based on the group by interval? As a bonus, queries would not have to be re-written to support a new syntax.

If the DB has intelligent rollups of data in regular intervals, either dynamic or pre-configured, should it not use that rolled up data when a requested group by interval matches the roll up, or the rolled up data that is most near the interval if there is no direct match.

In short, in all graphing/Graphite use-cases where users define a specific time aggregate it is in fact in order to get less than a max number of data points for the date/time range of the query.

It would therefore seemingly make sense from a user's perspective for the DB to automatically use whatever intelligently rolled up data it has nearest that time aggregate. Some food for thought.

jsternberg commented 7 years ago

I want to just spend some time marking down some notes to help spark discussion on possible implementations of this. I also want to nail down the primary purpose of this since I think there might be too much conflated in this issue.

We have at least two different problems that may or may not be the same problem (and I definitely think they are related, but maybe not directly).

Automatically switching to use a retention policy that is filled by a continuous query.
Automatically adjusting the grouping interval depending on the desired time window (when zooming in and out of data visualization).

I don't personally think that these two are inextricably linked, but I do think they complement each other. For the second, we can determine a suitable grouping interval depending on the time interval requested. You don't need the alternative retention policy for this and it can be a one-off query, but it wouldn't necessarily be the quickest since it would always have to recalculate the values. But we can do it pretty easily.

There's also the first issue. While important, I don't think it's necessarily the same issue. I still think it should possibly be done at the same time since I personally find it to be more important. When you want to query a continuous query, you have to change the query itself to take from a new source and remove the group by clause in the query. This substantially alters the query and isn't particularly user-friendly.

For the first, we can try to do some kind of silent change of where we get the data from. When we see an interval and a call iterator, we can check the list of continuous queries for a database and see if we can find the same measurement being used as a source in a continuous query with the same interval, we can switch the iterator creation to just retrieve the data from the target measurement. This might have some additional problems, but it should be good enough for the most part.

For that though, we would also have to consider how to rerun continuous queries automatically when data is written to the source and how to deal with data in an active shard.

I think the first one is the focus of this issue if I understand correctly though. Thoughts?

I think my comment here may echo some of the points in the comment above, but I'm not 100% sure.

aderumier commented 7 years ago

@jsternberg : personnaly, I'm waiting more for 2 than 1. (I don't use continuous query ,but I would like to display graph like 3months, 1 year range , without need to retrieve all values. (I'm pooling each 30s with telegraf, with 1 year retention).

pkittenis commented 7 years ago

| 1. Automatically switching to use a retention policy that is filled by a continuous query.

Yes, this echoes what I was describing in earlier comment.

However, won't this still result in queries needing to be changed to read from different fields as CQs with aggregation will have changed field names? (prefixed with aggregation term when using wildcard CQs).

Note though that switching implies only using data present in the particular retention policy. What about newly added data not yet ingested by the continuous query populating that retention policy? How about missing data that was never processed by continuous query because the DB was down at the time the CQ was scheduled to run?

On large data sets continuous queries that run on all measurements can be several hours behind which is fine as they are usually storing historical data but it also means they will be missing several hours worth of data if they are the sole data source.

What would be ideal IMO is what is described above but instead of switching to, merging of data in retention policy matching/closest to interval plus any gaps in data or missing data taken from default retention policy. This may be wishful thinking though.

Just not sure the 'switch retention policy filled by CQ if there is matching interval' would result in a good experience, given how CQs are currently implemented which can result in gaps in data and missing latest data while CQ runs, plus the changed field names and all. At least the last part, field names, can be handled client side but missing data is missing data.

| 2. Automatically adjusting the grouping interval

While it would be easier on the user for the DB to do this automatically, presumably with a flag to turn on/off, existing clients already do this so purely from the Grafana/Graphite PoV it would not add significant value.

jsternberg commented 7 years ago

Just not sure the 'switch retention policy filled by CQ if there is matching interval' would result in a good experience, given how CQs are currently implemented which can result in gaps in data and missing latest data while CQ runs, plus the changed field names and all. At least the last part, field names, can be handled client side but missing data is missing data.

Yes, I've been thinking about that a lot and it's a good point. The current method for how continuous queries work would have to be rethought. One idea that I had is having a way to know when the last write to a shard was. That would allow this behavior to work when dealing with cold shards (not receiving active writes) and it would query live data when querying an active shard (or keep a buffer that was described in the original proposal for this by @pauldix). So while it sounds simple to just say, "switch where we get the data from", it's a lot more complicated than that. If we managed to do that though, it would improve both this experience and the current experience of those frustrated with continuous queries and their current shortcomings.

However, won't this still result in queries needing to be changed to read from different fields as CQs with aggregation will have changed field names? (prefixed with aggregation term when using wildcard CQs).

That's pretty easy to do behind the scenes. The difficult part would likely be the backwards index that would be necessary. CQs will take a single source and have a target of who they write to, which is easy. But we would need to know which CQs refer to which source and have a fast and easy way to match those with the original query. That's less straightforward. It's possible using an O(n) search of all continuous queries or limiting it to just the current database, but you can technically have a CQ that reads from one database and writes to a second database with that second one. I'm also not comfortable with an O(n) search on every query to find the appropriate database/retention policy to read from.

I think I didn't explain myself well in the last paragraph so if I need to clarify, please just say so.

pkittenis commented 7 years ago

That makes sense, thanks for clarifying. Sounds like the new functionality is indeed intending to merge both dataset filled by CQ and real time dataset which is ideal.

What is not clear to me is whether or not gaps in aggregated CQ-filled data can be filled by data from default RP, if present. Gaps meaning periods of missing data in between data in the RP where CQ did not run or did not run successfully for whatever reason. Could you please clarify?

Regarding matching CQs, at least for the wild card in CQ case it can be safely said that the CQ is matching all measurements, so matching is not needed. For non-wildcard CQs yes, an index will be needed.

A simple measurement in CQ -> CQ target for measurement mapping should suffice and be O(1) for queries looking for CQ targets of a measurement, though not sure how memory intensive that will be on large datasets.

jsternberg commented 7 years ago

Just to make sure I clarify, everything I said is currently brainstorming of what I would like to see and discussing the feature. I'll admit that my current ideas are a bit lofty so while we'll try our best, there are no guarantees that everything I say will be the end result of what we do.

My idea for gaps in the CQ-filled data was partially related to the hot/cold shard idea. A hot shard will end up being defined per-CQ rather than just the shard being hot or cold. So for a specific CQ mapping, the shard is hot if the shard has been written to more recently than the CQ has run. We might also want to have a cool off period so we aren't switching between hot or cold and to also allow the CQ to not be running constantly. It would likely not be feasible to have the CQ running constantly on shards that are currently under heavy write load anyway. An in-memory buffer would work better if we want to do those anyway.

Under that idea, that means that old shards (which we presume wouldn't be actively written to) would commonly have the CQ for the entire interval run and then have the data written by the CQ be queried instead. If you then write some historical data, the shard may become hot again and it would query the actual shard rather than the retention policy written to by the CQ. That is, until the CQ could be run again. We don't want to be too aggressive in running CQs because it is reasonable to believe that a person writing data to one shard may write more data to that shard shortly.

I don't think the index will be too big of a deal personally. It's just necessary to say that it will probably be needed. A lot of what I'm writing are also personal notes for myself.

mscso commented 7 years ago

Isn't this the most basic and most important feature of a metrics database? I was looking at influxdb and i'm shocked that this is not part of release candidate 0.0.1. Honestly this is what rrdtool has been used for forever and if I just wanted to store all the metrics individually without roll-ups, I'd just write them to flat files.

Really hoping InfluxDB decides to prioritize this feature.

lfarkas commented 7 years ago

Is there any progress with this issue? Currently it's not possible to do a longer term downsampled data in influxdb:-(

pauldix commented 7 years ago

This feature is high priority, but making it work at scale without killing performance is very tricky. In the meantime our recommendation is to use Kapacitor to aggregate your data into other retention policies like in this example: https://docs.influxdata.com/kapacitor/v1.3/guides/continuous_queries/

DonDebonair commented 7 years ago

I know that many people in the comments here have already tried to voice what it is exactly what they expect, but I feel it's often not clear enough, I think. To me this issue basically touches upon 2 lacking functionalities in InfluxDB that currently stop me from using InfluxDB:

When I query some table, InfluxDB should be able to automatically select data from different related tables that were the result of downsampling using CQs, and combine that data. Graphite does this for me, and that is what still makes Graphite great. I'll illustrate:

Let's say we're storing some data about cpu usage in InfluxDB. We storing data every second. But in the long run, this fine granularity is not needed for us to get a clear picture about how our cpu behaves. So now we create a Continuous Query that downsamples the data that is older than 5 minutes to 1 datapoint per minute. We create another CQ so that all data that is older than 1 hour is downsampled to 1 point per 5 minutes. When I query InfluxDB for this data, I don't want to have to care about which tables I need to select and combine. I just want to be able to select on the original table, and have InfluxDB do 2 things:

Combine data from the tables that are filled by CQs, with the data from the original table
Intelligently choose the finest granularity that is still supported by all the tables that need to be selected within the required timewindow. In the above example, that means that when I select data for 1 day, InfluxDB will combine the original table, with the data from the once-per-minute rollup (older than 5 minutes), with the data from the once-per-5-minutes rollup (older than 1 hour), because all that data is still within the window I requested (1 day). InfluxDB should also infer that the finest granularity it can return, is data per 5 minutes, and group by that, because it knows that is the finest granularity supported by all the selected rollups.

So TLDR: user creates rollups, upon querying, InfluxDB can automatically combine data from original table, with rollups, based on the requested time window, and make sure data is grouped by a window that is supported by all rollups.

When I group by time window, I should be able to tell InfluxDB how many datapoints I actually need. This is somewhat related to issue 1, but not necessary perse to implement 1. To illustrate why this feature is handy: in the aforementioned example, when I select data for 30 days, when InfluxDB detects that the finest supported granularity in all the rollups for those 30 days day, is 5 minutes, we end up with (24 * 30 * 60) / 5 = 8640 datapoints. This might still be way more than is needed do draw a proper graph in Grafana. So I should be able to "downsample" in my query, to return less datapoints. Note: this is different from the automatic grouping that InfluxDB should do as per issue 1, to support the finest granularity as supported by all rollups in the queried window. Also, this "sample granularity" should be provided by the client (ie. Grafana), because only the client knows how it's going to display the data. If we zoom in in Grafana, it should fire a new query, for a new time window, while probably asking for the same amount of datapoints.

Issue 1 more important to me, and is beautifully done by Graphite, so I don't think it's a stretch to request that InfluxDB supports this to. Issue 2 is less important to me, but still necessary to guarantee better performance in frontends that visualise data from InfluxDB (like Grafana).

Maybe both issues should lead to a query like this:

select mean(value) from cpu group by time(auto) sample for 10 per 1d

When doing this query, InfluxDB should:

retrieve data from all relevant rollups (in this case all rollups for cpu, because no time window is specified)
combine the data
find out the finest granularity present in all the rollups (let's say 1 point per 5m)
downsample data from rollups (or the original table) that have a finer granulariy than 5m to 5m.
downsample more so that we're left with only 10 data points per 1d

time(auto) relates to issue 1 and sample for 10 per 1d relates to issue 2

My 2 cents, hope this helps!

hanej commented 7 years ago

Exactly what @DandyDev said. This is the only leg up Graphite has over InfluxDB and it's a major one.

Sineos commented 7 years ago

In addition to the excellent write-up by @DandyDev, I would like to bring in the aspect of roll-up / compaction planning.

Finding the sweet spot: Planning the compaction interval always is a trade-off between performance (wrt query speed, visualization speed, storage requirements, etc) and precision. The interval (especially irregular intervals) at which a certain measurement is ingested into InfluxDB and the volatility of the data itself affects the impact on precision vs. gain in performance. As a user I would appreciate if InfluxDB would support finding the “sweet spot”, i.e. offer the possibility to automatically calculate different scenarios and provide figures regarding the impact. For example a downsample to 5 mins could result in 20% less points with a precision loss of 7%. A downsample to 8 mins could, for the same data, result in 30% less points with a precision loss of 7.5%.

Harmonization across different measurements: When trying to compare, or (when implemented) do math across different measurements, it will make sense to harmonize the data to the same sampling interval. So with the same logic as above, it would be appreciated if InfluxDB as well supports finding an optimized harmonization interval across multiple measurements.

DonDebonair commented 7 years ago

@Sineos agreed that it would be nice if InfluxDB could help choosing the right rollup intervals and downsample granularities, but for me personally that's really a nice to have. I think the first step is to build upon the already existing Continuous Queries for automatic rollups, so that we can query those rollups in 1 go, and manually specify downsampling on top of that. A next step could be to have InfluxDB infer/suggest automatic rollups

matejzero commented 7 years ago

I gave InfluxDB a go once again today, to check if I could migrate our Graphite server to InfluxDB and hit issue number 1. This really is nicely done in Graphite and my users don't need to do anything to get highest precision metrics for the timespan they are interested in.

As for number 2, this is also something that is already implemented in Graphite and I think Grafana sends a parameter with query telling Graphite what is the max. number of points it should return.

drb-germany commented 7 years ago

I wanted to comment on why I think this feature is important by briefly describing our use-case for the InfluxDB:

We are a company writing software that reads industrial measurement data from several devices and aggregate+store this data in a database. Up to now, we have used a round-robin database with different granularities: the data is always collected second-wise and continuously aggregated into larger bins (5s, 20s, 60s, ...). Sometimes, we want to see the data trend for several years and then zoom in to see details of specific events (obuiously, details of very old events are slowly being lost). We are generally saving several 100 datapoints every few seconds (of which we select a few for visualization).

Now we are investigating to use InfluxDB and are very happy with the overall performance and features. Since the original data is stored at about second granularity, the creation of a one year trend using Grafana already aggregates the data on the server with the right "GROUP BY" command, so that only mean-hourly data is returned from the server to the client. However, the aggregation on the server obviously takes a huge amount of time (several minutes) since several 10-million datapoints have to be averaged into ~1000 resulting values. Since space is not really an issue (we can keep the original data for several years) it would be really nice to have an automatic aggregation of ALL data without the creation of additional databases and/or request queries into several time-bins. I know that automatic aggregation of all data can be realized with continuous queries, but this results in a new database and all visualizations have to be created twice (or more depending on granularity) using a different database.

I hope this gives a view from the user-side and explains why this feature is the go- or no-go criterion for us to use InfluxDB in industrial measurement environments.

johngilden commented 7 years ago

I don't think complicated use-cases are necessary to understand why this should be the number-1 priority. When I deploy InfluxDB + Telegraf + Grafana to graph some basic system metrics for my systems, I usually set the RP duration at 1 week. There is no easy way for me to quickly

Roll-up all of the different measurements (I usually have 10 Telegraf plugins per host, differing per host)
Tell Grafana to pick the closest rollups in terms of interval automatically

So I'm stuck with choosing a safe period for the retention policy and not having any data older than that.

mattwilmott commented 7 years ago

Agree with @johngilden, monitoring more than a handful of hosts blows out the number of measurements collected, ie mysql, mongo, system, kafka etc etc. 10 plugins as mentioned is easily the average and in some cases its 2 or 3 times as many.

All we want is the ability to set the granularity per timeframe and it is not feasible today so as John mentioned we just set retention once to a 'safe period'.

Don't even mind if its targeted at Telegraf metric usage for instance, by limiting the scope initially and making some progress may offer insight for the more generic case. It also helps mature the TICK stack as a whole, today its severely hampered by this issue.

bassebaba commented 7 years ago

Eh guys. I just started using grafana & influx. 0->100 in 1 weeks time. I just today hit the wall with this (my DB grows like a monster).

I have no clue what to do? As @johngilden says, Im using"Telegraf" and other loggers, mostly "configs" I find online and just "install", so I have more or less no knowledge of the data it stores.

So, in short, I already have hundreds, if not up to a thousand "measures", with yet again ten or hundreds of "fields". How am I supposed to write CQ to downsample that? It will take me several 40 hr weeks just to do that? More or less a "list" and then "cut n paste" to create queries for everything in all databases?

I need something on a database level to say like "all data older than 7 days should be aggregated by 60 mins". And then it needs to be queried together with the non-aggregated data. I guess this is what's referred to as:

"Users should not need to care about retention policies when trying to get the data they want."

in the original post?

I.e I want one single simple query that gives me both the 7 days unaggregated data and then also all of the aggregated, so that I with one single query can plot my graph in grafana.

Is what I'm asking even possible today with some sort of workaround?

pauldix commented 7 years ago

@bassebaba for now the best solution is to use Kapacitor to aggregate everything in the DB to another retention policy. Then use a Grafana template variable to have retention policy be selectable.

This is an important feature for us, but it's also devilishly hard to do without absolutely killing performance. It's still on the roadmap, but will take us some time to get to.

bassebaba commented 7 years ago

@pauldix Ok, thanks. I understand, it's my own fault.

One more question tho, the Kapacitor link you gave me still shows:

query('SELECT mean(usage_idle) as usage_idle FROM "telegraf"."autogen".cpu')

Byt that's one value (usage_idle)? I have thousands of them, do I manually have to write a "batch"-query for every single one of them in Kapacitor?

Since I'm only about 1-2 weeks into the world of time-series, I don't know how to better explain, I want to do something like this:

batch
    |query('SELECT mean( * ) as usage_idle FROM "*"."autogen".*')
        .period(5m)
        .every(5m)
        .groupBy(*)
    |influxDBOut()
        .database($value_from_*_above)
        .retentionPolicy('autogen')
        .measurement($value_from_*_above)
        .precision('s')

I.e, one single command to just aggregate everything i put into my databases, is that possible?

sebasampaoli commented 6 years ago

@pauldix do you have a deadline for this issue?

sebasampaoli commented 6 years ago

It's a very important feature for us too. Today we don't have any other solution other than replicate the same dashboards for the different RP.

ashuw018 commented 6 years ago

@pauldix Any rough idea that which release of InfluxDB coming with this feature? We are waiting sinnce long. It will be a milestone feature and much appreciated too.

Thanks for all you efforts.

pauldix commented 6 years ago

@ashuw018 we don't have it underway yet. There's a bunch of API work that we need to do in the beginning of this year so I suspect that it will be a little while. Unfortunately, this is a very hard feature to get right without tanking performance of the DB so it'll be quite involved.

brettwooldridge commented 6 years ago

@pauldix Irrespective of this individual feature, what is the current best practice for retention policies/CQ for databases with a large number of series?

For example, our database has hundreds of thousands of series.

If we want to downsample all series data after 6 months into 5m rollup, and downsample all 5m series data after 1 year into 1hr rollup, what is the best practice?

If we want to preserve the measurement names and tags, would we just be looking at a wildcard query with GROUP BY into a different retention policy?

eWilliams35 commented 6 years ago

I've been watching this thread for awhile, and I'm back to hitting a wall with this myself. We're writing a fair amount of data, 1 million writes a minute give or take, and we're trying to down-sample everything into 1m / 5m / 15m / 30m / 1h / 1d buckets.

We followed the recommendation given and tried using Kapacitor to fire in the down-sampling queries., but found that hitting Influx with daily rollups for 4-5 decent sized databases at the same time ate RAM like candy and destabilized our cluster. Even with just the 1 minute rollups enabled we were having RAM shortage issues. Under high memory conditions, we see all sorts of socket connection issues / timeouts / write errors in the logs, as well as the hinted handoff queue filling up as the cluster slowly dies.

We moved back to running all the CQs on the database, and found that it falls way behind due to the sequential nature of how the servers run CQs. We've seen one minute rollups getting run for a period of 45 minutes because the other queries in line in front of it take so long to complete. We've got 2 data nodes with 4cpu / 64 GB ram each, and we're hitting OOM issues on the node that's attempting to down-sample.

Thinking about moving back to Kapacitor and trying to manage the offset of the batch query so that each query fires at a slightly different time makes me sad, but I'm not seeing any other ways to make this work.

mscso commented 6 years ago

I think at this point it's clear that influxdb is more of a "nice quickstart" approach for small team metrics, but nothing sustainable for actual bigger deployments and the long run without significant wrapping / glue / tooling efforts around it.

DoGlatemLive commented 6 years ago

Thats what we actually did. We are now using InfluxDB only for Team Metrics as it features a neat API. Using CQs and RPs is not really an option for us, as we visualize everything via Grafana. I can't expect everybody to select the bucket from which they want the data.

Now we are using Graphite again for storing server/monitoring metrics (still gathered by telegraf) and InfluxDB for various little metrics gathered by the dev teams.

Visualization with Graphite in Grafana is way faster/smoother. Even though the server is way smaller than the InfluxDB setup. Also aggregating data is easier and more flexible.

krokodilerian commented 6 years ago

@ewillia1983 , this might be of little help, but for me I've found that having the roll-up queries run in 4 threads (so there would be no more than 4 running and the same time) and having them decide the ranges of data they would process is a workable solution (my use case is with ~100 databases/300 measurements in total, ~1TB of data, and the normal CQs were lagging for more than 8 hours before).

bedrin commented 6 years ago

@ewillia1983 I would try adding something like Apache Kafka for writes to InfluxDB and implement downsampling using Kafka streams. It's supported as an output by telegraf so shouldn't be a rocket science to integrate

eWilliams35 commented 6 years ago

@krokodilerian - can you share the setting you changed to multithread the CQs? I have not seen that option anywhere in the documentation. It does appear to fire 3 or 4 at once, but it's still too slow.

@bedrin - that's our last resort option, pre-aggregating the data before inserting. You raise an interesting point in regards to re-reading the raw data kafka topic and doing the math within the poll loop, then writing the aggregate to the desired retention policy.

krokodilerian commented 6 years ago

@ewillia1983 , I wasn't very clear, sorry, it's an external python script (https://gist.github.com/krokodilerian/8434e15248d6da7d8947bd2935bdb3fe) that i wrote do this, and it was definitely able to deal with our kind of load, but YMMV. Also, it works for me, might not work for you (and has the roll-up queries somewhat hardcoded).

DonDebonair commented 6 years ago

Not trying to be an ass here, but having to include Kafka in your setup, just to do proper rollups, seems a bit silly, doesn't it? This is something that Graphite has supported for ages, including seamless querying over different rollup periods. I'm not saying it's easy to implement for the InfuxDB team, but it might be a good idea to start looking at how Graphite does it.

pkittenis commented 6 years ago

FWIW, it's not rocket science to have queries use a particular RP depending on the date range of the request, as long as the query is not manually written of course.

Some third party tools do this already to enable Graphite queries over Influxdb with transparent down-sampled data, aggregation and so on - eg InfluxGraph.

Actually writing the down sampled data without having it fall behind (from CQs) is a different matter. Best solution so far seems to be what @krokodilerian suggested - thanks for the tip. The only issue with that is that the multiple queries at a time can put a lot of load on the DB, particularly when the down sampling queries are run against large databases, meaning a large amount of data points to process, million+ measurements and so on.

ivanvanderbyl commented 6 years ago

For anyone who is interested, the code for this in InfluxGraph is https://github.com/InfluxGraph/influxgraph/blob/7be6d2aa7bf7e7c516c25216a024ca1026c1c2ed/influxgraph/utils.py#L54-L87

CAFxX commented 6 years ago

If the server is not running when an interval should be calculated, that interval will never be run and the user needs to run that query manually. There is no way to automatically reuse the information in continuous queries to backfill data either. Also, if data is written after the last calculation, it will never enter the aggregation.

influxdata / influxdb

Intelligent Rollups and Querying of Aggregated Data #7198

Feature Request

Documentation