Open beckettsean opened 8 years ago
Is there any timeline for these functions? (I'm very interested in the aggregate integral functions, to calculate kWh from watts). Seems that a feature request was opened a year ago.
There is no timeline for each specific function/feature. All work on new functions was on hold while the query engine was refactored, and that refactoring was merged into InfluxDB 0.11. We plan to introduce a few functions with each release from now on.
Would also be great to get #3633 Added to your list this feature would be very useful on the client side when graphing data.
Thanks, @timgriffiths, added!
Filed #6208 would like to see holt-winter and some build-in time series anomaly detection functions in influx QL. E.g. https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series http://robjhyndman.com/hyndsight/yahoo-data/
@dengliu can you open a new issue so I can reference it here?
@beckettsean I have updated the issue id in my previous comment.
@dengliu as referenced in the other issue, anomaly detection functions should go into Kapacitor. The results can be fed back into InfluxDB for visualization, but that doesn't require any new functionality in InfluxDB.
Awesome work. I'm +1 for #813 - I've got data in Influxdb that I'd like to run it on.
+1 for exponents and logarithms #659
The lack of a mode function is really hurting us in switching from 0.8.9. In particular, we need to be able to select the most common occurrence of a specific string over a group by, which we have to do in post-processing now.
@brandoncazander COUNT(DISTINCT())
will give you the frequencies for each period in the GROUP BY
, and you can pull the largest number from that client-side. Not a solution but perhaps a workaround.
The IOT stuff we are doing really needs integrals. Any idea on timeline of when we might get this?
+1 for lag (or some kind of per-series time-shift) so that we can show current and -1week data on the same panel in grafana.
something like: select mean(value), mean(lag(value, 1w)) from metric where time > now() - 1d group by time(1m)
would return two fields, being the per minute averages for the last 1d and for the same period but one week prior.
If there is already a way to do this, I would appreciate being hit with the clue-stick.
@sferrett That's an excellent suggestion. Would you mind creating a new issue for it so we can track that separately? That's a bit beyond the level of adding a simple function.
@gunnaraasen any slick Grafana tricks to accomplish the same goal with current InfluxQL?
Grafana has a per-graph setting to add a time shift (see the screenshot below). I don't think it's possible to shift a single series within the same graph.
+1 one for adding more math functions
From the top of my head the function would be:
Would it be possible to create some way to extend InfluxDB with custom functions? Maybe to load some dynamic library at load time? This could allow community to experiment with different functions?
@mitar Kapacitor supports arbitrary User Defined Functions in almost any language, and it can be used to batch-process InfluxDB data.
Hm, but to my understanding, Kapacitor would have to read the whole span of a time-series to be able to compute the custom downsampled version of data, every time I would try to read it at a lower resolution?
@mitar Kapacitor can store its results in InfluxDB, just like a CQ. Instead of running a CQ inside InfluxDB, you would just have Kapacitor do a batch query, process the UDF, and then write the results back into the downsampled measurement in InfluxDB.
But I do not want to do this for every query users come up with, at different timespans with different downsampling? I like InfluxDB because it can compute downsampled values efficiently without moving data around. Reading InfluxDB into another process and then storing it back seems inefficient to me? Or are you saying that there is not much performance difference between InfluxDB doing downsampling on the fly, and Kapacitor doing it? Using Kapacitor to downsample the same amount of data takes similar resources than using InfluxDB to do such a query?
@mitar my thoughts exactly. The goal should be (at least that's mine) to do as much as possible on the db without sending data over the network. I would like to store the data in the finest granularity available and the derive data from it. And then only send over the derived data to the client. In this (my) case drived data are not only aggregated time buckets (e.g. seconds built from ticks), but computations build on those buckets.
@mitar I cannot think of a way to allow for ad hoc queries using user defined functions. We do accept PRs for new functions, or you can perform the function calculations client-side for now.
@mrecht sending as little data around the network as possible is our goal, too, but opening up core InfluxDB for arbitrarily defined user functions introduces too many challenges around maintaining performance and stability. I don't expect UDFs to be part of InfluxDB at any time in the 1.x version line.
@pauldix are UDFs for InfluxDB anywhere on the timeline yet?
So what about allowing InfluxDB to load some dynamic code (.so, .dll) to do that? Many other databases allow modules which extend the behavior.
So maybe there should just be a way to load a custom module, and then you could say "module:foo" which would simply call that module's function as a function in InfluxDB?
UDFs aren't on the timeline yet. Best bet is to use Kapacitor for that for now.
@mitar @mrecht I encourage you to open a new issue for user-defined functions in InfluxDB. It's not that we think it's a bad idea and are saying no, it's just horribly complex and not a current priority.
Done: #6891
Really need SUM to support nested functions such as MEAN. See issue https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/influxdb/uXdYy9JA6_E/cZRzQo7FBAAJ. This is likely to prevent us from deploying InfluxDB. As noted in that issue though this is just one respect in which InfluxDB's query language seems sorely limitted!
In regards to @MatMeredith comment: this is the one biggest missing feature for us as well!
As to the issue #5150, you can get around that with a continuous query that does a
NON_NEGATIVE_DERIVATIVE(MEAN(value)), 1s) INTO
a new measurement and then pull a 95th or whatever percentile out of it. This is perfect for a 95th percentile of interface traffic calculation, but alas, Grafana cannot yet draw that as a line sice the percentile is really a point, not a series of values.
@beckettsean Can you please add #5345 to this? Its related to Top aggregate.
@beckettsean also #6723 to make it easier to query on specific patterns of time windows (e.g. Every day 9AM traffic pattern).
A good addition would be exponential moving average (EMA), useful to minimize the MA jumping when a large value enters and exits the window.
https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
I was also a bit uncertain about the overlap between top
/bottom
and min
/max
- would be nice if the manual mentions if there is any performance difference between these two, besides the nicer syntax when your use case fits the min
/max
Histograms are for continuous numeric data. What about discrete non-numeric data?
Specifically, an example use case:
I have a measurement which contains some data of unbounded dimension (say, userID). I store it in a field to avoid exploding my series cardinality, and because I don't really need to query on this data for a specific userID. What I really want is to be able to see how many points occur in this series for each userID. This could be displayed nicely in a bar graph which shows, say, the 5 userIDs with the highest number of points for this series.
I apologize if this is already possible with current functionality, but I haven't found a way to do it without selecting the field (and thus all the points) directly from the measurement, and transforming it into aggregate counts on the client side.
@bal2ag: Yes, I need that as well. I opened something similar for RRDTool as well: https://github.com/oetiker/rrdtool-1.x/issues/261
We have received user requests describing the need for a latest
selector function: #7089
Can geometric and harmonic mean functions be also added to the list? They are useful for estimating non-normal distributions.
Casting was implemented however I think this fairly important use case was skipped out on:
select mean(value)::integer from firstmeasurement
Currently it's only possible to do the following: (2 steps)
select mean(value) into secondmeasurement from firstmeasurement
select mean_value::integer into thirdmeasurement from secondmeasurement
Is it possible to add support for this feature because it otherwise necessitates a CQ or the use of Kapacitor to get this feature.
When trying to do so now you receive:
select mean(value)::integer from firstmeasurement
ERR: error parsing query: found ::, expected FROM at line 1, char 15
Modulus division would be greatly appreciated.
ie, the % operator.
@beckettsean I would also like to see #6723 added to your list as well. This would allow for better retrieval of periodic or seasonal data. What's the processing for vetting this feature for your near-term roadmap?
@beckettsean Are there any plans of supporting MySQL pattern of SUM(IF(EXPR, 1, 0)) commonly used for conditional aggregating?
MariaDB [test]> select * from http_requests;
+----+-------------------+---------------+
| id | url | response_code |
+----+-------------------+---------------+
| 1 | http://google.com | 200 |
| 2 | http://google.com | 200 |
| 3 | http://google.com | 200 |
| 4 | http://google.com | 200 |
| 5 | http://google.com | 200 |
| 6 | http://google.com | 200 |
| 7 | http://google.com | 404 |
| 8 | http://google.com | 503 |
| 9 | http://google.com | 500 |
+----+-------------------+---------------+
9 rows in set (0.00 sec)
MariaDB [test]> select url, sum(if(response_code = 200, 1, 0)) as success, sum(if(response_code between 400 and 499, 1, 0)) as failures_client, sum(if(response_code >= 500, 1, 0)) as failures_server from http_requests group by url;
+-------------------+---------+-----------------+-----------------+
| url | success | failures_client | failures_server |
+-------------------+---------+-----------------+-----------------+
| http://google.com | 6 | 1 | 2 |
+-------------------+---------+-----------------+-----------------+
1 row in set (0.00 sec)
My original plan was to create a CQ where I could aggregate a # of requests made in a time frame to calculate aggregate successes and failures in a single measurement.
I know that Kapacitor lambda supports IF operator, however was hoping InfluxQL might natively support this. This has the flavor of #4619 except it involves supporting expressions.
Thanks for consideration!
I feel the point "dot-product #5095" is too narrow in this feature collection. That feature request originally talked about using arithmetic operations inside of aggregates in general, not only the dot-product in particular. The dot-product was just a specific example of the problem, i.e., SUM(a*b)
, but there are plenty of other similar operations like SUM(a/b+c)
or, in my case, the so called "micro_price" in an order book: MAX((bid*bid_vol+ask*ask_vol)/(bid_vol+ask_vol))
(or the low MIN): http://quant.stackexchange.com/a/24510/2792
I've started work on integral function. Are there any quants or devs watching that have an opinion about default # of slices or a good rule of thumb relating data points to slices?
Is anybody working on histogram()
? It is a crucial function for us unfortunately and doesn't seem to be the focus of any development work at the moment.
I hope the team will put some documents on how to develop such aggregations / operators :)
The aggregation function "Root Mean Square" RMS would be great.
+1 wish point to histogram()
+1 wish point to histogram()
This issue contains a list of related feature requests that are not on the near-term roadmap. The feature requests in this issue are all new functions that have been requested. If you want to request a function not already listed please make a comment on this issue, and we will add it to the checklist.
Aggregations
Selectors
Transformations
Operators
InfluxQL enhancements
fill(previous)
without GROUP BY https://github.com/influxdata/influxdb/issues/3633