Open srfraser opened 9 years ago
:+1:
I work with sensor networks and find this limitation frustrating. For example, I wish to compute weighted averages like this:
SELECT sum(oxygen_percentage.value * flow_rate.value) / sum(flow_rate.value) FROM oxygen_percentage, flow_rate WHERE site_id = '3'
But InfluxDB returns nothing. Even SELECT oxygen_percentage.value FROM oxygen_percentage doesn't work. Using 0.9.3-rc1 master (0163945).
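Until something like this works server-side, the weighted average can be computed on the client. The following is only a minimal sketch: it assumes each measurement has already been fetched into a dict keyed by timestamp, and the sample points and the `weighted_average` helper are hypothetical, not part of InfluxDB or any client library.

```python
from typing import Dict


def weighted_average(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of `values`, using only timestamps present in both series."""
    shared = values.keys() & weights.keys()
    total_weight = sum(weights[t] for t in shared)
    if total_weight == 0:
        raise ValueError("no overlapping points with non-zero total weight")
    return sum(values[t] * weights[t] for t in shared) / total_weight


# Hypothetical sample points, as if queried separately from each measurement.
oxygen_percentage = {
    "2015-09-01T00:00:00Z": 20.0,
    "2015-09-01T00:01:00Z": 22.0,
    "2015-09-01T00:02:00Z": 21.0,  # no matching flow_rate point, so it is ignored
}
flow_rate = {
    "2015-09-01T00:00:00Z": 1.0,
    "2015-09-01T00:01:00Z": 3.0,
}

print(weighted_average(oxygen_percentage, flow_rate))  # 21.5
```

Matching on exact timestamps only works when both sensors report on the same clock ticks; otherwise the points would need to be bucketed or aligned first.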
Same here. I'd also like to calculate values across different series like:
select * from mysql_value where type='mysql_commands' and type_instance='show_tables' + select * from mysql_value where type='mysql_commands' and type_instance='show_databases'
Cheers, Szop
Same as @hexluthor, I feel this is very limiting: if we need to correlate data coming from various sensors, we currently have to write all data as fields in the same measurement... But would it be a good idea in terms of data structure to have a single measurement with more than 50 fields? Will it impact query performance? Also, this sensor data does not always get logged at the same sampling frequency, so it is not always possible to combine data in the same measurement if we want to keep the data with high sampling frequency.
I'm not comfortable with distorting the data structure (dropping natural data organization) because of technical limitations. In the sysadmin world, it would be like putting all the cpu, ram, disk, and apache response time metrics in the same measurement for the sole purpose of being able to correlate apache response time with cpu, ram, or disk metrics.
Also, what are the actual technical issues that prevent InfluxDB from supporting queries with simple math operations across measurements?
This was recently changed to a "feature request", which means we will evaluate in future releases whether or not to add it. There are a couple of workarounds right now; one is to save a calculated field when you write data, such as storing another field for oxygen_percentage.value * flow_rate.value. I understand this isn't ideal, but it might get you moving forward.
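As a rough illustration of that write-time workaround, here is a sketch that builds a line protocol point carrying the precomputed product next to the raw readings. The measurement name `gas_flow`, the field name `oxygen_flow`, and the `point_with_product` helper are all made up for the example, and it assumes the tag value needs no escaping.

```python
def point_with_product(site_id: str, oxygen_pct: float, flow: float, ts_ns: int) -> str:
    """Build one line protocol point that stores oxygen_pct * flow as its own field.

    Assumes site_id contains no spaces, commas, or equals signs (no escaping is done).
    """
    weighted = oxygen_pct * flow  # the calculated field, computed before writing
    return (
        f"gas_flow,site_id={site_id} "
        f"oxygen_percentage={oxygen_pct},flow_rate={flow},oxygen_flow={weighted} {ts_ns}"
    )


print(point_with_product("3", 20.0, 1.5, 1442837856000000000))
# gas_flow,site_id=3 oxygen_percentage=20.0,flow_rate=1.5,oxygen_flow=30.0 1442837856000000000
```

With the product stored, the weighted average becomes a ratio of two sums over a single measurement, which only helps if both readings are available to the writer at the same time.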
Otherwise, I think these requests are sane, but they will take some work. I believe sum() / sum() is supposed to work already, but I recall seeing a bug about math still not behaving properly.
@corylanou about the workaround you're talking about: should the oxygen_percentage.value * flow_rate.value field be created when new points are written, or is there a way to compute it afterwards in a continuous query?
Yes, I believe you should be able to do that in a CQ and then you can select from that retention policy.
How can we do it in a continuous query? I thought the syntax of normal queries and continuous ones was the same, so if it's possible in one, it should be possible in the other.
Instead of sum(value * value), you are doing a CQ with select val * val as newval, and then you can select sum(newval) from your new data that was calculated by the CQ.
And that works across measurements? Using @bbinet's example, this would work?
select oxygen_percentage.value * flow_rate.value as newmeasurement from oxygen_percentage, flow_rate
Hmm, it should, but I just tried this basic test and it crashed the server :cry:
> create database math
> use math
Using database math
> insert mul a=1,b=2
> select * from mul
name: mul
---------
time a b
2015-09-21T12:17:36.377625368Z 1 2
> select a*b as c from mul
ERR: Get http://localhost:8086/query?db=math&q=select+a%2Ab+as+c+from+mul: EOF
I logged another issue here: https://github.com/influxdb/influxdb/issues/4183
and that was only from one measurement :)
Hopefully this is a central bug in our post-processing that when fixed will fix all of it. I'll see if I can fix it today. It appears to be just a bad reference while putting the math together, so it might be a quick fix.
Thanks @corylanou, but as @srfraser said in his previous comment, your example comes from a single measurement: is it supposed to work with multiple measurements? I thought that continuous queries ran the same way as normal queries, so if math does not work across multiple measurements in a normal query, it would not work in a continuous query either. Is that wrong?
Ah, yes, I keep forgetting we don't calculate across values. Although in a simple query we should support this. The biggest problem is type checking and overflow, so that when you take an unsigned int and multiply it by a float, etc., we are able to properly convert to a common type for the math and not overflow either.
Ok, I see. It would be great if cross-measurement calculation were possible at least for series that share the same type (since no type conversion would be needed).
+1 We REALLY want this for our use-case!
+1, really missing this feature.
+1
+1
+1
:+1:
:+1:
:+1:
+1
+1
+1 thought I was going crazy, but this is a pretty substantial omission that might mean I've got to use another project instead of influx. Many times there is just no way to get correlated information into the same measurement. Even after an arduous journey with CQs, I only found that tags aren't included in CQ writes, so there is no way to even fan-in with multiple CQs. Why were the MERGE and JOIN features from 0.8 dropped without there being a replacement? With the 0.9 documentation recommending that the optimal way to structure things is to have many series and a single field named “value” (or some other key of your choice) used consistently across all series, and there apparently being no way to migrate from that kind of structure to the sort recommended at https://docs.influxdata.com/influxdb/v0.10/concepts/schema_and_data_layout/ I'm worried we're left hanging.
A viable CQ approach would be OK, but it is a lot more work than simply joining time-grouped measurements at query time, the way that influx used to work.
@graphex not that it solves the main problem, but tags are in fact included in CQ writes if the CQs have something like group by time(30m), * in them.
+1
:+1:
+1
The proposed syntax above likely won't work since it conflicts with another potential query.
> insert cpu value.host=2
> select * from cpu
name: cpu
---------
time value.host
1460469115659777269 2
This seems to currently be a valid query. @pauldix any ideas what syntax we should use for this kind of feature?
I'm also looking for a way to do math between two measurements.
I've got one measurement for Volts, another for Amps. The data is being provided by two different pieces of equipment. I would like to multiply the Volts value with the Amps value (time correlated) to get a calculated Watts value.
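For anyone needing this today, the time correlation can be done client-side after querying each measurement separately. This is a minimal sketch under stated assumptions: points are (timestamp in seconds, value) tuples, the Amps list is sorted by time, and the `multiply_aligned` helper, its `tolerance` parameter, and the sample readings are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)


def multiply_aligned(volts: List[Point], amps: List[Point], tolerance: int) -> List[Point]:
    """Multiply each volts point by the nearest-in-time amps point within `tolerance`."""
    amp_times = [t for t, _ in amps]  # must be sorted ascending for bisect
    watts: List[Point] = []
    for t, v in volts:
        i = bisect_left(amp_times, t)
        # The nearest amps point is one of the neighbours around the insertion index.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(amps)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(amp_times[k] - t))
        if abs(amp_times[j] - t) <= tolerance:
            watts.append((t, v * amps[j][1]))
    return watts


# Hypothetical readings from two devices whose clocks are about 1 s apart.
volts = [(0, 230.0), (10, 231.0), (20, 229.0)]
amps = [(1, 2.0), (11, 2.5), (40, 3.0)]

print(multiply_aligned(volts, amps, tolerance=2))  # [(0, 460.0), (10, 577.5)]
```

The Volts point at t=20 is dropped because no Amps reading falls within the tolerance; whether to drop, interpolate, or carry the last value forward is a choice that depends on the equipment.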
+1
:+1:
+1 Really need this; not having it is a great miss.
+1
+1
+1
+1
+1
+1
+1
Can I request that this issue be locked? I'd like to receive notifications about actual updates, and not just random "+1"s.
Done. If you are a person interested in this issue, please add a 👍 reaction to the top of the message instead of a +1 comment and then click "Subscribe".
I am unlocking this to continue meaningful discussion on the issue. Please refrain from adding meaningless +1's. This is a high traffic issue with a lot of subscribers. If you want to express your approval for the feature, please use a reaction at the top of the issue.
Unfortunately, locking an issue also locks GitHub reactions. I did not know that when I locked it.
Hi,
I am currently using Grafana and InfluxDB for monitoring purposes. I have two measurements. Measurement 1: Domain, Available Capacity, Threshold. Measurement 2: Domain, Peak TPS. My use case is to plot a graph when the Peak TPS exceeds the Threshold, which means I am dealing with two measurements. Can you please suggest how I can use data from two measurements to plot the graph when the condition (Peak TPS > Threshold) is satisfied?
I have a web service which calls multiple downstream services on each request (the services called may change based on the request). I have various timing measurements in my application's components: cache put/get, call time for each downstream service, etc.
The ability to perform mathematics across these measurements would enable rich and sophisticated graphs to expose data such as what % of the request is spent in each timing component while allowing the measurements to still be independent of each other (I don't want to include all timing metrics as fields in a "request_timings" measurement because some timings are independent of the web request - for example, Redis/cache timing metrics are not just used per request but by other application components).
More importantly this enables alarming on arbitrary "calculated" or "derived" measurements which is extremely useful for creating precise, unambiguous alarms.
This feature is extremely important for time series market data for stocks and in other financial systems. Consider this simple example: you have a market data tick with, say, price and size, and you want to derive the notional value across all ticks, something like (price * size). This seems infeasible in the current setup. Also, joining on timestamps and tags could be error prone.
The current schema seems to work well only for independent measurements like sensors or cpu etc.
Apologies if this is a duplicate, I had a look and couldn't see a relevant issue.
I can see from the documentation how to select from multiple measurements (although it calls them series, still, at https://influxdb.com/docs/v0.9/query_language/data_exploration.html )
For example, with data inserted by telegraf, you can do:
select * from disk_used,disk_total where host = 'myhostname' and path = '/'
How would you express that as a percentage? I've tried variations of the following, and none seem to work:
select disk_used.value/disk_total.value from disk_used, disk_total where host = 'myhostname' and path='/'
The "mydb"."retentionpolicy"."measurement" syntax doesn't work there, either.
Is it a good idea to add aggregation functions for cases like diff(value1, value2) from m1, m2 and divide(value, value) from m1, m2, or should the arithmetic operators be working?
Also, I noticed when experimenting that it's also not possible to divide one derivative by another. For example, if I have two counters, bytes transferred and api calls made - both of which are constantly going up - how would you calculate the mean bytes per api call?
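To make the last question concrete, this is the calculation I have in mind, sketched client-side: the increase in bytes over an interval divided by the increase in calls over the same interval. It assumes both counters are sampled as (timestamp, value) pairs and never reset; the `bytes_per_call` helper and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, int]  # (timestamp in seconds, counter value)


def bytes_per_call(
    bytes_total: List[Sample], calls_total: List[Sample]
) -> List[Tuple[int, Optional[float]]]:
    """Mean bytes per call for each interval between consecutive shared timestamps.

    Returns None for an interval in which no calls were made.
    Counter resets are not handled.
    """
    calls_by_time = dict(calls_total)
    # Keep only timestamps at which both counters were sampled.
    shared = [(t, b, calls_by_time[t]) for t, b in bytes_total if t in calls_by_time]
    result: List[Tuple[int, Optional[float]]] = []
    for (_, b0, c0), (t1, b1, c1) in zip(shared, shared[1:]):
        d_calls = c1 - c0
        result.append((t1, (b1 - b0) / d_calls if d_calls > 0 else None))
    return result


# Hypothetical counter samples taken once a minute.
bytes_total = [(0, 1000), (60, 4000), (120, 4000), (180, 10000)]
calls_total = [(0, 10), (60, 20), (120, 20), (180, 50)]

print(bytes_per_call(bytes_total, calls_total))
# [(60, 300.0), (120, None), (180, 200.0)]
```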