influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.65k stars 3.54k forks source link

Mathematics across measurements #3552

Open srfraser opened 9 years ago

srfraser commented 9 years ago

Apologies if this is a duplicate, I had a look and couldn't see a relevant issue.

I can see from the documentation how to select from multiple measurements (although it calls them series, still, at https://influxdb.com/docs/v0.9/query_language/data_exploration.html )

For example, with data inserted by telegraf, you can do: select * from disk_used,disk_total where host = 'myhostname' and path = '/'

How would you express that as a percentage? I've tried variations of the following, and none seem to work:

select disk_used.value/disk_total.value from disk_used, disk_total where host = 'myhostname' and path='/'

The "mydb"."retentionpolicy"."measurement" syntax doesn't work there, either.

Is it a good idea to add aggregation functions for cases like diff(value1, value2) from m1, m2 and divide(value, value) from m1, m2, or should the arithmetic operators be working?

Also, I noticed when experimenting that it's also not possible to divide one derivative by another. For example, if I have two counters, bytes transferred and api calls made - both of which are constantly going up - how would you calculate the mean bytes per api call?

evanrich commented 8 years ago

As others have mentioned, this is a highly requested and important feature. I use influx for recording metrics from my house and home server lab, of of the things I track is whole-house energy consumption. I capture wattage and Voltage coming off the meter, which does fluctuate. Currently in grafana, I'm simply dividing currant by 124 (average voltage rating I get) to get the currant in amps, but since voltage fluctuates between 120 and 125 or so, I'd like my current rating to be more accurate by being able to divide the two series. Graphite has this, can we please get this added?

acruise commented 8 years ago

IANAIE (Influx Employee) but IIUC this feature is especially tricky for Influx compared to most TSDBs because there's no defined time resolution. So if you're trying to join a series at 13-second resolution with one at 19-second resolution, You're Gonna Have a Bad Time (prime numbers chosen to accentuate pain).

Now, if it happens that you know for an ironclad fact that two of your series have exactly the same time granularity and cadence-- in this limited case it's safe and relatively easy to do a join. But Influx doesn't carry this kind of metadata, or constrain the points that you write to any particular resolution.

So, in the general case, in order to make progress you have to invent a whole DSL for deciding stuff like how to dedupe fields and combine values when two points snap to the same rounded timestamp.

acruise commented 8 years ago

I think a good way to start cutting into this would be to support joins between two members of the set { subqueries, measurements derived from a CQ } that have the same "group by" criteria (tags and, critically, time).

It would be nice if simple expressions could be used to adapt the tag values and paper over syntactic mismatches.

It would also be nice if you could collapse any aggregates that appear on both sides, e.g. "(foo.A + foo.B) as APlusB", "avg(foo.A, foo.B)", etc.

jsternberg commented 8 years ago

Mathematics Across Measurements

A series is defined as a single field organized by a collection of tags contained within a measurement. When you write multiple fields using the line protocol, this writes multiple series to the measurement with the same set of tags.

When selecting from a single measurement and selecting from multiple fields, an implicit join is done on the time of each point in the series for the fields that are requested.

Ideal Behavior

At the moment, when multiple measurements are used, they are not joined together and are instead listed one by one. Ideally, measurements would automatically be merged and look like this.

Ideally, measurements would automatically be joined on time for each of these measurements. For example, with this, you might get:

cpu value=1 0
cpu value=2 10
cpu value=3 20
mem value=4 0
mem value=5 10
mem value=6 20

> SELECT * FROM cpu, mem
name: cpu
time    cpu.value   mem.value
----    ---------   ---------
0       1           4
10      2           5
20      3           6

This is how it should look and how it should return. While the name is included, it mostly doesn't mean anything when the query returns like this.

Grouping Behavior

In the existing system, series can be grouped into multiple series instead of just a single series by using GROUP BY <tag>. We will still continue to support this and this will continue to be an important way of organizing series.

So if I have the following:

cpu,host=server01,region=useast value=1 0
cpu,host=server02,region=useast value=2 0
mem,host=server01,region=useast value=3 0
mem,host=server02,region=uswest value=4 0

> SELECT * FROM cpu, mem GROUP BY host
name: cpu
tags: host=server01
time    cpu.value   mem.value
----    ---------   ---------
0       1           3

name: cpu
tags: host=server02
time    cpu.value   mem.value
----    ---------   ---------
0       2           4

The series that have the same host are grouped together. If you want to match a differently named tag, then you may have issues since I don't think we'll support that for the initial iteration of this feature. We would need to provide some way to tell the query that you want to interpret one of the tags as a different tag and it is likely not worth trying to get that syntax correct if people don't need that functionality.

As can be seen from this example, the measurements from 'server02' do not match the same server. One is from region 'uswest' and one is from 'useast'. Because we are not grouping by region, these can be matched together. If you wanted the query to match the region too, you would do this and get the following output:

> SELECT * FROM cpu, mem GROUP BY host, region
name: cpu
tags: host=server01,region=useast
time    cpu.value   mem.value
----    ---------   ---------
0       1           3

name: cpu
tags: host=server02,region=useast
time    cpu.value   mem.value
----    ---------   ---------
0       2

name: cpu
tags: host=server02,region=uswest
time    cpu.value   mem.value
----    ---------   ---------
0                   4

Notice how the measurement name is still cpu. When using a join, the measurement name is frankly irrelevant and should be discarded, but we aren't going to because a measurement name is required and is used in some locations.

Be aware that if you GROUP BY * it will include all possible tags. That means if you have one measurement with 3 tags and one with 2 tags, they may never match because the measurements with 2 tags will have an empty value for the third tag and will never match.

Old Behavior

Instead of these being joined, they will be returned as separate series and the measurement name is considered important. Using this paradigm, it's not really possible to join the measurements in a way that makes sense.

cpu value=1 0
cpu value=2 10
cpu value=3 20
mem value=4 0
mem value=5 10
mem value=6 20

> SELECT * FROM cpu, mem
name: cpu
time    value
----    -----
0       1
10      2
20      3

name: mem
time    value
----    -----
0       4
10      5
20      6

Proposed Syntax

The new syntax will force the ideal behavior to occur, but we cannot break backwards compatibility. If this proves to be a good output format, we may consider using this format in the future.

For the initial implementation, we need to implement OUTER and INNER joins, although it's unclear if we need to use those words. Kapacitor implements them implicitly by using INNER if fill(none) is specified and using OUTER for any other fill option. Since we default to fill(null), I think we should default to an OUTER join and do the same thing as Kapacitor.

SELECT * FROM cpu, mem JOIN MEASUREMENTS

JOIN would be a new token, but MEASUREMENTS is a currently existing token. Rather than try to do something like this:

SELECT * FROM cpu JOIN mem

I am proposing we use something more friendly to the old syntax. This way, it is more natural to include support for regexes and multiple measurements. For three measurements, I think this is awkward:

SELECT * FROM cpu JOIN mem JOIN gpu

With a regex, it also doesn't match since the implication is that each entry refers to a single measurement.

# Does this join cpu and gpu or not?
SELECT * FROM /[cg]pu]/ JOIN mem

The proposed syntax takes away that ambiguity by telling the query to join all of the measurements within the measurements list.

Consequences to the Existing System

This will fix math for even within the same measurement. Right now, there are no particular rules for how these measurements are joined together to perform intra-measurement mathematics. That means if you try to do math on multiple fields within the same measurement, they can get out of sync very easily and there is no way to control how these behave.

This will also implement the feature requested in #7216 since intra-measurement joins will begin to start using the JOIN functionality for their behavior. This is an additive change and is not a breaking change because the default fill option is null which is what it currently does. This will just allow a person to customize this.

The existing return value for multiple measurements will not be changed since that would be a breaking change. Users will have to use the new syntax for telling InfluxDB to perform the join for measurements rather than just within a measurement.

Additional Considerations

Tolerance Margins

Kapacitor implements tolerance margins so that if two points are close enough in timestamps, they will be joined in the case that their timestamps are within the tolerance rather than requiring the timestamps to be exact. Additional points will just follow the same rules that were set by fill.

Field name conflicts

When a field name conflicts between multiple measurements, which should be used or should both be used? With standard SQL, the query will return an error saying that it is unable to resolve the ambiguity between the two. For conditions, this may be necessary since we don't want to guess which value they intended in a situation like this:

SELECT * FROM cpu, mem WHERE value > 2

We could transform this to one of the following though:

SELECT * FROM cpu, mem WHERE cpu.value > 2 AND mem.value > 2
SELECT * FROM cpu, mem WHERE cpu.value > 2 OR mem.value > 2

I'm not sure either of these would be proper though and there wouldn't really be a way for us to determine which of these the user intends. But when we use a regex, we expand it similar to how we expand wildcards. We also expand wildcards. If we have a conflict, it would be possible to transform the following:

SELECT value FROM cpu, mem

Into this statement:

SELECT cpu.value, mem.value FROM cpu, mem

This would be the same as if we had done this with the original.

SELECT /^value$/ FROM cpu, mem

But is this a behavior that the user would expect? That's less clear. And since this behavior would be different from SQL, it's not clear that it's something a user would expect.

I would be tempted to make this an error for the initial implementation and then we can expand it later, but some feedback on what others think would be very helpful here.

Unused measurements

Testing with SQL, SQL does not seem to care if you use an implicit JOIN and don't use one of the values inside of the JOIN. Because of that, we're just not going to care either.

Example:

sqlite> .headers on
sqlite> CREATE TABLE a(id integer, host varchar(256));
sqlite> CREATE TABLE b(id integer, region varchar(256));
sqlite> INSERT INTO a VALUES(1, 'server01');
sqlite> INSERT INTO b VALUES(2, 'useast');
sqlite> SELECT a.id FROM a, b;
id
1
sqlite> SELECT * FROM a, b;
id|host|id|host
1|server01|2|useast

This seems like an easy enough pattern and, since it has precedent, seems like something we should do.

Please leave any feedback you have about the proposed syntax and behavior. If you like the proposal and think it works for your use case, please leave a reaction. No :+1: comments since that will make discussion harder.

jsternberg commented 8 years ago

I updated the above comment to include information about how grouping works. This is important because it determines how to get similar series to match with each other which would be very important for this feature.

beckettsean commented 8 years ago

Notice how the measurement name is still cpu. When using a join, the measurement name is frankly irrelevant and should be discarded, but we aren't going to because a measurement name is required and is used in some locations.

Perhaps we could create a new measurement name by concatenating the measurements from the SELECT clause? E.g.

> SELECT * FROM cpu, mem GROUP BY host
name: cpu|mem
tags: host=server01
time    cpu.value   mem.value
----    ---------   ---------
0       1           3

I'd advocate for using a rare character as the join, rather than - or _. That will make it somewhat less ambiguous

The proposed syntax takes away that ambiguity by telling the query to join all of the measurements within the measurements list.

Works for me. Also removes any worry about OUTER vs INNER since the user never specifies those.

We could transform this to one of the following though:

SELECT * FROM cpu, mem WHERE cpu.value > 2 AND mem.value > 2
SELECT * FROM cpu, mem WHERE cpu.value > 2 OR mem.value > 2

In standard SQL users are quite used to using table.column to return values in queries with more than one table. I think for InfluxQL, a similar syntax works well. If a field or tag that is NOT part of the join is used, then the user must specify the parent measurement.

Tolerance Margins

I do think we should include these, but I would be fine with waiting for a future release. The initial implementation can join only on explicit timestamp matches. Let's debug the rest of the syntax and behavior before letting users create 1h tolerance joins on data.

Question: What does it mean to do something like:

SELECT * FROM cpu, cpu GROUP BY host
jsternberg commented 8 years ago

I personally don't like the idea of concatenating the measurement names. I'm not convinced measurement names should even exist anymore. I noticed Grafana will concatenate the measurement name and the column name and I think we could probably get that working so they ignored the measurement name if it was empty and just used the column. When we start returning the measurement name in the column output, I think that's probably preferable. Although there is a bit of a question. If I do this:

SELECT mean(cpu.value) AS mean_value FROM cpu

Should the column be cpu.mean_value or mean_value?

I think for something like:

SELECT * FROM cpu, cpu GROUP BY host

That would be rewritten to:

SELECT cpu.value, cpu.value FROM cpu GROUP BY host

Although we still don't have the best idea of what happens when you combine the same measurement from multiple databases/retention policies. It seems to me like the measurement should be tied to the field rather than being placed into the FROM clause. I'm not entirely sure how to do that syntactically, but having the FROM come at the end is syntactically ambiguous in these cases. I've been wondering if it should be rewritten like this:

SELECT db0.autogen.cpu.value FROM db0.autogen.cpu
pauldix commented 8 years ago

I think it might be cleaner and a bit more flexible to use a slightly different syntax. Why not make join a function call? So you can do:

SELECT value
FROM join(0, mem, cpu), foo

-- or
SELECT value
FROM join(/foo.*/)

Then you could have multiple measurements returned. You'd also be able to specify the tolerance as the first argument. The function join could be an outer join and then later if we want to add inner we'd just add that as function inner_join.

I agree that if the user has something in the WHERE clause that matches against a field, it should return an error like SQL does if it's ambiguous.

jsternberg commented 8 years ago

I think the issue with doing it like that is you would then be mixing two different types of output for two queries that have to work completely differently. What would the output of this example return?

SELECT value FROM join(0, mem, cpu), foo

We have the tolerance in there, which should be an optional parameter since many people probably won't care too much about it, and then we have two measurements. Makes sense, but then we have another measurement attached to the end. We also have only one variable and it is ambiguous who it refers to for the join, but makes complete sense for foo.

So the mechanics of join isn't really that of a function. It's not a simple transformation of data within the query language, but it acts more like changing the functionality completely.

As an example syntax for tolerance, I was thinking something like this personally:

SELECT cpu.value, mem.value FROM cpu, mem JOIN MEASUREMENTS WITH tolerance(1s)

Most of these tokens are already reserved keywords. We could decide if tolerance() should be a function or not, but I think it reads decently well. I think it also sets us up to make JOIN MEASUREMENTS the default whenever we decide to make a breaking change.

pauldix commented 8 years ago

The measurement on the end would be another result set just like it is now if you select FROM foo, bar. The variable for tolerance is between all series being joined. If you wanted to have different tolerances, then we'd need to support nesting like: join(0, foo, join(2, mem, cpu))

No breaking changes until 2.0 (which we're not even thinking about yet).

I guess it's not that big of a deal to be able to do join(0, mem, cpu), foo since you can get the same thing from doing a multi-statement query:

select value from mem, cpu JOIN MEASUREMENTS;
select value from foo;

I'm not that married to it since your last suggestion accomplishes much the same thing and it's already too late for me to shoehorn my functional dream into the query language. Thoughts from others?

jsternberg commented 7 years ago

Problem Statement

While attempting to implement this, I ran into another issue which I'll describe here and I'll describe the solution that we determined would be most appropriate. For those following this issue closely, my apologies for not writing up the problem before we started talking internally so anybody who wanted could contribute. I'm writing it up here for the sake of transparency before we attempt to implement it.

The first thing to understand is how series are joined together currently. When you have a single measurement with multiple fields like the following, we need to join the values together in some way.

cpu,host=server01 idle=20,user=70,system=10 0
cpu,host=server02 idle=30,user=50,system=20 0

When we execute a query to retrieve all of the fields, we are querying 6 different series. There are 3 fields and 2 series keys for a total of 6 series, but we are only returning 2 rows at time zero. How do we determine which series are joined together? According to the other proposal, the series would just be joined together in a fairly random order so you could intermix values from cpu,host=server01 with values from cpu,host=server02. There are a few problems with this though.

  1. It's confusing for users. They expect values that have the same series key to be matched with each other even if that's not how the storage engine works. The current system matches them up so why wouldn't it continue to work?
  2. You would be no longer able to query tags because you would be unable to determine which series to take tag information from.

But, if we don't use this method of joining together series, joining series between measurements becomes very complicated. Imagine that you have the following in the mem measurement and join it with the above.

mem,host=server01,region=uswest free=80 0

Since this series does not have a series key that matches anything above, the value would not be joined with any row in cpu above. While we could force series keys to be the same across measurements for this feature to work, we think that's too limiting.

Updated Proposal

We are proposing a new syntax and new functionality to join. When series are being joined (either between multiple measurements or a single measurement), the user will be able to specify which parts of the series key should be considered while joining. Only tags that are mentioned in the join can be used in the resulting query (including GROUP BY). The join will happen before anything else so series will be joined into a single row before they are grouped into different series. I think this contrasts with Kapacitor which does the grouping first and then joins the series. /cc @nathanielc

The syntax will be something like this:

SELECT cpu.idle, cpu.user, mem.free FROM cpu, mem JOIN SERIES ON host

JOIN MEASUREMENTS has been changed to JOIN SERIES to put the focus on how we are joining series together. Series from different measurements are not treated in a special way.

Series will be joined together with the cartesian product for any values at the same time. So if you tell a query to join on host and have 2 series that match with 2 series in the other field, you will end up with 4 rows total. In particular, that means queries like this will work in a way we think is most fitting.

cpu,host=server01,num=0 value=20 0
cpu,host=server01,num=1 value=30 0
mem,host=server01 value=10 0

> SELECT cpu.value, mem.value FROM cpu, mem JOIN SERIES ON host
time    cpu.value   mem.value
----    ---------   ---------
0       20          10
0       30          10

Feedback

For now, we're going to keep things explicit and try to make this more user friendly with feedback in the future. If you can think of any problems or you think that this is the wrong way to deal with joining measurements, please post a comment so we can address any concerns. This is a difficult problem and we appreciate and welcome any community involvement.

beckettsean commented 7 years ago

When there is a tag selected in the query, is it reasonable to infer that series should be joined on that tag?

As long as "Only tags that are mentioned in the join can be used in the resulting query" is true then I think this is reasonable and in fact necessary.

When a GROUP BY is used, is it reasonable to infer that it should be included in the JOIN so the user doesn't have to repeat it?

I think this is reasonable. I don't think a query would make sense if the GROUP BY tag weren't also part of the join.

Given the cross-product output, what would I get from the following query?

cpu,host=server01,num=0 value=20 0
cpu,host=server01,num=1 value=30 0
mem,host=server01 value=10 0

> SELECT COUNT(cpu.value), COUNT(mem.value) FROM cpu, mem JOIN SERIES ON host
???
jsternberg commented 7 years ago

I was originally thinking the answer would be 2 for both of those columns, but we had another discussion today and @pauldix is doing a writeup of how he thinks the mechanics for this should work. We've been throwing around the idea of making this only work for aggregates rather than allowing raw query math with join, but we're unsure if that would solve the main use case people have.

jsternberg commented 7 years ago

For all of those following this issue, we are having difficulty at the moment with scoping the issue and determining what exact use cases people need. We understand one use case to be doing something like correlating the number of requests with the number of errors using aggregates like this:

SELECT count(errors.value) / count(requests.value)

But I would also like to hear other use cases. Specifically, the above one is more about performing aggregates on two different measurements rather than doing math between those measurements. I would like to hear more use cases involving math between measurements and why you are trying to do it that way so we can determine if we are working on the correct feature and designing it correctly.

Thanks.

kipe commented 7 years ago

We're using InfluxDB to store measurements gathered from the LUT Green Campus. We mainly gather measurements related to energy, which typically means turning voltage and current measurements to power. The electrical system has three phases (L1, L2, L3), which are used as fields in the schema. An example query to calculate active_power from measured voltage, current and phi for a single phase L1 would be something similar to

SELECT mean(voltage.L1) * mean(current.L1) * mean(cos(phi.L1)) GROUP BY time(5m), panel_group

Furthermore, the values for apparent_power and reactive_power can be derived from these three measurements.

rdworth commented 7 years ago

As another data point on use cases we have exactly the same use case as @kipe at Flywheel Building Intelligence.

dandv commented 7 years ago

Any chance for those who just "+1"ed to remove their comments and +1 the OP instead?

It would make it easier to follow the discussion. Thanks.

thealphaking01 commented 7 years ago

Hey Guys,

Another use case which I got was calculating a percentage, where the denominator value is in one measurement and the numerator is in another, and I needed to do something like

SELECT (sum(a.value) / sum(b.value)) * 100

where a and b are separate measurements, which is similar to what you guys mentioned. Is there no roundabout way to do this for now?

jsternberg commented 7 years ago

@thealphaking01 for your example, is that supposed to be sum(a.value) / sum(b.value) or is the second value supposed to be the raw values?

thealphaking01 commented 7 years ago

@jsternberg a.value is a subset of b.value. Didn't get what you meant by raw values? For a particular duration, sum(a.value) gives me the number satisfying a criteria, while sum(b.value) gives me all possible values. Hence the percentage calculation.

jsternberg commented 7 years ago

Raw values are the ones directly written into the system. So SELECT value FROM cpu returns the raw values for the value field in the cpu measurement. The reason why I ask is because I didn't knonw if your above example had a typo or not since a.value is inside of an aggregate and b.value is not.

thealphaking01 commented 7 years ago

Sorry, that was a typo. Fixed it now.

claybar commented 7 years ago

I'm not sure if this is the correct case to implement the issue of case #3525 for inequalities? I would like to do something along the lines of:

SELECT mean(value) > 40

where the result is returned as a full time series of boolean values. My specific use case for this is determining if a pump is operating based on the current draw it is using.

inselbuch commented 7 years ago

Opening a feature request kicks off a discussion. Requests may be closed if we're not actively planning to work on them.

Proposal: Implement SQL Joins in the Influx Query Language Current behavior: Not supported. Desired behavior: Supported.

Use case:

When requesting data from InfluxDB it would be very useful to combine metadata, configuration data, etc., from one "table" with the time-series data. For example:

assets (measurement) time (actually not used in this table) id (tag) = cryptic internal identifier friendly (field) = character string port (field) = number (TCP/IP port number used by the device)

i.e., insert assets,id=y268938rjnau3 friendly='Asset1',port=64200

rundata (measurement) time id (tag) = cryptic internal identifier temperature (field) pressure (field)

This works great: select last(temperature) from rundata group by mac

But this is what I really want:

Timestamp Machine Temperature

10/05/2016 8:04:06pm C2475 1675.4 10/02/2016 9:02:11am C7524 850.5

That might be done like this:

select c.time as "Timestamp", a.friendly as "Machine",last(c.temperature) as "Temperature" from rundata c,assets a where c.id = a.id group by c.mac

daviesalex commented 7 years ago

I dont know if this should be a separate issue, but we would really like to be able to do simple stuff like this:

SELECT non_negative_derivative(last(PortXmitData * 4), $interval) from ibstats WHERE "interface" =~ /ib.*/ AND "host" =~ /$hostname/ and $timeFilter group by time($granularity),interface,host fill(none)

Note the PortXmitData * 4; this blows up today. The reason for this specific reason is that the metrics from IB equipment are returned as "octets divided by 4” (¼ of the actual number of Bytes)" which Grafana does not know how to deal with (because its a stupid unit). However, there are other cases where doing this sort of simple arithmetic is very useful, and hopefully this is a fairly simple feature to implement.

One other feature that is commonly requested is the ability to plot a metric against the mean/95%ile for each item in a GROUP BY. Any method to achieve that is fine by us.

inselbuch commented 7 years ago

I understand your issue… makes sense… the same issue in my opinion. But I don’t see tx_bytes*4 in your query…

I dont know if this should be a separate issue, but we would really like to be able to do simple stuff like this:

SELECT non_negativederivative(last(PortXmitData), $interval) from ibstats WHERE ("interface" =~ /ib./ OR "interface" =~ /p._/) AND "host" =~ /$hostname/ and $timeFilter group by time($granularity),interface,host fill(none)

Note the tx_bytes * 4; this blows up today. The reason for this specific reason is that the metrics from IB equipment are returned as "octets divided by 4” (¼ of the actual number of Bytes)" which Grafana does not know how to deal with (because its a stupid unit). However, there are other cases where doing this sort of simple arithmetic is very useful, and hopefully this is a fairly simple feature to implement.

One other feature that is commonly requested is the ability to plot a metric against the mean/95%ile for each item in a GROUP BY. Any method to achieve that is fine by us.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/influxdata/influxdb/issues/3552#issuecomment-257987465, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AVlReR3OTMcZ_DcCy4OJy3ea4qU58p5Pks5q6PFqgaJpZM4FljNk.

daviesalex commented 7 years ago

@inselbuch my apologies, I fixed the example query.

The problem is that this works: non_negative_derivative(last(PortXmitData), $interval) And this does not: non_negative_derivative(last(PortXmitData * 4), $interval)

espiegel commented 7 years ago

+1

Edit: +1ed the OP

cattt84 commented 7 years ago

+1

joshzitting commented 7 years ago

I am currently in the process of trying to switch from graphite to influx with grafana as our front end.. I currently have queries like this for graphite but I havent found a way to convert them to influx yet..

asPercent(reporting1_dev.memory.used,reporting1_dev.memory.total)
asPercent(reporting1_dev.memory.buffers,reporting1_dev.memory.total)

Also like this

asPercent(nonNegativeDerivative(reporting1_dev.cpu.total.iowait),nonNegativeDerivative(sumSeries(reporting1_dev.cpu.total.*)))
asPercent(sumSeries(nonNegativeDerivative(reporting1_dev.cpu.total.{guest,iowait,nice,steal,irq,softirq})),nonNegativeDerivative(sumSeries(reporting1_dev.cpu.total.*)))

Any help would be great!!! Thanks! Josh

So I found that I if I am doing math all of the info has to be in the same table.

ikkaro commented 7 years ago

+1 we need to do some maths between measurements

lpic10 commented 7 years ago

This is a huge feature missing from InfluxDB, making it much less powerful than Prometheus.

actionjax commented 7 years ago

+1

trondvh commented 7 years ago

+1

andrewpuch commented 7 years ago

+1

joshzitting commented 7 years ago

I think the point has been stated with all of the +1s... This thread should be locked until there is progress on it.

tblok commented 7 years ago

+1

jsternberg commented 7 years ago

I'm locking this again. While that means you won't be able to add your 👍 reaction, I think there are enough of them that we're well aware people want this feature. We do want to hear any resources that may be useful in terms of implementing this. You can look in the commit log to find my email. If you want to wait for comments regarding this issue, please use the "Subscribe" button on the issue instead of responding to the issue.

Thank you.