influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.92k stars 3.55k forks source link

[feature request] Support month and year as duration unit #3991

Closed ghost closed 4 years ago

ghost commented 9 years ago

For example:

Its useful for me when i want to query the data last month and year...

blackw1ng commented 5 years ago

+1

GROUP BY time(1Y)
GROUP BY time(1M)
xsherlockpl commented 5 years ago

It was supposed to be included in the new flux language? Anyone can confirm that flux 0.12 that is included with influx_db 1.7.x has that capability ? I need to compare month by month 2016-2018 solar production but that can't be done easily with current query language.

DerMika commented 5 years ago

I haven't tried it yet, but Flux seems to support months according to these docs

timhallinflux commented 5 years ago

It is not yet supported in any version of Flux. We will update this issue once it does. But, this is still in plan for us to incorporate support for months, years and other human scale time ranges. We are working hard to move this forward.

polcape commented 5 years ago

+1

pantlavanya commented 5 years ago

Hi All, I am new to influx, What if we create 2 columns(tags) with month, year? Then we can do monthly and yearly aggregation?

Codelica commented 5 years ago

@pantlavanya, you could. But it will (endlessly) increase your series cardinality. Depending on your current tags, it could be a drastic or even unworkable option. As a time-series database, it's the job of Influx to handle time related functionality, at least IMO.

kylethebaker commented 5 years ago

@pantlavanya, another limitation to this is that you would need to use tags for your month and year fields in order to do the grouping, and tags are strings only. So if you wanted to a month-to-date for some day in the month (say the 15th) for years passed 2015, you wouldn't be able to do something like select sum("some_field") where day_of_month < 15 and year > 2015 group by month, year because of the string vs int comparison.

timhallinflux commented 5 years ago

https://github.com/influxdata/flux/blob/master/docs/SPEC.md#duration-literals and https://github.com/influxdata/flux/issues/413

RedShift1 commented 5 years ago

I've written some seriously hacky code on NodeJS to have aggregation windows of months and years, you can find it here. Would not recommend you run this in production, it's terrible.

mfernand0 commented 5 years ago

+1 GROUP BY time(1M) GROUP BY time(1Y) GROUP BY time(1q) "quarterly"

order by priority

eelyaj commented 5 years ago

+1 GROUP BY time(1M) GROUP BY time(1Y) GROUP BY time(1q) "quarterly"

cmolitor commented 5 years ago

+1 GROUP BY time(1M) GROUP BY time(1Y)

eraelpeha commented 5 years ago

It was supposed to be included in the new flux language? Anyone can confirm that flux 0.12 that is included with influx_db 1.7.x has that capability ? I need to compare month by month 2016-2018 solar production but that can't be done easily with current query language.

Tried it two days ago, and it worked for me.

xsherlockpl commented 5 years ago

Can you share a flux query that does that?

On April 28, 2019 00:25:34 eraelpeha notifications@github.com wrote:

It was supposed to be included in the new flux language? Anyone can confirm that flux 0.12 that is included with influx_db 1.7.x has that capability ? I need to compare month by month 2016-2018 solar production but that can't be done easily with current query language.Tried it two days ago, and it worked for me. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

eraelpeha commented 5 years ago

Can you share a flux query that does that? On April 28, 2019 00:25:34 eraelpeha @.***> wrote: It was supposed to be included in the new flux language? Anyone can confirm that flux 0.12 that is included with influx_db 1.7.x has that capability ? I need to compare month by month 2016-2018 solar production but that can't be done easily with current query language.Tried it two days ago, and it worked for me. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

Of course. For example, I have a counter value and this gives me the total amount of bytes received by month in a grafana table:

from(bucket: "sensordata/autogen")
|> filter(fn: (r) => r._measurement == "monitoring" and r.device == "fb01" and r.network == "TotalBytesReceived" and r._field == "bytes")
|> range($range)
|> window(every: 1mo)
|> increase()
RedShift1 commented 5 years ago

We can't replace every influxql query with flux, it appears to be more than 10 times slower:

# time influx -database db -execute $'SELECT max("OIL_PRESSURE") FROM "db"."autogen"."main_plc" WHERE time > now()-7d AND "deviceId"=\'xx\' GROUP BY time(1d) FILL(null)' output omitted

real 0m2.097s user 0m1.591s sys 0m0.093s

# time influx -database db -type 'flux' -execute 'from(bucket: "db")
>   |> range(start: -7d)
>   |> filter(fn: (r) => r._measurement == "main_plc" and r._field == "OIL_PRESSURE" and r.deviceId == "xx")
>   |> window(every: 1d)
>   |> max()'

output omitted

real 0m27.127s user 0m1.587s sys 0m0.084s

Update: if the implementation does what I think it does, the influxql query will read data tagged with deviceId xx from disk, whilst the flux query will read all data from disk and filter it in memory. This also means this will get progressively slower as more different deviceId tags are added. Something you need to keep in mind for your dataset.

xsherlockpl commented 5 years ago

Holy shit, that's slow.... how granular is your oil_pressure data? What hardware?  My infuxdb holds now 3y of solar power power generation @ 1Hz  and still grafana will pull the year graph within 5s for 2 solar generators. But at 10x that would put a lot of stress on my VM.

On 29/04/2019 09:59, Glenn Matthys wrote:

We can't replace every influxql query with flux, it appears to be more than 10 times slower:

|# time influx -database db -execute $'SELECT max("OIL_PRESSURE") FROM "db"."autogen"."main_plc" WHERE time > now()-7d AND "deviceId"=\'xx\' GROUP BY time(1d) FILL(null)'| /output omitted/

real 0m2.097s user 0m1.591s sys 0m0.093s

|# time influx -database db -type 'flux' -execute 'from(bucket: "db")

|> range(start: -7d) > |> filter(fn: (r) => r._measurement == "main_plc" and r._field == "OIL_PRESSURE" and r.deviceId == "xx") > |> window(every: 1d) > |> max()' |

/output omitted/

real 0m27.127s user 0m1.587s sys 0m0.084s

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/influxdata/influxdb/issues/3991#issuecomment-487483553, or mute the thread https://github.com/notifications/unsubscribe-auth/AFU2DKB6TD4KWC77HBKVU6TPS2TEXANCNFSM4BO7KISA.

RedShift1 commented 5 years ago

Holy shit, that's slow.... how granular is your oil_pressure data? What hardware?  My infuxdb holds now 3y of solar power power generation @ 1Hz  and still grafana will pull the year graph within 5s for 2 solar generators. But at 10x that would put a lot of stress on my VM.

Hardware: 2 x Intel Xeon E5-2643v2 (6C/12T, 3.5 GHz), 128 GB RAM, data stored on Intel D3-S4510 SSD's in RAID 10 Database size: 84 GB Granularity is 10 Hz (one datapoint every 100 ms).

timhallinflux commented 5 years ago

We haven't been focused on performance at this time. We are focused driving towards feature complete at the moment. Performance work is planned to begin soon.

cmolitor commented 5 years ago

I am trying the new possibility to "group by month" with flux.

I have values of a cumulated energy meter for every day and want to get the monthly energy consumption. Thus I would like to take the value at the end of the month and subtract the value of the beginning of the month.

I managed to setup the following script but face still a problem:

from(bucket: "smartfarm/autogen")
  |> range(start: -1y)
  |> filter(fn: (r) => r._measurement == "E_real_con") 
  |> window(every: 1mo, period: 1mo)
  |> last()
  |> group()
  |> difference()

The resulting data is not grouped by calendaric month but different time intervals which you see in the screenshot (I could not copy the data from chronograf).

Bildschirmfoto 2019-04-30 um 15 07 08

Is it possible to group by calendaric month (e.g. 2019-01-01 to 2019-01-31)?

Thanks!

mfernand0 commented 5 years ago

same here, does the new version allow to group by month? (calendar)

timhallinflux commented 5 years ago

No. This hasn't been implemented yet. It is on our backlog...and the plan is to include this with Flux (as we've stated in this thread). We absolutely understand the value and importance of getting this done.

mfernand0 commented 5 years ago

Thanks @timhallinflux

Is there any workaround to this issue? or do we know when is going to be implemented (aprox... a year, 6 months ) ?

cmolitor commented 5 years ago

Thanks @timhallinflux for the clarification.

Furthermore: Is the intervals feature already implemented? I always get an error message if I try the following:

from(bucket: "smartfarm/autogen")
  |> range(start: -1y)
  |> filter(fn: (r) => r._measurement == "E_real_con") 
  |> window(intervals: intervals(every: 1mo, period: 1mo))
  |> last()
  |> group()
  |> difference()

I get the following error message:

Bildschirmfoto 2019-04-30 um 21 05 10
timhallinflux commented 5 years ago

Two things to look at: https://docs.influxdata.com/flux/v0.24/functions/built-in/misc/intervals/ and https://docs.influxdata.com/flux/v0.24/functions/built-in/transformations/window/

I think you can simplify the window() function as: |> window(every: 1mo, period: 1mo)

RedShift1 commented 5 years ago

I am trying the new possibility to "group by month" with flux.

I have values of a cumulated energy meter for every day and want to get the monthly energy consumption. Thus I would like to take the value at the end of the month and subtract the value of the beginning of the month.

I managed to setup the following script but face still a problem:

from(bucket: "smartfarm/autogen")
  |> range(start: -1y)
  |> filter(fn: (r) => r._measurement == "E_real_con") 
  |> window(every: 1mo, period: 1mo)
  |> last()
  |> group()
  |> difference()

The resulting data is not grouped by calendaric month but different time intervals which you see in the screenshot (I could not copy the data from chronograf).

Bildschirmfoto 2019-04-30 um 15 07 08

Is it possible to group by calendaric month (e.g. 2019-01-01 to 2019-01-31)?

Thanks!

What if you floor the start time down to the beginning of the first month instead of specifying a relative time description?

Pruxis commented 5 years ago

Any updates on this? It's been like four years since the initial request.

No. This hasn't been implemented yet. It is on our backlog...and the plan is to include this with Flux (as we've stated in this thread). We absolutely understand the value and importance of getting this done.

Does this mean that InfluxQL is deprecated as it seems you are no longer adding functionality to it?

RedShift1 commented 5 years ago

@Pruxis don't count on it being fixed in a reasonable amount of time... You'll have to use the influx-fixes I posted earlier or decompose the aggregation by months into days, query influx, and then recompose the data into months.

konstantinblaesi commented 5 years ago

@RedShift1 What about the changes in @Anaisdg's PR ? Will those allow to group by month and aggregate using e.g. sum ?

Anaisdg commented 5 years ago

@konstantinblaesi That's the idea.

gaardiolor commented 5 years ago

We can't replace every influxql query with flux, it appears to be more than 10 times slower:

I'm seeing something similar.. flux is 6 times slower doing a sum of 150k records, unworkable. 6x more hardware to finally get group by time (1mo) and group by time (1y) in a time series database (in which the group by time(1w) week starts on Thursday 😕)..

Tried tagging months myself in the data, I think similar to the workaround of @Anaisdg . But then in Grafana I have to use X-Axis mode Series and stacking no longer works (https://github.com/grafana/grafana/issues/7949), so that doesn't work for me.

mjmcgrath2010 commented 5 years ago

+1 for this feature request.

antipooh commented 5 years ago

+1

dominikbenner commented 5 years ago

@Anaisdg How can i use your Plugin? Is it in 2.0 alpha? Is it possible to update past measurements, so that all data has this tags?

konstantinblaesi commented 5 years ago

@s4ndst0rm https://github.com/influxdata/telegraf/commit/3e5cfad2b051b454c50e248e544791d1aecd6586 says it's in

right?

mabnz commented 5 years ago

+1 GROUP BY time(1M) GROUP BY time(1Y)

krtschmr commented 5 years ago

:sad:

dominikbenner commented 5 years ago

@RedShift1 I tried flux too. When i window by month and aggregate with max function the results are little bit strange:

This is my query:

from(bucket: "openhab/autogen")
  |> range(start: dashboardTime)
  |> filter(fn: (r) => r._measurement == "Heizung_Total" and (r._field == "value"))
  |> aggregateWindow(every: 1mo, fn: max)
  |> group(columns: ["_time", "_start", "_stop", "_value"], mode: "except")

result:

#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string |  
-- | --
#group,false,false,false,false,false,false,true,true |   |   |   |   |  
#default,_result,,,,,,, |   |   |   |   |   |   |  
,result,table,_start,_stop,_time,_value,_field,_measurement |   |   |   |  
,,0,2019-08-01T06:19:00Z,2019-10-26T06:36:41.0575593Z,2019-08-08T00:00:00Z,20402.2945,value,Heizung_Total
,,0,2019-08-01T06:19:00Z,2019-10-26T06:36:41.0575593Z,2019-09-05T00:00:00Z,20567.5373,value,Heizung_Total
,,0,2019-08-01T06:19:00Z,2019-10-26T06:36:41.0575593Z,2019-10-03T00:00:00Z,20851.7778,value,Heizung_Total
,,0,2019-08-01T06:19:00Z,2019-10-26T06:36:41.0575593Z,2019-10-26T06:36:41.0575593Z,21150.6156,value,Heizung_Total

These are values of an energy meter so that the max value is usually at last day of month. Why is the time field not at last day of month?

nathanielc commented 5 years ago

@s4ndst0rm The current implementation of Flux months is equivalent to 30d which is why its not behaving as expected. That said we are currently updating the implementation to use real calendar months. See https://github.com/influxdata/flux/pull/2067 to follow our progress.

OFark commented 5 years ago

With regard to the months and years not always being the same what I would expect would be to knock x off the year or month and take the first existing date BEFORE that. so: 2019-03-30 - 1M = 2019-02-28 and similarly: 2020-02-29 - 1y = 2019-02-28

krtschmr commented 5 years ago

@OFark this is ruby. he does really good with dates. everybody wants to be like ruby.

 a = Date.parse("2019/03/30") - 1.month
=> Thu, 28 Feb 2019

Date.parse("2020/02/29") - 1.year
=> Thu, 28 Feb 2019
nikoladsp commented 4 years ago

+1 GROUP BY time(1Y) GROUP BY time(1M) GROUP BY time(1q) "quarterly"

Thanks!

agolovenko commented 4 years ago

5 years ago this was opened...

timhallinflux commented 4 years ago

This is available in Flux now.

dominikbenner commented 4 years ago

@s4ndst0rm The current implementation of Flux months is equivalent to 30d which is why its not behaving as expected. That said we are currently updating the implementation to use real calendar months. See influxdata/flux#2067 to follow our progress.

Hey @nathanielc

I saw that you merged your code to master last year. I've updated to InfluxDB 1.7.10 but the results are the same. Is your change not available in this version?

konstantinblaesi commented 4 years ago

@s4ndst0rm I assume this is only in influxdb 2.x

timhallinflux commented 4 years ago

Yes, it's available in InfluxDB 2.0 OSS Beta and InfluxDB Cloud. This will ALSO be in the 1.8 InfluxDB release. Testing is wrapping up now. Expect an update next week.

tparvais commented 4 years ago

+1, hope it will be delivered soon with 1.8 GROUP BY time(1Y) GROUP BY time(1M)

kwu83tw commented 4 years ago

+1, same thing here to see it be delivered in v1.8