influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

difference() 25 times slower after update to 2.7.8 #25226

Closed: ssunny1081 closed this issue 2 weeks ago

ssunny1081 commented 2 months ago

I'm using InfluxDB v2 to gather my solar power statistics. After updating from 2.7.6-1 to 2.7.8-1, my Grafana dashboards suddenly started timing out. Because the values are ever-increasing counters, I use `difference()`.

It turns out that, after the update, a query that used to take about 300 ms now takes about 8 seconds.

Downgrading to 2.7.6-1 restores the original performance of around 300 ms.

Steps to reproduce:

  1. Query 500,000 double values
  2. Compute the difference for each

Query Used: `from(bucket: "nrdb") |> range(start: 2023-08-17) |> filter(fn: (r) => r["_measurement"] == "Solarpower" or r["_measurement"] == "Powermeter") |> filter(fn: (r) => r["_field"] == "SHTC" or r["_field"] == "SVTC" or r["_field"] == "SMTC" or r["_field"] == "TD" or r["_field"] == "TC") |> difference(nonNegative: true)`

Example Data:

| table (_result) | _measurement (group, string) | _field (group, string) | _value (no group, double) | _start (group, dateTime:RFC3339) | _stop (group, dateTime:RFC3339) | _time (no group, dateTime:RFC3339) |
|---|---|---|---|---|---|---|
| 0 | Powermeter | TC | 5520.4166 | 2023-08-17T00:00:00.000Z | 2024-08-07T20:20:40.714Z | 2023-08-18T07:44:17.522Z |
| 0 | Powermeter | TC | 5520.4453 | 2023-08-17T00:00:00.000Z | 2024-08-07T20:20:40.714Z | 2023-08-18T07:49:17.522Z |
| 0 | Powermeter | TC | 5520.4696 | 2023-08-17T00:00:00.000Z | 2024-08-07T20:20:40.714Z | 2023-08-18T07:54:17.523Z |
| 0 | Powermeter | TC | 5520.4852 | 2023-08-17T00:00:00.000Z | 2024-08-07T20:20:40.714Z | 2023-08-18T07:59:17.522Z |
| 0 | Powermeter | TC | 5520.4947 | 2023-08-17T00:00:00.000Z | 2024-08-07T20:20:40.714Z | 2023-08-18T08:04:17.522Z |
| 0 | Powermeter | TC | 5520.505 | 2023-08-17T00:00:00.000Z | 2024-08-07T20:20:40.714Z | 2023-08-18T08:09:17.523Z |
| 0 | Powermeter | TC | 5520.519 | 2023-08-17T00:00:00.000Z | 2024-08-07T20:20:40.714Z | 2023-08-18T08:14:17.522Z |
| 0 | Powermeter | TC | 5520.5305 | 2023-08-17T00:00:00.000Z | 2024-08-07T20:20:40.714Z | 2023-08-18T08:19:17.523Z |
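To make the workload concrete, here is a rough Python sketch of the per-row arithmetic that a consecutive difference performs on the sample `_value` column. It is only an illustration of the calculation, not how Flux implements `difference()`. The helper name `consecutive_differences` is made up for this example, and the `nonNegative: true` handling of negative deltas is not modelled, since the sample counter only increases.

```python
# Values copied from the Powermeter / TC rows in the example data.
values = [5520.4166, 5520.4453, 5520.4696, 5520.4852,
          5520.4947, 5520.505, 5520.519, 5520.5305]


def consecutive_differences(series):
    """Return series[i] - series[i - 1] for every i >= 1.

    Hypothetical helper for illustration only; the first row has no
    predecessor, so the result is one element shorter than the input.
    """
    return [curr - prev for prev, curr in zip(series, series[1:])]


diffs = consecutive_differences(values)
for d in diffs:
    # Round for display; raw float subtraction carries tiny rounding noise.
    print(f"{d:.4f}")
```

This is a single linear pass over the input, which is why a jump from 300 ms to 8 s for the same 500k rows looked like a regression rather than an expensive query.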

Expected behaviour:

`difference()` should finish in <= 300 ms for a set of 500k data points

Actual behaviour:

`difference()` takes 8 s to finish a set of 500k data points

Environment info:

Debian 12
11th Gen Intel(R) Core(TM) i3-1115G4
12GB RAM, less than 5GB in use
NVMe SSD

Slow: influxdb2:amd64 2.7.8-1
Working as expected: influxdb2:amd64 2.7.6-1

Config: All default

Logs:

gwossum commented 2 months ago

Minimal repro using inch:

  1. Create a new bucket named fluxtime: `influx bucket create --name fluxtime --org-id ${ORGID} -t ${TOKEN}`
  2. Create data: `inch -v2 -db fluxtime -t 1 -p 1000000 -token ${TOKEN} -start-time 2024-08-12T00:00:00Z -time 24h`
  3. Run query: `time influx query --org-id ${ORGID} -t ${TOKEN} -r 'from(bucket: "fluxtime") |> range(start: 2024-08-12) |> difference(nonNegative: true)' > /dev/null`
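
For comparing versions over several runs rather than a single `time` invocation, step 3 could be wrapped in a small Python harness along these lines. This is a sketch under assumptions: the `influx query` flags are copied from the command above, while the function names `build_query_cmd` and `time_command` and the default of three runs are invented for this example.

```python
import subprocess
import time


def build_query_cmd(org_id, token, bucket="fluxtime", start="2024-08-12"):
    """Build the same `influx query` invocation used in step 3."""
    flux = (
        f'from(bucket: "{bucket}") |> range(start: {start}) '
        "|> difference(nonNegative: true)"
    )
    return ["influx", "query", "--org-id", org_id, "-t", token, "-r", flux]


def time_command(cmd, runs=3):
    """Run cmd `runs` times, discard its stdout, return wall-clock seconds."""
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not timed.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - started)
    return timings
```

Calling `time_command(build_query_cmd(org_id, token))` once against 2.7.6 and once against 2.7.8, with the same inch-generated data, would give directly comparable numbers.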

ssunny1081 commented 2 weeks ago

Back to normal; this is fixed in the current version.