influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.
https://influxdata.com
MIT License

selecting multiple columns and filtering on field values with flux #3078

Closed sivasai-quartic closed 8 months ago

sivasai-quartic commented 4 years ago

I'm trying to filter on field values, much like a normal SQL query:

select col1,col2,col3 from measurement where col3 > 10

Here col1, col2, and col3 are InfluxDB fields.

I can do the same thing with a Flux query, but it requires a pivot operation internally, which hurts query performance:

import "influxdata/influxdb/v1"

from(bucket: "test")
    |> range(start: 0)
    |> filter(fn: (r) => r._measurement == "measurement001" and (r._field == "col1" or r._field == "col2" or r._field == "col3" or r._field == "col4"))
    |> v1.fieldsAsCols()
    |> filter(fn: (r) => r.col4 == 10)
    |> limit(n: 5)

Flux with pivot processes 13.3 requests per second, while the equivalent InfluxQL query processes 62.5 requests per second. InfluxQL performs better for this kind of query.

As discussed in the chat (https://influxcommunity.slack.com/archives/CH8RV8PK5/p1596574798154000), I'm logging this ticket to track the optimization work.

Thanks

MWallenberg commented 1 year ago

This feature would be very appreciated. I am trying to write a query that in normal SQL would be expressed as:

SELECT col1 FROM telemetry WHERE col2 = "foo";

I am able to use fieldsAsCols as @sivasai-quartic described, but when I try to create a check from the query, it fails. I have tried to use map to map back into a record with _time, _field, _value, etc., but so far I can't get it to work.

sanderson commented 1 year ago

@MWallenberg If you're using InfluxDB 2.4+ or InfluxDB Cloud, you can use experimental.unpivot() to unpivot your data.

import "experimental"

from(bucket: "example-bucket")
    |> range(start: -30d)
    |> filter(fn: (r) => r._measurement == "telemetry")
    |> filter(fn: (r) => r._field == "col1" or r._field == "col2")
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> filter(fn: (r) => r.col2 == "foo")
    |> experimental.unpivot()

How exactly is it failing when you try to create a check?

MWallenberg commented 1 year ago

I am using the C# client to create checks as described here. The query I used was something like this:

import "influxdata/influxdb/schema"
from(bucket: "my-bucket") 
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)  
  |> filter(fn: (r) => r["_measurement"] == "telemetry") 
  |> filter(fn: (r) => r.myTag == "some-tag-value") 
  |> schema.fieldsAsCols() 
  |> filter(fn: (r) => r.field2 == "some-field-value")
  |> last(column: "field1")
  |> yield()

The database responded with an HTTP 400 response and the message: Could not create task from check: expected a single field but got: [].

I think I have worked around it by mapping the pivoted table back into the format Influx expects for a check. I'm not sure it works yet, and it feels like a terrible hack, but at least the check can be created with this query:

import "influxdata/influxdb/schema"
from(bucket: "my-bucket") 
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)  
  |> filter(fn: (r) => r["_measurement"] == "telemetry") 
  |> filter(fn: (r) => r.myTag == "some-tag-value") 
  |> schema.fieldsAsCols() 
  |> filter(fn: (r) => r.field2 == "some-field-value")
  |> last(column: "field1")
  |> map(fn: (r) => ({ _time: r._time, _value: r.field1, _field: "field1", _measurement: "telemetry"}))

I am on 2.4, so I will try unpivot as well; it seems less fragile than my solution. But overall, this seems like a standard use case that should be easy? The field I want to filter by (field2) is a unique sensor ID, and the number of sensors will grow unbounded over time (currently in the low hundreds, but it will reach several thousand shortly). It should therefore be a field, not a tag (to avoid runaway series cardinality), but filtering and monitoring should still be feasible.

MWallenberg commented 1 year ago

Update - turns out it didn't work at all. I had accidentally filtered out all the data before the call to map ... so the check could be created, but it had no data to work with. Fixing the filtering resulted in the same error message (Could not create task from check: expected a single field but got: []). I will try unpivot to see if I have more luck.

Update 2 - using the unpivot-function, it does work :)
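For anyone landing here later: the working query was not posted, but combining the earlier check query with experimental.unpivot() would presumably look something like the sketch below. The bucket, tag, and field names are carried over from the snippets above and are placeholders; unpivot restores the _field/_value shape the check engine expects, after which a single field can be selected.

```flux
import "influxdata/influxdb/schema"
import "experimental"

from(bucket: "my-bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "telemetry")
    |> filter(fn: (r) => r.myTag == "some-tag-value")
    |> schema.fieldsAsCols()
    // Filter on one field's value while other fields are still columns
    |> filter(fn: (r) => r.field2 == "some-field-value")
    // Restore the narrow _field/_value layout required by checks
    |> experimental.unpivot()
    // Keep a single field so the check's "expected a single field" constraint holds
    |> filter(fn: (r) => r._field == "field1")
    |> last()
```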

github-actions[bot] commented 9 months ago

This issue has had no recent activity and will be closed soon.