influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

xpath_json format can use the default type defined in the JSON (int, float, string) #14597

Closed shengbinxu closed 9 months ago

shengbinxu commented 9 months ago

Use Case

 data_format = "xpath_json"
  [[inputs.kafka_consumer.xpath]]

With the xpath_json format, a field value is a string by default (when no type is explicitly specified).

Expected behavior

json_v2 format: For each field you have the option to define the types. The following rules are in place for this configuration:

Can xpath_json support this feature?

Actual behavior

With the xpath_json format, every field value comes out as a string by default (when no type is explicitly specified).

Additional info

No response

srebhan commented 9 months ago

@MuChenMuXuan I'm not sure I understand what you are trying to achieve. Let's assume you have a JSON document with

{
  "value": 5
}

Now if you set

[[inputs.kafka_consumer]]
  ...
 data_format = "xpath_json"
 xpath_native_types = true

 [[inputs.kafka_consumer.xpath]]
  [inputs.kafka_consumer.xpath.fields]
    value = "/value"  

you will get value as a float64 field. That is because the native JSON type for the field is a number (aka floating point value).

When using

[[inputs.kafka_consumer]]
  ...
 data_format = "xpath_json"
 xpath_native_types = true

 [[inputs.kafka_consumer.xpath]]
  [inputs.kafka_consumer.xpath.fields_int]
    value = "/value"  

you will get the value as int64 as fields_int will try to convert the value to integer.

You can also do explicit typing by omitting the xpath_native_types = true setting... With

[[inputs.kafka_consumer]]
  ...
 data_format = "xpath_json"

 [[inputs.kafka_consumer.xpath]]
  [inputs.kafka_consumer.xpath.fields]
    value = "/value"  

you will get value as a string because the JSON document is converted to XML and the element is a string by default. To get float64 in this setting you do

[[inputs.kafka_consumer]]
  ...
 data_format = "xpath_json"

 [[inputs.kafka_consumer.xpath]]
  [inputs.kafka_consumer.xpath.fields]
    value = "number(/value)"  

and to get an integer you should use fields_int as above

[[inputs.kafka_consumer]]
  ...
 data_format = "xpath_json"

 [[inputs.kafka_consumer.xpath]]
  [inputs.kafka_consumer.xpath.fields_int]
    value = "/value"  
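
To summarize the cases above, here is a minimal Python sketch that emulates the typing rules as described. It is not Telegraf code: `emulate_field`, `RawNumber`, and the `table` and `use_number` parameters are hypothetical names made up for illustration. The only assumption is that a JSON number starts out as text unless `xpath_native_types = true` or `number(...)` is in play.

```python
import json


class RawNumber(str):
    """Text of a JSON number, kept verbatim so its type can be decided later."""


def emulate_field(raw_json, key, native_types=False, table="fields", use_number=False):
    # Hypothetical helper: keep numeric literals as text first, like element content.
    value = json.loads(raw_json, parse_int=RawNumber, parse_float=RawNumber)[key]

    if isinstance(value, RawNumber):
        if native_types or use_number:
            # xpath_native_types = true, or number(...), gives a floating point value.
            value = float(value)
        else:
            # Default: the element content stays a string.
            value = str(value)

    if table == "fields_int":
        # fields_int tries to convert whatever it received to an integer.
        value = int(float(value))
    return value


doc = '{"value": 5}'
print(repr(emulate_field(doc, "value", native_types=True)))                      # 5.0
print(repr(emulate_field(doc, "value", native_types=True, table="fields_int")))  # 5
print(repr(emulate_field(doc, "value")))                                         # '5'
print(repr(emulate_field(doc, "value", use_number=True)))                        # 5.0
print(repr(emulate_field(doc, "value", table="fields_int")))                     # 5
```

The five calls mirror the five configurations above, in the same order.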

shengbinxu commented 9 months ago

Thank you very much for your reply! I wasn't aware of this setting at first. Now that I have added xpath_native_types = true, the output meets my expectations.

The input JSON is:

{
    "customerId": 1652,
    "deviceId": "13011304383",
    "timestamp": 1705637828000,
    "parameters": {
        "acc": 0,
        "locationStatus": 1,
        "altitude": 38.0,
        "loc": {
            "lng": 117.306441,
            "lat": 31.93148
        },
        "latitude": 31.93148,
        "brushState": 0,
        "speed": 0.0,
        "direction": 136.0,
        "height": 38.0,
        "longitude": 117.306441,
        "mileage": 267119.0
    },
    "componentId": 7,
    "entityId": 81495
}

and the config:

 data_format = "xpath_json"
 xpath_native_types = true
[[inputs.kafka_consumer.xpath]]
      metric_name = "string('device_metric')"
      timestamp = '/timestamp'
      timestamp_format = 'unix_ms'
      timezone = 'Asia/Shanghai'
      field_selection = "/parameters/child::*"
      ### https://github.com/influxdata/telegraf/tree/master/plugins/parsers/json_v2
      [inputs.kafka_consumer.xpath.tags]
        customerId = "/customerId"
        deviceId = "/deviceId"

      [inputs.kafka_consumer.xpath.fields]
        loc = "string(/parameters/loc)" 
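
In XPath, `child::*` selects every direct child element, so `field_selection = "/parameters/child::*"` picks up each key directly under `parameters`. The Python sketch below only illustrates which names that covers, using a trimmed copy of the input document. `select_children` is a hypothetical helper, not part of Telegraf, and a plain dictionary lookup stands in for real XPath evaluation.

```python
import json

# Trimmed copy of the input document above (only a few keys kept).
doc = json.loads("""
{
    "customerId": 1652,
    "parameters": {
        "acc": 0,
        "loc": {"lng": 117.306441, "lat": 31.93148},
        "speed": 0.0
    },
    "entityId": 81495
}
""")


def select_children(document, parent):
    """Names of the direct children of `parent`, as /<parent>/child::* would select."""
    return list(document[parent].keys())


print(select_children(doc, "parameters"))  # ['acc', 'loc', 'speed']
```

Keys outside `parameters`, such as `customerId` and `entityId`, are not selected as fields. Here `customerId` only appears in the metric because it is listed under tags.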

and the output is:

device_metric,customerId=1652,deviceId=13602300115,host=192.168.1.6 latitude=39.59742,loc="{\"lat\":39.59742,\"lng\":109.755555}",locationStatus=1,brushState=0,direction=220,height=1311,longitude=109.755555,mileage=2708,speed=200,acc=1i,altitude=1311 1705673638000000000
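
The output above is InfluxDB line protocol, where the field type is visible in the value itself: a trailing `i` marks an integer, double quotes mark a string, and a bare number is a float. As a rough aid for reading such a line, here is a small Python sketch. `line_protocol_field_type` is a hypothetical helper that covers only these three cases; boolean and unsigned values are left out.

```python
def line_protocol_field_type(raw):
    """Classify one raw field value taken from a line-protocol line."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return "string"   # double-quoted values are strings
    if raw.endswith("i") and raw[:-1].lstrip("-").isdigit():
        return "integer"  # a trailing i marks an integer
    float(raw)            # raises ValueError for anything this sketch does not cover
    return "float"


# Field values copied from the output line above.
print(line_protocol_field_type("1i"))        # integer (acc)
print(line_protocol_field_type("39.59742"))  # float   (latitude)
print(line_protocol_field_type("200"))       # float   (speed)
print(line_protocol_field_type(r'"{\"lat\":39.59742,\"lng\":109.755555}"'))  # string (loc)
```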