influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License
14.63k stars 5.58k forks source link

Parsing nested arrays containing multiple datasets in json_v2 #11467

Open sstickel opened 2 years ago

sstickel commented 2 years ago

Hello everybody.

I'm trying to parse mains grid frequency data from gridradar.net and got stuck. Not sure if it's me or a bug. Every request returns mains frequency data since the last request, so the number of elements is not fixed.

If the data would look like this, everything would be easy:

[
    {
        "datapoints": [
            {
                "f":50.003,
                "t":"2022-07-05 12:55:56"
            },
            {
                "f":50.014,
                "t":"2022-07-05 12:55:57"
            },
            {
                "f":49.998,
                "t":"2022-07-05 12:55:58"
            },
            {
                "f":49.893,
                "t":"2022-07-05 12:55:59"
            }
        ],
        "target": "median_frequency"
    }
]

A corresponding telegraf.conf like this works well here:

[[inputs.file]]
    files = ["./test3.json"]
    data_format = "json_v2"
    [[inputs.file.json_v2]]
        measurement_name = "median_freq"
        [[inputs.file.json_v2.object]]
            path = "0.datapoints"
            disable_prepend_keys = true
            timestamp_format = "2006-01-02 15:04:05"
            timestamp_key = "t"

Telegraf test output:

> median_freq, f=50.003 1657025756000000000
> median_freq, f=50.014 1657025757000000000
> median_freq, f=49.998 1657025758000000000
> median_freq, f=49.893 1657025759000000000

Now, the problem is that in reality the incoming data looks like this:

[
    {
        "datapoints": [
            [
                50.003,
                "2022-07-05 12:55:56"
            ],
            [
                50.014,
                "2022-07-05 12:55:57"
            ],
            [
                49.998,
                "2022-07-05 12:55:58"
            ],
            [
                49.893,
                "2022-07-05 12:55:59"
            ]
        ],
        "target": "median_frequency"
    }
]

Doing some gjson magic in GJSON Playground brought me to this conf:

[[inputs.file]]
    files = ["./test2.json"]
    data_format = "json_v2"
    [[inputs.file.json_v2]]
        measurement_name = "median_freq"
        [[inputs.file.json_v2.object]]
            path = "{datapoints:{f:0.datapoints.#.0,t:0.datapoints.#.1}.@group}"
            disable_prepend_keys = true
            timestamp_format = "2006-01-02 15:04:05"
            timestamp_key = "datapoints.t"

Well, the test output is just empty.

Is this a bug, am I hitting some limitation of json_v2 or am I doing something wrong? On the Playground this looks like it should work.

Best, Sebbi

srebhan commented 2 years ago

You can solve this with the xpath parser:

[[inputs.file]]
    files = ["./test2.json"]
    data_format = "xpath_json"

    [[inputs.file.xpath]]
      metric_selection = "//datapoints/*"
      metric_name = "'gridradar'"
      timestamp = "descendant::*[2]"
      timestamp_format = "2006-01-02 15:04:05"
      [inputs.file.xpath.tags]
        target = "//target"
      [inputs.file.xpath.fields]
         value = "number(descendant::*[1])"

resulting in

> gridradar,host=Hugin,target=median_frequency value=50.003 1657025756000000000
> gridradar,host=Hugin,target=median_frequency value=50.014 1657025757000000000
> gridradar,host=Hugin,target=median_frequency value=49.998 1657025758000000000
> gridradar,host=Hugin,target=median_frequency value=49.893 1657025759000000000

for your example

sstickel commented 2 years ago

Thanks Sven, that worked perfectly. Didn't know about xpath being able to parse json. Althought my primary problem is solved now, why didn't that work the way I tried it? I had expected to get meaningful output from this path: path = "{datapoints:{f:0.datapoints.#.0,t:0.datapoints.#.1}.@group}"

GJSON Playground output looks good to me. Getting just zero output when running telegraf -test irritates me.

{
  "datapoints": [
    {
      "f": 50.003,
      "t": "2022-07-05 12:55:56"
    },
    {
      "f": 50.014,
      "t": "2022-07-05 12:55:57"
    },
    {
      "f": 49.998,
      "t": "2022-07-05 12:55:58"
    },
    {
      "f": 49.893,
      "t": "2022-07-05 12:55:59"
    }
  ]
}