influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Descendants of the nodes not captured in same format #13655

Closed: Lavanya2102 closed this issue 1 year ago

Lavanya2102 commented 1 year ago

Relevant telegraf.conf


    [[inputs.exec.xpath]]
      metric_name = "'cpu_partitions'"
      metric_selection = "/cpu_partitions/*"
      field_selection = "descendant::*[not(*)]"

    [[outputs.kafka]]
      brokers = ["admin:9092"]
      topic = "cpu_partitions"
      data_format = "json"
      namepass = ["cpu_partitions"]

Logs from Telegraf

telegraf --config /etc/telegraf/telegraf.conf
2023-07-23T15:00:08Z I! Loading config: /etc/telegraf/telegraf.conf
2023-07-23T15:00:08Z I! Starting Telegraf 1.28.0-d3c45417
2023-07-23T15:00:08Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 24 parsers, 59 outputs, 4 secret-stores
2023-07-23T15:00:08Z I! Loaded inputs: exec
2023-07-23T15:00:08Z I! Loaded aggregators:
2023-07-23T15:00:08Z I! Loaded processors:
2023-07-23T15:00:08Z I! Loaded secretstores:
2023-07-23T15:00:08Z I! Loaded outputs: kafka (6x)
2023-07-23T15:00:08Z I! Tags enabled:
2023-07-23T15:00:08Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s

System info

Telegraf 1.28

Docker

No response

Steps to reproduce

  1. From the above conf file I am extracting cpu_partitions.
  2. This is the cpu_partitions data:
    {
    "cpu_partitions": [
        {
            "partition": "workq",
            "nnodes": 1,
            "ncpus": 1,
            "total_jobs": 5,
            "host": "gpu1",
            "timestamp": 1689750481722,
            "nodelist": [
                "gpu2",
                "gpu3"
            ],
            "boot_fail_jobs": 0,
            "node_fail_jobs": 0,
            "out_of_memory_jobs": 0,
            "cancelled_jobs": 0,
            "pending_jobs": 5,
            "preempted_jobs": 0,
            "completed_jobs": 0,
            "running_jobs": 0,
            "resv_del_hold_jobs": 0,
            "configuring_jobs": 0,
            "requeue_fed_jobs": 0,
            "requeue_hold_jobs": 0,
            "completing_jobs": 0,
            "requeued_jobs": 0,
            "resizing_jobs": 0,
            "deadline_jobs": 0,
            "revoked_jobs": 0,
            "signaling_jobs": 0,
            "failed_jobs": 0,
            "special_exit_jobs": 0,
            "stage_out_jobs": 0,
            "stopped_jobs": 0,
            "suspended_jobs": 0,
            "timeout_jobs": 0
        }
    ]
    }
  3. Once Telegraf collects the data, nodelist should keep the same format, i.e. nodelist = ["gpu2", "gpu3"].
  4. Instead, this is what I get: "nodelist_": "gpu2", "nodelist_2": "gpu3".
  5. The array should not be expanded into separate fields unless expansion is explicitly set to true.

Expected behavior

    {
        "partition": "workq",
        "nnodes": 1,
        "ncpus": 1,
        "total_jobs": 5,
        "host": "gpu1",
        "timestamp": 1689750481722,
        "nodelist": [
            "gpu2", "gpu3"
        ],
        "boot_fail_jobs": 0,
        "node_fail_jobs": 0,
        "out_of_memory_jobs": 0,
        "cancelled_jobs": 0,
        "pending_jobs": 5,
        "preempted_jobs": 0,
        "completed_jobs": 0,
        "running_jobs": 0,
        "resv_del_hold_jobs": 0,
        "configuring_jobs": 0,
        "requeue_fed_jobs": 0,
        "requeue_hold_jobs": 0,
        "completing_jobs": 0,
        "requeued_jobs": 0,
        "resizing_jobs": 0,
        "deadline_jobs": 0,
        "revoked_jobs": 0,
        "signaling_jobs": 0,
        "failed_jobs": 0,
        "special_exit_jobs": 0,
        "stage_out_jobs": 0,
        "stopped_jobs": 0,
        "suspended_jobs": 0,
        "timeout_jobs": 0
    }

Actual behavior

    {
        "partition": "workq",
        "nnodes": 1,
        "ncpus": 1,
        "total_jobs": 5,
        "host": "gpu1",
        "timestamp": 1689750481722,
        "nodelist_": "gpu2",
        "nodelist_2": "gpu3",
        "boot_fail_jobs": 0,
        "node_fail_jobs": 0,
        "out_of_memory_jobs": 0,
        "cancelled_jobs": 0,
        "pending_jobs": 5,
        "preempted_jobs": 0,
        "completed_jobs": 0,
        "running_jobs": 0,
        "resv_del_hold_jobs": 0,
        "configuring_jobs": 0,
        "requeue_fed_jobs": 0,
        "requeue_hold_jobs": 0,
        "completing_jobs": 0,
        "requeued_jobs": 0,
        "resizing_jobs": 0,
        "deadline_jobs": 0,
        "revoked_jobs": 0,
        "signaling_jobs": 0,
        "failed_jobs": 0,
        "special_exit_jobs": 0,
        "stage_out_jobs": 0,
        "stopped_jobs": 0,
        "suspended_jobs": 0,
        "timeout_jobs": 0
    }

Additional info

No response

srebhan commented 1 year ago

@Lavanya2102 please test the binary in PR #13660 with the following config and let me know if it fixes your problem:

    metric_name = "'cpu_partitions'"
    metric_selection = "/cpu_partitions/*"
    field_selection = "*"

srebhan commented 1 year ago

PR #13660 allows fields with a complex type to be filled with their native JSON representation. However, such a field is still a string, as Telegraf does not have structured field types. If you need to reconstruct the arrays as JSON on the output, you should use field_name_expansion = true and reassemble the arrays with a JSONata transformation:

  json_transformation = '''
  {
    "tags": tags,
    "name": name,
    "timestamp": timestamp,
    "fields": $merge(
        [
            $sift(fields, function($v, $k) {
                $not($match($k, /nodelist_/))
            }),
            {
                "nodelist": $each(
                    $sift(fields, function($v, $k) {$match($k, /nodelist_/)}),
                    function($v, $k) {
                        $v
                    }
                )
            }
        ]
    )
  }
  '''
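
For readers who do not use JSONata, the two options described above can be sketched in Python on the consumer side. This is only an illustration of the logic, not code that Telegraf runs. The helper names `regroup_fields` and `decode_json_field` are hypothetical, and so are the expanded field names (`nodelist_0`, `nodelist_1`) and the exact string form of the JSON-encoded field in the sample data. Unlike the transformation above, `regroup_fields` always produces a list, even for a single match.

```python
import json
import re


def regroup_fields(metric, prefix="nodelist"):
    """Gather every field whose name contains `<prefix>_` into one list.

    Mirrors the idea of the JSONata transformation: keep all other fields
    unchanged and add a single `<prefix>` field holding the collected values.
    """
    pattern = re.compile(re.escape(prefix) + "_")
    fields = metric.get("fields", {})
    # Fields that are not part of the expanded array stay as they are.
    kept = {k: v for k, v in fields.items() if not pattern.search(k)}
    # Values of the expanded fields, in the order they appear in the metric.
    grouped = [v for k, v in fields.items() if pattern.search(k)]
    return {
        "tags": metric.get("tags", {}),
        "name": metric.get("name"),
        "timestamp": metric.get("timestamp"),
        "fields": {**kept, prefix: grouped},
    }


def decode_json_field(fields, key="nodelist"):
    """Parse a field that carries a JSON document as a string.

    Covers the other option: the field arrives as a string and the consumer
    turns it back into a list. Non-string values are left untouched.
    """
    decoded = dict(fields)
    if isinstance(decoded.get(key), str):
        decoded[key] = json.loads(decoded[key])
    return decoded


if __name__ == "__main__":
    # Hypothetical message in the layout of Telegraf's JSON output.
    metric = {
        "name": "cpu_partitions",
        "tags": {},
        "timestamp": 1689750481,
        "fields": {"partition": "workq", "nodelist_0": "gpu2", "nodelist_1": "gpu3"},
    }
    print(json.dumps(regroup_fields(metric), indent=2))
    print(decode_json_field({"nodelist": '["gpu2", "gpu3"]'}))
```

Both helpers run after the message has left Telegraf, so they work with any output plugin; the JSONata transformation has the advantage that consumers receive the array without extra code.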