influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

[parsers/xml] Add ability to ignore an empty selection node #9008

Closed M0rdecay closed 2 years ago

M0rdecay commented 3 years ago

Proposal:

Sometimes a selector may not match any node. For example, the original XML sometimes looks like this:

<?xml version='1.0' encoding='utf-8'?>
<SELFCARE>
    <POOL_GET_INFO>
        <POOL>
            <POOL_ID>60020009</POOL_ID>
            <NICK_NAME>HTTP_ROOT</NICK_NAME>
            <PRIORITY>1</PRIORITY>
            <DEFAULT>1</DEFAULT>
            <WAIT_POOL_TIMEOUT>30</WAIT_POOL_TIMEOUT>
            <GATHER_STATISTIC>1</GATHER_STATISTIC>
            <ONLINE_STATUS>ONLINE</ONLINE_STATUS>
            <POOL_MGMNT_TYPE>1</POOL_MGMNT_TYPE>
            <SEGMENT>1</SEGMENT>
            <ONLINE>1</ONLINE>
            <OFFLINE_MODE>1</OFFLINE_MODE>
            <N_READY_POOLS>8</N_READY_POOLS>
            <CHILDS>
                <POOL>
                    <POOL_ID>130771004</POOL_ID>
                    <NICK_NAME>SOME_SERVER</NICK_NAME>
                    <PRIORITY>1</PRIORITY>
                    <DEFAULT>0</DEFAULT>
                    <WAIT_POOL_TIMEOUT>30</WAIT_POOL_TIMEOUT>
                    <GATHER_STATISTIC>0</GATHER_STATISTIC>
                    <ONLINE_STATUS>ONLINE</ONLINE_STATUS>
                    <POOL_MGMNT_TYPE>1</POOL_MGMNT_TYPE>
                    <SEGMENT>1</SEGMENT>
                    <ONLINE>1</ONLINE>
                    <OFFLINE_MODE>1</OFFLINE_MODE>
                    <N_READY_POOLS>1</N_READY_POOLS>
                    <CHILDS>
                        <POOL>
                            <POOL_ID>130772004</POOL_ID>
                            <NICK_NAME>SOME_SERVER_1</NICK_NAME>
                            <PRIORITY>1</PRIORITY>
                            <DEFAULT>0</DEFAULT>
                            <WAIT_POOL_TIMEOUT>30</WAIT_POOL_TIMEOUT>
                            <GATHER_STATISTIC>0</GATHER_STATISTIC>
                            <ONLINE_STATUS>ONLINE</ONLINE_STATUS>
                            <POOL_MGMNT_TYPE>1</POOL_MGMNT_TYPE>
                            <ORIGINAL_N_MIN></ORIGINAL_N_MIN>
                            <ORIGINAL_N_MAX></ORIGINAL_N_MAX>
                            <N_MIN>1</N_MIN>
                            <N_MAX>5</N_MAX>
                            <DESTROY_INACTIVE_TIME>60</DESTROY_INACTIVE_TIME>
.........
                        </POOL>
..............
                    </CHILDS>
                </POOL>
..............
            </CHILDS>
        </POOL>
    </POOL_GET_INFO>
</SELFCARE>

And sometimes like this:

<?xml version='1.0' encoding='utf-8'?>
<SELFCARE>
    <POOL_GET_INFO>
        <POOL>
            <POOL_ID>60020009</POOL_ID>
            <NICK_NAME>HTTP_ROOT</NICK_NAME>
            <PRIORITY>1</PRIORITY>
            <DEFAULT>1</DEFAULT>
            <WAIT_POOL_TIMEOUT>30</WAIT_POOL_TIMEOUT>
            <GATHER_STATISTIC>1</GATHER_STATISTIC>
            <ONLINE_STATUS>ONLINE</ONLINE_STATUS>
            <POOL_MGMNT_TYPE>1</POOL_MGMNT_TYPE>
            <SEGMENT>1</SEGMENT>
            <ONLINE>1</ONLINE>
            <OFFLINE_MODE>1</OFFLINE_MODE>
            <N_READY_POOLS>8</N_READY_POOLS>
            <CHILDS>
                <POOL>
                    <POOL_ID>130771004</POOL_ID>
                    <NICK_NAME>SOME_SERVER</NICK_NAME>
                    <PRIORITY>1</PRIORITY>
                    <DEFAULT>0</DEFAULT>
                    <WAIT_POOL_TIMEOUT>30</WAIT_POOL_TIMEOUT>
                    <GATHER_STATISTIC>0</GATHER_STATISTIC>
                    <ONLINE_STATUS>ONLINE</ONLINE_STATUS>
                    <POOL_MGMNT_TYPE>1</POOL_MGMNT_TYPE>
                    <SEGMENT>1</SEGMENT>
                    <ONLINE>1</ONLINE>
                    <OFFLINE_MODE>1</OFFLINE_MODE>
                    <N_READY_POOLS>1</N_READY_POOLS>
                </POOL>
                <POOL>
                    <POOL_ID>130771005</POOL_ID>
                    <NICK_NAME>SOME_HOST</NICK_NAME>
                    <PRIORITY>1</PRIORITY>
                    <DEFAULT>0</DEFAULT>
                    <WAIT_POOL_TIMEOUT>30</WAIT_POOL_TIMEOUT>
                    <GATHER_STATISTIC>0</GATHER_STATISTIC>
                    <ONLINE_STATUS>ONLINE</ONLINE_STATUS>
                    <POOL_MGMNT_TYPE>1</POOL_MGMNT_TYPE>
                    <SEGMENT>1</SEGMENT>
                    <ONLINE>1</ONLINE>
                    <OFFLINE_MODE>1</OFFLINE_MODE>
                    <N_READY_POOLS>1</N_READY_POOLS>
                </POOL>
...................
            </CHILDS>
        </POOL>
    </POOL_GET_INFO>
</SELFCARE>

Current behavior:

In the second case, the selector /SELFCARE/POOL_GET_INFO/POOL/CHILDS/POOL/CHILDS/POOL returns an error:

Error in plugin: cannot parse with empty selection node

with config:

[[inputs.http]]
.....
  data_format = "xml"
  [[inputs.http.xml]]
    metric_selection = "/SELFCARE/POOL_GET_INFO/POOL/CHILDS/POOL/CHILDS/POOL"
.....
  [[inputs.http.xml]]
    metric_selection = "/SELFCARE/POOL_GET_INFO/POOL/CHILDS/POOL"
.....
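
To illustrate why the first selector comes back empty, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree rather than Telegraf's own parser. The two documents are trimmed-down versions of the response shapes above, and the names NESTED, FLAT, DEEP, SHALLOW and pool_ids are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Trimmed-down versions of the two response shapes from the report.
NESTED = """<SELFCARE><POOL_GET_INFO><POOL>
  <POOL_ID>60020009</POOL_ID>
  <CHILDS><POOL>
    <POOL_ID>130771004</POOL_ID>
    <CHILDS><POOL><POOL_ID>130772004</POOL_ID></POOL></CHILDS>
  </POOL></CHILDS>
</POOL></POOL_GET_INFO></SELFCARE>"""

FLAT = """<SELFCARE><POOL_GET_INFO><POOL>
  <POOL_ID>60020009</POOL_ID>
  <CHILDS>
    <POOL><POOL_ID>130771004</POOL_ID></POOL>
    <POOL><POOL_ID>130771005</POOL_ID></POOL>
  </CHILDS>
</POOL></POOL_GET_INFO></SELFCARE>"""

# ElementTree paths are relative to the root element (SELFCARE),
# so the leading "/SELFCARE" of the original selectors is dropped.
DEEP = "./POOL_GET_INFO/POOL/CHILDS/POOL/CHILDS/POOL"
SHALLOW = "./POOL_GET_INFO/POOL/CHILDS/POOL"


def pool_ids(document, selection):
    """Return the POOL_ID text of every node matched by the selection."""
    root = ET.fromstring(document)
    return [node.findtext("POOL_ID") for node in root.findall(selection)]


# The deep selector matches only when the second CHILDS level exists.
print(pool_ids(NESTED, DEEP))
print(pool_ids(FLAT, DEEP))
print(pool_ids(FLAT, SHALLOW))
```

For the flat document the deep selection is simply an empty list, which is the situation the parser currently reports as an error.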

Desired behavior:

The parser should log a warning instead of returning an error, and continue working.

@srebhan

M0rdecay commented 3 years ago

Well, I think all we need to do is remove the error here - https://github.com/influxdata/telegraf/blob/master/plugins/parsers/xml/parser.go#L59

srebhan commented 3 years ago

@M0rdecay I agree. However, this might also be a query error... Should we add a flag? Something like accept_empty_selections, which is false by default?
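
A rough sketch of how such a flag could behave, written in Python with xml.etree.ElementTree for brevity. This is not the Telegraf implementation: the function select_nodes and the logger name are hypothetical, and accept_empty_selections is only the flag name proposed above.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("xml_parser_sketch")  # hypothetical logger name


def select_nodes(root, metric_selection, accept_empty_selections=False):
    """Return the nodes matched by metric_selection.

    An empty match raises an error by default, because it may indicate a
    mistyped query. With accept_empty_selections=True it is downgraded to
    a warning and an empty list is returned, so parsing can continue.
    """
    nodes = root.findall(metric_selection)
    if nodes:
        return nodes
    if accept_empty_selections:
        log.warning("empty selection for %r, skipping", metric_selection)
        return []
    raise ValueError("cannot parse with empty selection node")


# Usage: a document without a second CHILDS level.
root = ET.fromstring(
    "<SELFCARE><POOL_GET_INFO><POOL><CHILDS><POOL/></CHILDS></POOL>"
    "</POOL_GET_INFO></SELFCARE>"
)
deep = "./POOL_GET_INFO/POOL/CHILDS/POOL/CHILDS/POOL"
print(select_nodes(root, deep, accept_empty_selections=True))
```

Keeping the default at false preserves the current behavior, so a broken query still surfaces as an error unless the user opts in.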

phuntik commented 2 years ago

Upvote on this. I am using the xml input plugin for Pacemaker monitoring via the crm_mon utility. A <failures> section is present only if a failure happened; if the status is OK, no failures section is present. In this case, when using the metric_selection option to gather multiple nodes:

[[inputs.exec.xml]]
 metric_selection = "//failures/failure"

Telegraf generates the error: [inputs.exec] Error in plugin: cannot parse with empty selection node. Assuming this XML and the overall flow are reasonable, I suppose there should be an option on the plugin side to skip this particular [[inputs.*.xml]] section if the node is empty.

P.S. If I don't use metric_selection and set an explicit XPath instead, it works even when the section is absent from the actual XML, by setting the metrics to zero. Let's say that is OK. But this workaround costs flexibility when multiple nodes need to be gathered, which should be a fairly fundamental thing in XML and similar formats.

Please implement. Thanks.

srebhan commented 2 years ago

This was already fixed and merged with #11102.