influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License
14.51k stars 5.55k forks source link

Add option to have canonical field-names #12352

Closed protonmarco closed 1 year ago

protonmarco commented 1 year ago

Use Case

The problem: I'm using [[inputs.gnmi]] to get telemetry from switches using Openconfig paths, and I noticed that if I query a path like this:

[[inputs.gnmi.subscription]]
    name = "test"
    origin = "openconfig-interfaces"
    path = "/interfaces/interface/state/counters"
    subscription_mode = "sample"
    sample_interval = "10s"

I get the specific metric with field-name test_in_octets; while if I make the subscription to a parent path (here and here you can have a view of the model structure)

  [[inputs.gnmi.subscription]]
    name = "test"
    origin = "openconfig-interfaces"
    path = "/interfaces"
    subscription_mode = "sample"
    sample_interval = "10s"

i get the same metric (+ others that I don't need for this example) with field-name test_interface_state_counters_in_octets.

This creates an issue when I want to switch from a wider path (like the latter) to a more specific one (like the former), because it results in change of the field-name, while the metric remains the same, and every query needs to be rewritten because of it (or a processor needs to be introduced).

Expected behavior

 A -> B -> C -> in_octets
        -> K -> in_octets
        -> X -> in_octets

if I subscribe to A I would like to get B_C_in_octets, B_K_in_octets, B_X_in_octets and if I subscribe to B I would like to still get B_C_in_octets, B_K_in_octets, B_X_in_octets.

Actual behavior

 A -> B -> C -> in_octets
        -> K -> in_octets
        -> X -> in_octets

if I subscribe to A I get B_C_in_octets, B_K_in_octets, B_X_in_octets and if I subscribe to B I get C_in_octets, K_in_octets, X_in_octets and so on.

Additional info

No response

powersj commented 1 year ago

if I subscribe to A I would like to get B_C_in_octets, B_K_in_octets, B_X_in_octets and if I subscribe to B I would like to still get B_C_in_octets, B_K_in_octets, B_X_in_octets.

The current behavior seems consistent, in that you omit the the subscribed node and get all the nodes below. Your proposed behavior of subscribing to B suddenly chooses to add node you subscribe to only because it is not the root? How does this work when you include another level (i.e. A -> B -> C -> D -> in_octets)?

This creates an issue when I want to switch from a wider path

If you change your path you should expect a change

protonmarco commented 1 year ago

Your proposed behavior of subscribing to B suddenly chooses to add node you subscribe to only because it is not the root? How does this work when you include another level (i.e. A -> B -> C -> D -> in_octets)?

Correct, in case of A -> B -> C -> D -> in_octets you would get always B_C_D_in_octets whether you subscribe to A, B, C or D.

If you change your path you should expect a change

I understand, but the expected change (in my use case) has the goal to receive less data, so use less resources, without having to rewrite all the queries.

srebhan commented 1 year ago

@protonmarco please test the binary in PR #13326 once CI successfully finished the tests. Let me know if this is what you had in mind!

protonmarco commented 1 year ago

@srebhan I tested it and I wasn't able to see any change in behavior with canonical_field_names set on true or false. With the following configuration:

[[inputs.gnmi.subscription]]
  name = "test"
  origin = "openconfig-interfaces"
  path = "/interfaces/interface/state/counters"
  subscription_mode = "sample"
  sample_interval = "10s"

I always got test_in_octets as a resulting field and not test_interface_state_counters_in_octets

Let me know if you need other info

srebhan commented 1 year ago

@protonmarco please test the binary in #13332 once CI finished the tests successfully and let me know if that fixes the issue!

protonmarco commented 1 year ago

Now i can see it working with canonical_field_names=true but if I use the following:

  [[inputs.gnmi.subscription]]
    name = "test"
    origin = "openconfig-interfaces"
    path = "/interfaces/interface/state/counters"
    subscription_mode = "sample"
    sample_interval = "10s"

I get test__interfaces_interface_state_counters_in_octets and, correct me if I'm wrong, that test__ prefix shouldn't be there. If I leave name = "" I get counters__interfaces_interface_state_counters_in_octets which doesn't look ok either.

srebhan commented 1 year ago

Can you show the influx-line-protocol equivalent? You can use

[[outputs.file]]
  files = ["stdout"]

to get it. I'm asking because it looks like the metric name is mangled in which also might be due to the output....

protonmarco commented 1 year ago

this is the output for that interface:

counters,name=et-0/0/0,source=mysource in_pkts=41209470i,in_octets=2952763029i,in_unicast_pkts=40734678i,in_multicast_pkts=469284i,in_broadcast_pkts=5508i,in_pause_pkts=0i,out_pkts=26403631i,out_octets=3641290883i,out_unicast_pkts=25929026i,out_multicast_pkts=469267i,out_broadcast_pkts=5338i,out_pause_pkts=0i,in_errors=0i,in_fcs_errors=0i,in_discards=0i,in_unknown_proto_pkts=0i,carrier_transitions=11i,out_errors=11i,out_discards=0i,out_unknown_proto_pkts=0i 1685028122615000000
srebhan commented 1 year ago

But then it's not canonical paths!?!?

protonmarco commented 1 year ago

What do you mean? I'm sorry but I have no knowledge on how that metric should look in order to have a canonical path in prometheus output or how the prometheus metric is built given that record

srebhan commented 1 year ago

It seems like you produced the above line with canonical_field_names=false. Anyhow, I think I found the issue: The PR did not strip the leading slash / and thus the prometheus output likely replaced it with an underscore. Can you please try the newly built binary in #13332 again!?

srebhan commented 1 year ago

@protonmarco were you able to test the fix?

protonmarco commented 1 year ago

You got it right, when i created a new .conf to use your output I left commented the option, my bad. I was able to test the new fix only now, with the following config:

  [[inputs.gnmi.subscription]]
    name = "test"
    origin = "openconfig-interfaces"
    path = "/interfaces/interface/state/counters"
    subscription_mode = "sample"
    sample_interval = "10s"

I got this Influx-line output:

test,name=et-0/0/1,source=mysource /interfaces/interface/state/counters/in_pkts=42166229i,/interfaces/interface/state/counters/in_octets=3234884400i,/interfaces/interface/state/counters/in_unicast_pkts=41688146i,/interfaces/interface/state/counters/in_multicast_pkts=472653i,/interfaces/interface/state/counters/in_broadcast_pkts=5430i,/interfaces/interface/state/counters/out_pkts=25402221i,/interfaces/interface/state/counters/out_octets=1910341691i,/interfaces/interface/state/counters/out_unicast_pkts=24924037i,/interfaces/interface/state/counters/out_multicast_pkts=472628i,/interfaces/interface/state/counters/out_broadcast_pkts=5556i,/interfaces/interface/state/counters/carrier_transitions=21i,/interfaces/interface/state/counters/out_errors=21i 1685129346276000000

and this prometheus output:

test__interfaces_interface_state_counters_in_octets

still with the name prefix. If I set name = "" I get the prom metric with the counters__ prefix like before as well.

srebhan commented 1 year ago

@protonmarco are you really sure you have this binary: https://github.com/influxdata/telegraf/pull/13332#issuecomment-1563114542? Telegraf's version git hash should be something like 1c895a21...

protonmarco commented 1 year ago

affirmative, I just checked and tested it again.

srebhan commented 1 year ago

Hmmm let me see if I can find out why the field-names are still prefixed with an /...

srebhan commented 1 year ago

:face_palm: I got every single branch wrong. :-) Another attempt, please test the binary in the PR once CI finished the build...

protonmarco commented 1 year ago

I gave it a try and something is wrong as gnmi outputs are displayed nowhere in stdout or /metric endpoint.

srebhan commented 1 year ago

Can you please record a tcpdump for a message that is not displayed so I can make up a test-case?

srebhan commented 1 year ago

Can you please test again and show me the log?

protonmarco commented 1 year ago

After spending some time trying to decrypt the pcap, I noticed checking the logs there was a leftover configuration on the switch side that prevented the connection to establish, my fault not checking for it :facepalm: Now with the last 2 versions (If I understood correctly the last one just adds logging improvements) and with name="" I get:

counters,name=et-0/0/0,source=myhost interfaces/interface/state/counters/in_pkts=43376985i,interfaces/interface/state/counters/in_octets=3107799885i,interfaces/interface/state/counters/in_unicast_pkts=42880570i,interfaces/interface/state/counters/in_multicast_pkts=490647i,interfaces/interface/state/counters/in_broadcast_pkts=5768i,interfaces/interface/state/counters/out_pkts=27828468i,interfaces/interface/state/counters/out_octets=3816480377i,interfaces/interface/state/counters/out_unicast_pkts=27332229i,interfaces/interface/state/counters/out_multicast_pkts=490621i,interfaces/interface/state/counters/out_broadcast_pkts=5618i,interfaces/interface/state/counters/carrier_transitions=11i,interfaces/interface/state/counters/out_errors=11i 1685621078856000000

and counters_interfaces_interface_state_counters_in_octets Basically we have an _ less but "counters_" is still there. I just did a comparison with

  [[inputs.gnmi.subscription]]
    name = ""
    origin = "openconfig-interfaces"
    path = "/interfaces"
    subscription_mode = "sample"
    sample_interval = "10s"

and the result is interfaces_interfaces_interface_state_counters_in_octets, what do you think on cutting out the first word_ on both? as from what I see it's useless in both cases

srebhan commented 1 year ago

As you can see in the influx output

counters,name=et-0/0/0,source=myhost interfaces/interface/state/counters/in_pkts=43376985i,interfaces/interface/state/counters/in_octets=3107799885i,interfaces/interface/state/counters/in_unicast_pkts=42880570i,interfaces/interface/state/counters/in_multicast_pkts=490647i,interfaces/interface/state/counters/in_broadcast_pkts=5768i,interfaces/interface/state/counters/out_pkts=27828468i,interfaces/interface/state/counters/out_octets=3816480377i,interfaces/interface/state/counters/out_unicast_pkts=27332229i,interfaces/interface/state/counters/out_multicast_pkts=490621i,interfaces/interface/state/counters/out_broadcast_pkts=5618i,interfaces/interface/state/counters/carrier_transitions=11i,interfaces/interface/state/counters/out_errors=11i 1685621078856000000

you got counters as the metric name and the rest interfaces/interface/state/counters/in_octets as field name. So this is correct from the input side. Your output (is it outputs.prometheus_client?) then replaces the slashes in the field name and prefixes the field name with the metric name. If you want this changed, it is probably something we want to change in the output... Please open a feature-request for this!

Other than that I think we are good to merge the PR after removing the debug stuff...