influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Splunk serializer data_format=splunkmetric does not work in 1.23+ #12010

Closed pgeler closed 2 years ago

pgeler commented 2 years ago

Relevant telegraf.conf

[[outputs.http]]
   url = "${SPLUNK_HEC_STATSD_URL}"
   timeout = "5s"
   data_format = "splunkmetric"
   splunkmetric_hec_routing = true
   splunkmetric_multimetric = true
...

Logs from Telegraf

Additional info

System info

Telegraf 1.23

Docker

No response

Steps to reproduce

  1. Use Splunk Cloud with an mstats index
  2. Send data through the HTTP HEC forwarder
  3. failed ...

Expected behavior

just work

Actual behavior

does not work

Additional info

We were running 1.20. After an attempt to upgrade to 1.24, Splunk mstats stopped working, and the index has no recent data. Our research found that a new "event" field is being added and is causing the problem. We also found two controversial PRs, #8039 and #8761, and issue #11500, which was closed without an explanation of how to move forward with the most recent versions. It looks like this functionality needs a flag to enable/disable the event field. @powersj thoughts?

powersj commented 2 years ago

closed issue https://github.com/influxdata/telegraf/issues/11500 without an explanation of how to move forward with the most recent versions.

This was closed because the user never responded and confirmed the root cause. It sounds like you have done some additional research that helps confirm the issue.

@lneva-fastly, @fastly-ffej - it seems there are users who are pushing to splunk, but having issues with the re-added 'event' field. Is there something that would be different between users' deployments that would omit or not require this field (e.g. different endpoint URL? different version)?

powersj commented 2 years ago

@pgeler what product and version of splunk are you using?

While we wait for a response above, I put up a draft PR in https://github.com/influxdata/telegraf/pull/12024 that adds a new splunkmetric_event_tag option. Can you confirm using that PR without that option works for you?

pgeler commented 2 years ago

@powersj, it's Splunk Cloud. I can verify the exact version of the Splunk forwarders (I don't think it's a forwarder issue), but my hypothesis after reading #8039 is that it depends on the token setup. When we configured this, we created a separate token in Splunk Cloud for the mstats index, and that is how it was working, as opposed to a universal token that can send data to different indexes. None of the configurations involved forwarders, and those are the same forwarders we use for other index types.

powersj commented 2 years ago

@pgeler thanks, were you able to try out the artifacts in #12024 to ensure it works as expected now?

pgeler commented 2 years ago

@pgeler thanks, were you able to try out the artifacts in #12024 to ensure it works as expected now?

could not make amd64.deb package to work, IMHO you are missing config/config.go changes 🤔

powersj commented 2 years ago

could not make amd64.deb package to work, IMHO you are missing config/config.go changes 🤔

I am, but given it defaults to false, it concerns me that you still weren't able to send metrics. Can you share an error message?

edit: I've pushed an update with the config changes as well

pgeler commented 2 years ago

it was just complaining about an unknown configuration option... but yes it works otherwise

powersj commented 2 years ago

@lneva-fastly, @fastly-ffej,

Can I get one of you to verify that the option in #12024, when enabled, works for you? It does seem that Splunk behaves differently depending on which product you are on.

pgeler commented 2 years ago

Just curious about what is next from here. We are looking forward to other optimizations that landed in parallel after 1.23, so no pressure, but ;)

powersj commented 2 years ago

@pgeler sorry for the delay. After going through the history, I have flipped and renamed the option. It will now need to be set to omit the event tag. Can I get you to try the artifacts one more time and make sure to set the option to true?

Thanks

pgeler commented 2 years ago

@powersj looks good to me
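
For anyone arriving here after a failed upgrade, the outcome of the thread can be sketched as the config below. It reuses the reporter's original `outputs.http` settings and adds the flipped option set to true. The thread never states the final option name; `splunkmetric_omit_event_tag` is assumed here from the released serializer documentation, so check it against the README for your Telegraf version before relying on it.

```toml
[[outputs.http]]
   url = "${SPLUNK_HEC_STATSD_URL}"
   timeout = "5s"
   data_format = "splunkmetric"
   splunkmetric_hec_routing = true
   splunkmetric_multimetric = true
   # Assumed option name (not stated in this thread). Setting it to true
   # is meant to drop the "event" field that caused the problem above.
   splunkmetric_omit_event_tag = true
```

If the option is left unset, the serializer presumably keeps emitting the "event" field, which matches the behavior that broke the reporter's mstats index after the upgrade.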