splunk / kafka-connect-splunk

Kafka connector for Splunk
Apache License 2.0
Specify alternative record for _time #271

Open fwijnholds opened 3 years ago

fwijnholds commented 3 years ago

Situation: Data is received on the HEC from Kafka-Connect via the Splunk plugin. Multiple data sources are sent in a single stream of data.

Issue: When using the ‘Event’ endpoint, the timestamp in the metadata added by Kafka-Connect takes precedence over timestamp extraction from the event itself. This timestamp reflects the moment when Kafka received the event, not when the event was generated. If an issue on the log source introduces a delay, the timestamp in Splunk will be incorrect, leading to correlation issues.

When using the ‘Raw’ endpoint this issue does not occur; however, that setup is unable to handle the volume of events we are receiving.

Temporary fix: As a workaround I’ve resorted to an ‘ingest_eval’ for the sourcetype, with an elaborate case() statement that attempts to find all the possible timestamps using strptime and substr. When timestamps conflict in this logic, however, the events are dropped.

Proposed fix: Introduce an option on either the HEC or the kafka_connect plugin to choose whether the metadata timestamp takes precedence or is ignored.

I had hoped that “splunk.hec.use.record.timestamp” would make this possible, but sadly it does nothing to fix this.

ilyaresh commented 3 years ago

Same situation as described by @fwijnholds. Given an event payload like the one below, I would like to be able to assign the value of data.timestamp to _time:

{
  "some-org-metadata": {
    "correlationId": "1621449582",
    "bu-name": "unit1"
  },
  "data": {
    "timestamp": "2021-05-18T21:06:23,192+10:00",
    "event_severity": "Critical",
    "event_title": "Something Failed",
    "event_description": "A bit more details about the failure"
  }
}

This might also need an option to specify the time format used for parsing.
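
To make the request concrete, here is a rough Python sketch of the behaviour being asked for. It is not the connector's code or API: `extract_epoch`, `build_hec_event`, and the `path` and `time_format` parameters are hypothetical names chosen for illustration. It assumes the HEC event envelope carries epoch seconds in its "time" field, and it falls back to the Kafka record timestamp when the payload field is missing or does not parse.

```python
from datetime import datetime
from typing import Any, Optional


def extract_epoch(payload: dict, path: str, time_format: str) -> Optional[float]:
    """Follow a dotted path (e.g. "data.timestamp") and parse the value found
    there into epoch seconds. Returns None if the path is missing or the value
    does not match time_format."""
    value: Any = payload
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    try:
        # With %z in the format the result is timezone-aware; without it,
        # Python interprets the value in the local timezone.
        return datetime.strptime(value, time_format).timestamp()
    except (TypeError, ValueError):
        return None


def build_hec_event(
    payload: dict,
    record_ts_ms: int,
    path: str = "data.timestamp",                    # hypothetical option
    time_format: str = "%Y-%m-%dT%H:%M:%S,%f%z",     # hypothetical option
) -> dict:
    """Wrap a payload in an HEC-style event envelope, preferring the timestamp
    inside the payload over the Kafka record timestamp (milliseconds)."""
    epoch = extract_epoch(payload, path, time_format)
    if epoch is None:
        epoch = record_ts_ms / 1000.0
    return {"time": epoch, "event": payload}


sample = {
    "some-org-metadata": {"correlationId": "1621449582", "bu-name": "unit1"},
    "data": {
        "timestamp": "2021-05-18T21:06:23,192+10:00",
        "event_severity": "Critical",
    },
}
envelope = build_hec_event(sample, record_ts_ms=1621449582000)
print(envelope["time"])
```

For the payload above, the envelope's time is taken from data.timestamp rather than from the record timestamp passed in. An event without a parsable data.timestamp keeps the record timestamp instead of being dropped.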