jcustenborder / kafka-connect-splunk

Kafka Connect connector for receiving data and writing data to Splunk.
Apache License 2.0

Event Field Deserialization in Sink Connector #12

Open lilgreenwein opened 7 years ago

lilgreenwein commented 7 years ago

I'm trying to understand how the sink connector maps the deserialized data from Kafka to what it ends up posting to the HEC endpoint.

My data in Kafka is "pre-baked" into an HEC-friendly JSON schema. An example of a message as it appears in Kafka (from a console consumer):

{ "event": "Test message", "time": "1500928077", "host": "myhost.company.com"}

(I'm using org.apache.kafka.connect.json.JsonConverter as the key/value converter, and I'm not using a schema registry.)
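For context, the converter side of my setup looks roughly like this (a sketch of the relevant properties only; the connector class name and topic here are placeholders, not copied from my actual config):

name=splunk-sink
# connector.class is a placeholder - use the class name shipped with this plugin
connector.class=com.github.jcustenborder.kafka.connect.splunk.SplunkHttpSinkConnector
topics=hec-events
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false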

Somewhere in the kafka-connect-splunk pipeline (I'm guessing EventIterator or EventConverter) messages are mapped into a new object. I enabled trace logging in log4j and was able to determine that the example message above was actually posted to the HEC endpoint as:

{"host":"myhost.company.com","event":{"event":"Test message"}}

The host field does what I would like: it retains the value from the original JSON object. The event field is what I'm trying to figure out. I would like to know how to replace the event object that connect-splunk creates with the event field from the original JSON structure. In this case the value of event would be a string, not a JSON object, e.g.:

{"host":"myhost.company.com","event":"Test message"}

The issue this causes is in how the message is indexed and displayed in the destination Splunk instance. As it stands, the above example message is indexed as a JSON object, like so:

(screenshot: the message indexed in Splunk as a nested JSON object)

Whereas if event is just a string, it gets indexed like so:

(screenshot: the event indexed in Splunk as a plain string)

Is this a bug or a feature request? Or am I just approaching this from the wrong direction?