vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

splunk_hec source does not read metadata from query parameters when receiving raw data #17236

Open MadsRC opened 1 year ago

MadsRC commented 1 year ago

Problem

When sending data to Splunk via the HTTP Event Collector (HEC), one can provide certain metadata in addition to the event payload. The metadata usually associated with HEC traffic is index, sourcetype, source, and channel.

When sending to the /services/collector/event or /services/collector endpoint, all of this metadata except channel is provided in the JSON payload of the HTTP body. For example:

{
  "index": "someIndex",
  "source": "a source",
  "sourcetype": "data",
  "event": "this is the event data"
}

When such an event is sent to Vector's splunk_hec source, the metadata fields are available on the resulting event. Fields such as source have splunk_ prepended (this is not documented in the splunk_hec source documentation).

For example, when sending data to Vector's splunk_hec source like this:

curl -v "https://hec.url/services/collector" -H "Authorization: Splunk REDACTED" -d '{"event": "Hello, world!", "source": "manual"}'

the resulting event will look like this:

{"host":"185.229.155.97","message":"Hello, world!","source_type":"splunk_hec","splunk_source":"manual","timestamp":"2023-04-27T11:28:30.347270043Z"}
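The mapping observed above can be sketched roughly as follows. This is only an illustration in Python, not Vector's actual implementation; the helper name `envelope_to_event` is hypothetical. The `splunk_source` name is taken from the output above, while the assumption that `index` and `sourcetype` are prefixed the same way is mine. The `host` and `timestamp` fields are left out for brevity.

```python
import json

# Envelope keys assumed to be copied onto the event with a "splunk_" prefix.
# Only "source" is confirmed by the output shown above; the other two are
# assumed to follow the same pattern.
PREFIXED_KEYS = ("index", "source", "sourcetype")


def envelope_to_event(body: str) -> dict:
    """Hypothetical helper: turn a HEC JSON envelope into a Vector-style event."""
    envelope = json.loads(body)
    # The payload under "event" becomes the "message" field of the event.
    event = {"message": envelope["event"], "source_type": "splunk_hec"}
    for key in PREFIXED_KEYS:
        if key in envelope:
            event[f"splunk_{key}"] = envelope[key]
    return event


print(envelope_to_event('{"event": "Hello, world!", "source": "manual"}'))
```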

When sending data to the /services/collector/raw endpoint of the Splunk HEC source, the HTTP body is used as a raw event. If one wishes to provide metadata, this has to be done via query parameters. Unfortunately, only the channel query parameter is extracted.

When sending an event to the Splunk HEC source like this:

curl -v "https://hec.url/services/collector/raw?channel=00872DC6-AC83-4EDE-8AFE-8413C3825C4C&source=manual&sourcetype=random" -H "Authorization: Splunk REDACTED" -d 'Hello, raw world!'

the following event is generated:

{"host":"185.229.155.97","message":"Hello, raw world!","source_type":"splunk_hec","splunk_channel":"00872DC6-AC83-4EDE-8AFE-8413C3825C4C","timestamp":"2023-04-27T11:34:38.715063479Z"}

Notice the lack of splunk_source and splunk_sourcetype fields.

Expected behaviour would be:

  1. Prepending splunk_ to the metadata fields should be documented
  2. When sending data to the /services/collector/raw endpoint, index, source and sourcetype should be extracted from query parameters
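To make the second point concrete, here is a minimal sketch of the requested behaviour, written in Python purely for illustration (Vector itself is not implemented this way). The helper name `raw_request_to_event` is hypothetical, the `splunk_index` field name is an assumption by analogy with `splunk_source`, and taking the first value of a repeated parameter is an arbitrary choice for this sketch.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that would be treated as metadata on the raw endpoint.
METADATA_PARAMS = ("channel", "index", "source", "sourcetype")


def raw_request_to_event(url: str, body: str) -> dict:
    """Hypothetical helper: build an event from a raw-endpoint request."""
    query = parse_qs(urlsplit(url).query)
    # The whole HTTP body is used as the raw event.
    event = {"message": body, "source_type": "splunk_hec"}
    for name in METADATA_PARAMS:
        values = query.get(name)
        if values:
            # Assumption: if a parameter is repeated, keep the first value.
            event[f"splunk_{name}"] = values[0]
    return event


url = (
    "https://hec.url/services/collector/raw"
    "?channel=00872DC6-AC83-4EDE-8AFE-8413C3825C4C&source=manual&sourcetype=random"
)
print(raw_request_to_event(url, "Hello, raw world!"))
```

With the request from the example above, this sketch would produce an event that carries splunk_source and splunk_sourcetype alongside splunk_channel.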

Version

0.27.1-debian

bruceg commented 1 year ago

I can confirm that Splunk HEC is documented to take in those parameters from the query parameters (examples 3 and 4) and that our source only looks at the channel parameter.

I am a little confused about what you are saying regarding the channel= prefix, though. In the example shown above, the query parameter includes an extraneous channel= (note: …/raw?channel=channel=00872DC6-AC83-4EDE-8AFE-8413C3825C4C&…) but the linked examples don't show this. Is that a typo?

MadsRC commented 1 year ago

You are absolutely correct, that was a typo - I've been awake for too long...

I've fixed the description so that it no longer includes the extraneous channel= part.