vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add trailing newline when sending to `aws_kinesis_firehose` #8487

Open andryr-dbx opened 3 years ago

andryr-dbx commented 3 years ago

Current Vector Version

vector 0.15.0 (x86_64-apple-darwin 994d812 2021-07-16)

Use-cases

When sending to a Kinesis Firehose sink, the output lands in S3 as one single line even if the input is newline delimited. For example, the file being read is JSON, newline delimited, one event per line:

{ "foo1": "bar1",  "foo2": "bar2"}
{ "foo1": "bar1",  "foo2": "bar2"}
{ "foo1": "bar1",  "foo2": "bar2"}

But when they get to S3 they look like this:

{ "foo1": "bar1", "foo2": "bar2"}{ "foo1": "bar1", "foo2": "bar2"}{ "foo1": "bar1", "foo2": "bar2"}

Some systems require one JSON event per line.
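To illustrate why this matters, here is a minimal Python sketch of a line-oriented consumer. The `parse_lines` helper is hypothetical and only stands in for "a system that requires one JSON event per line"; it is not part of Vector or any AWS tooling.

```python
import json

# Three events as they appear in the source file (one JSON object per line).
events = ['{ "foo1": "bar1",  "foo2": "bar2"}'] * 3

# What lands in S3 today: the events back to back with no delimiter.
concatenated = "".join(events)

# What a line-oriented consumer expects: newline-delimited JSON.
ndjson = "".join(e + "\n" for e in events)


def parse_lines(blob):
    """Hypothetical consumer: parse each non-empty line as one JSON document."""
    return [json.loads(line) for line in blob.splitlines() if line.strip()]


# Newline-delimited input parses into three separate events.
parsed = parse_lines(ndjson)

try:
    parse_lines(concatenated)
    concatenated_ok = True
except json.JSONDecodeError:
    # The single line holds three objects back to back, which json.loads
    # rejects because of the data following the first object.
    concatenated_ok = False
```

A consumer like this reads the newline-delimited form without trouble but fails on the concatenated form, which is the behavior described above.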

Attempted Solutions

I tried using a transform to "cast" the input as JSON. The input is parsed correctly, but the output is still one single line. The console sink, by contrast, respects the newline characters and formats the output correctly.

Proposal

Allow the option to insert newline characters in between events at the sink level.

References

jszwedko commented 3 years ago

Thanks @andryr-dbx !

We chatted a bit about this in Discord. I believe what we'll want to do here is optionally allow events sent to Kinesis to have a trailing \n.

jszwedko commented 3 years ago

Could be relevant to the codec work too. cc/ @pablosichert

andryr-dbx commented 3 years ago

Re: codec work, does that involve adding ndjson as an option? That might be the answer to this. Didn't that use to be an option?

jszwedko commented 3 years ago

Yeah, I think ndjson could fit here. There is some nuance in this case because each event is sent as a separate "record" to Kinesis in the HTTP request, rather than as a set of events joined by a delimiter. Even so, ndjson should always result in the final (in this case, only) serialized event ending with a trailing newline.

timmahj commented 2 years ago

Thanks to @jszwedko for helping with a workaround.

To get one JSON event per line, you need to set up a final transform that encodes the event as JSON and appends a trailing '\n':

[transforms.messageNewLine]
inputs = [ "all", "things", "going", "to", "firehose" ]
type = "remap"
source = """
.message = encode_json(.) + "\n"
"""

In the firehose sink, set the encoding codec to text:

[sinks.my_sink_id]
type = "aws_kinesis_firehose"
inputs = [ "messageNewLine" ]
encoding.codec = "text"
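For anyone wondering why this works, here is a rough Python simulation of the workaround. It rests on two assumptions that are not confirmed in this thread: that the text codec sends only the contents of `.message`, and that the delivery stream writes record payloads to S3 back to back without adding a delimiter. The `to_record` and `deliver` functions are hypothetical stand-ins, not Vector or AWS APIs.

```python
import json


def to_record(event):
    # Stand-in for the remap transform: `.message = encode_json(.) + "\n"`.
    return json.dumps(event) + "\n"


def deliver(records):
    # Assumption: record payloads are concatenated as-is, so any
    # delimiter has to come from the records themselves.
    return "".join(records)


events = [{"foo1": "bar1", "foo2": "bar2"} for _ in range(3)]

# Each record carries its own trailing newline, so the delivered object
# ends up with one JSON event per line.
s3_object = deliver(to_record(e) for e in events)
lines = s3_object.splitlines()
decoded = [json.loads(line) for line in lines]
```

Under those assumptions, the trailing newline in each record is what separates the events in the final object, and each line parses on its own.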