andryr-dbx opened this issue 3 years ago
Thanks @andryr-dbx !

We chatted a bit about this in Discord. I believe what we'll want to do here is optionally allow events sent to Kinesis to have a trailing `\n`.
Could be relevant to the codec work too. cc/ @pablosichert
re: codec work, does that involve making `ndjson` an option? That might be the answer to this. Didn't that use to be an option?
Yeah, I think `ndjson` could fit here. There is some nuance in this case because each event is sent as a separate "record" to Kinesis in the HTTP request, rather than as a set of events with a delimiter, but `ndjson` should also always result in the final (in this case only) serialized event ending with a trailing newline.
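As an illustration (hypothetical Python, not Vector's actual implementation), NDJSON framing amounts to serializing each event and appending a newline, so even a lone record carries the trailing `\n`:

```python
import json

def to_ndjson(events):
    """Serialize each event as one JSON document per line, each ending in '\n'."""
    return "".join(json.dumps(event) + "\n" for event in events)

# A single record still ends with a newline, which is the behavior wanted here.
single_record = to_ndjson([{"foo1": "bar1"}])
```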
Thanks to @jszwedko for helping with a workaround.

To get one JSON event per line, you need to set up a final transform that encodes the event as JSON and appends a trailing `\n`:

```toml
[transforms.messageNewLine]
inputs = [ "all", "things", "going", "to", "firehose" ]
type = "remap"
source = """
.message = encode_json(.) + "\n"
"""
```
In the Firehose sink, set the encoding to `text`:

```toml
[sinks.my_sink_id]
type = "aws_kinesis_firehose"
inputs = [ "messageNewLine" ]
encoding.codec = "text"
```
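A sketch of what this two-step workaround does to each event, using hypothetical Python stand-ins for VRL's `encode_json` and the `text` codec (which emits the event's `message` field verbatim):

```python
import json

def remap_message_newline(event):
    # Mirrors `.message = encode_json(.) + "\n"` in the remap transform:
    # the whole event is serialized and stored in `message` with a newline.
    result = dict(event)
    result["message"] = json.dumps(event) + "\n"
    return result

def text_codec(event):
    # The `text` codec writes the `message` field as-is, so the newline survives.
    return event["message"]

record = text_codec(remap_message_newline({"foo1": "bar1"}))
```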
Current Vector Version
Use-cases
When sending to a Kinesis Firehose sink, the output lands in S3 as one single line even if the input is newline delimited. For example, the file being read is JSON, newline delimited, one event per line:
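(The original input example appears to be missing here; based on the output shown below, it presumably resembled:)

```
{ "foo1": "bar1", "foo2": "bar2"}
{ "foo1": "bar1", "foo2": "bar2"}
{ "foo1": "bar1", "foo2": "bar2"}
```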
But when the events reach S3, they are concatenated on a single line:

```
{ "foo1": "bar1", "foo2": "bar2"}{ "foo1": "bar1", "foo2": "bar2"}{ "foo1": "bar1", "foo2": "bar2"}
```
Some systems require one JSON event per line.
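For example (illustrative Python, not part of the original report), a naive line-oriented consumer handles newline-delimited JSON cleanly but cannot parse the concatenated form as a single document:

```python
import json

ndjson = '{"foo1": "bar1"}\n{"foo1": "bar1"}\n'
concatenated = '{"foo1": "bar1"}{"foo1": "bar1"}'

# One JSON document per line parses cleanly.
events = [json.loads(line) for line in ndjson.splitlines()]

# The concatenated output is not a single valid JSON document.
try:
    json.loads(concatenated)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
```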
Attempted Solutions
I have tried using a transform to "cast" the input as JSON and it's parsed correctly, however the output is still one single line. The console sink respects the newline characters though and formats the output correctly
Proposal
Allow an option to insert a newline character between events at the sink level.
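If this lands via the codec work mentioned above, the configuration might look something like the following (hypothetical sketch; the actual option names depend on what ships in Vector):

```toml
[sinks.my_sink_id]
type = "aws_kinesis_firehose"
inputs = [ "all", "things", "going", "to", "firehose" ]
encoding.codec = "json"
# Hypothetical framing option: append "\n" after each serialized event.
framing.method = "newline_delimited"
```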
References