vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.45k stars 1.52k forks source link

Final newline character missing when using AWS S3 sink #21086

Open miquelruiz opened 4 weeks ago

miquelruiz commented 4 weeks ago

A note for the community

Problem

When using the AWS S3 sink to upload local logs, the last line of each uploaded object is missing a newline character at the end. This is not the case when using the File sink.

It seems to be caused by the S3 sink calling encoder.serialize on the last event of the batch (https://github.com/vectordotdev/vector/blob/master/src/sinks/util/encoding.rs#L51) while the file sink calls encoder.encode on all events (https://github.com/vectordotdev/vector/blob/master/src/sinks/file/mod.rs#L427).

I would expect these to behave in the same way using the provided config.

Configuration

[sources.messages]
type = "file"
include = [
  "/var/log/messages",
]

[sinks.s3]
inputs = ["messages"]
type = "aws_s3"
bucket = "<redacted>"
region = "us-east-1"
key_prefix = "<redacted>"
healthcheck.enabled = false
filename_append_uuid = false
filename_extension = ""
compression = "none"
encoding.codec = "text"
auth.access_key_id = "<redacted>"
auth.secret_access_key = "<redacted>"
batch.timeout_secs = 60

[sinks.file]
type = "file"
inputs = [ "messages" ]
path = "/tmp/messages-%Y%m%d-%H%M%S.log"
encoding.codec = "text"

Version

vector 0.40.0 (aarch64-unknown-linux-gnu 1167aa9 2024-07-29 15:08:44.028365803)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

jszwedko commented 3 weeks ago

Good spot @miquelruiz

I think https://github.com/vectordotdev/vector/blob/f7e4470f6e1a415c8b93fbc290ad9c6d05d77f14/src/sinks/util/encoding.rs#L48-L59 is attempting to handle proper batch encoding when the batch is being encoded as a JSON array. If encode is used, a , is appended to each element as it is encoded, but when encoding as a JSON array the trailing , isn't wanted which is why serialize is used instead.

I think a possible fix for this would be to update:

https://github.com/vectordotdev/vector/blob/f7e4470f6e1a415c8b93fbc290ad9c6d05d77f14/src/codecs/encoding/encoder.rs#L96-L104

To have it add a \n if the framer is Framer::NewlineDelimited.

jszwedko commented 3 weeks ago

Opened a PR: https://github.com/vectordotdev/vector/pull/21097