vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

RFC: Handling of (non-UTF-8) byte payloads in Vector and VRL #11577

Open pablosichert opened 2 years ago

pablosichert commented 2 years ago

When ingesting arbitrary bytes, components within the Vector topology currently may handle the payload in any of these ways:

In other words, some combinations of sources, transforms, sinks, and their decoding/encoding settings may be able to handle non-UTF-8 data, while others may not. However, we are not explicit about the level to which we support this.
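To make the distinction concrete, here is a minimal sketch contrasting three ways a component might treat a payload that is not valid UTF-8: reject it, decode it lossily, or pass the bytes through untouched. The `Utf8Handling` enum, the `Payload` type, and the `handle` function are hypothetical names for illustration only. They are not Vector's actual API or configuration options.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical strategies a component could apply to an incoming payload.
/// Illustrative only; these are not Vector configuration options.
#[derive(Debug, Clone, Copy)]
enum Utf8Handling {
    /// Reject the payload if it is not valid UTF-8.
    Strict,
    /// Replace invalid sequences with U+FFFD REPLACEMENT CHARACTER.
    Lossy,
    /// Keep the original bytes untouched.
    Raw,
}

/// The result of handling a payload: either text or the untouched bytes.
#[derive(Debug, PartialEq)]
enum Payload<'a> {
    Text(Cow<'a, str>),
    Bytes(&'a [u8]),
}

fn handle(bytes: &[u8], mode: Utf8Handling) -> Result<Payload<'_>, Utf8Error> {
    match mode {
        // Borrow the input as &str, or fail with the position of the bad bytes.
        Utf8Handling::Strict => std::str::from_utf8(bytes).map(|s| Payload::Text(Cow::Borrowed(s))),
        // Always succeeds; allocates only if something had to be replaced.
        Utf8Handling::Lossy => Ok(Payload::Text(String::from_utf8_lossy(bytes))),
        // No decoding at all; downstream components see raw bytes.
        Utf8Handling::Raw => Ok(Payload::Bytes(bytes)),
    }
}

fn main() {
    // "caf" followed by a lone Latin-1 byte 0xE9, which is not valid UTF-8 on its own.
    let input: &[u8] = b"caf\xE9";
    for mode in [Utf8Handling::Strict, Utf8Handling::Lossy, Utf8Handling::Raw].iter().copied() {
        println!("{:?}: {:?}", mode, handle(input, mode));
    }
}
```

With strict handling the payload is rejected. Lossy decoding succeeds but replaces the `0xE9` byte with `U+FFFD`, so the original byte is lost. Raw handling preserves the data but leaves every downstream component to decide what to do with bytes.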

Another consideration in this discussion is log processing on Windows, where UTF-16 encoding is often used.
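To illustrate what supporting UTF-16 input could involve, the following std-only Rust sketch decodes a UTF-16LE buffer (the byte order typically used on Windows) into a `String`. The function name `decode_utf16le` is hypothetical. It assumes little-endian input with an optional `FF FE` byte order mark.

```rust
/// Hypothetical helper: decode a UTF-16LE byte buffer into a Rust `String`.
/// A leading little-endian byte order mark (FF FE) is skipped if present.
/// Returns None for an odd byte count or for unpaired surrogates.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    // Strip the little-endian BOM if present.
    let bytes = bytes.strip_prefix(&[0xFFu8, 0xFE][..]).unwrap_or(bytes);
    // UTF-16 code units are two bytes each.
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Reassemble each pair of bytes into a little-endian u16 code unit.
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // Fails on unpaired surrogates; String::from_utf16_lossy would replace them instead.
    String::from_utf16(&units).ok()
}

fn main() {
    // "hi" encoded as UTF-16LE with a byte order mark.
    let input = [0xFF, 0xFE, b'h', 0x00, b'i', 0x00];
    println!("{:?}", decode_utf16le(&input));
}
```

Note that UTF-16LE text consisting only of ASCII characters is also valid UTF-8 (every other byte is `0x00`). A strict UTF-8 check alone would therefore accept it and produce text with embedded NUL characters, so detecting the encoding typically relies on a BOM or explicit configuration.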

jszwedko commented 2 years ago

Related: https://github.com/vectordotdev/vector/issues/10571

fpytloun commented 2 years ago

Might be related to this as well: https://github.com/vectordotdev/vector/discussions/12131

It causes the following error when decoding JSON that contains certain unsupported characters:

function call error for \"parse_json\" at (20:49): unable to parse json: invalid unicode code point at line 1 column 8587
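The error indicates that `parse_json` hit input it could not interpret as valid Unicode. Assume the cause is bytes that are not valid UTF-8; this thread does not confirm that. Under that assumption, a small helper like the one below can pinpoint the first offending byte so it can be compared against the reported position. `first_invalid_utf8` is a hypothetical name, and its column counts bytes, which may not match how a given JSON parser counts columns.

```rust
/// Hypothetical diagnostic helper: locate the first invalid UTF-8 sequence in a payload.
/// Returns (byte_offset, line, column) with 1-based line and column numbers.
/// The column counts bytes from the start of the line (an assumption; parsers differ).
fn first_invalid_utf8(bytes: &[u8]) -> Option<(usize, usize, usize)> {
    // No error means the whole payload is valid UTF-8.
    let err = std::str::from_utf8(bytes).err()?;
    // Everything before this offset decoded successfully.
    let offset = err.valid_up_to();
    let prefix = &bytes[..offset];
    let line = prefix.iter().filter(|&&b| b == b'\n').count() + 1;
    // Byte index where the current line starts.
    let line_start = prefix.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    let column = offset - line_start + 1;
    Some((offset, line, column))
}

fn main() {
    // A JSON document whose string value contains a lone Latin-1 byte 0xE9.
    let payload: &[u8] = b"{\"msg\":\"caf\xE9\"}";
    match first_invalid_utf8(payload) {
        Some((offset, line, column)) => {
            println!("invalid UTF-8 at byte {} (line {}, column {})", offset, line, column)
        }
        None => println!("payload is valid UTF-8"),
    }
}
```

Once located, the offending bytes can be inspected. Alternatively, the payload could be decoded lossily (for example with `String::from_utf8_lossy`) before parsing, at the cost of silently altering the data.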