vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
16.97k stars 1.46k forks source link

Update Syslog source to accept non UTF-8 encoding in syslog message #20462

Open Neko-Follower opened 1 month ago

Neko-Follower commented 1 month ago

A note for the community

Problem

Vector drops logs when encounter a syslog message with non UTF-8 characters. Can you add an option to replace non utf-8 characters with U+FFFD or allow passing non-UTF8 text as-is like a Promtail do.

Configuration

No response

Version

0.37.1-distroless-static

Debug Output

No response

Example Data

2024-05-08T07:35:43.209847Z DEBUG source{component_kind="source" component_id=rsyslog component_type=syslog}:connection{peer_addr=172.22.0.4:44600}: vector::sources::util::net::tcp: Accepted a new connection. peer_addr=172.22.0.4:44600

2024-05-08T07:35:44.293974Z ERROR source{component_kind="source" component_id=rsyslog component_type=syslog}:connection{peer_addr=172.22.0.4:44594}: vector::internal_events::codecs: Failed framing bytes. error=Unable to decode input as UTF8 error_code="decoder_frame" error_type="parser_failed" stage="processing" internal_log_rate_limit=true

2024-05-08T07:35:44.294029Z ERROR source{component_kind="source" component_id=rsyslog component_type=syslog}:connection{peer_addr=172.22.0.4:44594}: vector::internal_events::codecs: Internal log [Failed framing bytes.] is being suppressed to avoid flooding.

Additional Context

No response

References

No response

jszwedko commented 1 month ago

Agreed, this could be modeled like the existing decoding.codec.json.lossy option which replaces invalid UTF-8 characters.

jszwedko commented 1 month ago

We'd be happy to see a PR for this if someone is motivated! It should be a relatively straightforward change.

osas1111 commented 1 week ago

I have the same problem with Fluent source