Provide logs (if relevant):
Example error messages:
Fluent parse error, original data now in message field {:error=>#<LogStash::Error: Unknown event type>, :data=>[#<LogStash::Codecs::Fluent::EventTime:0x7258c40c @nsec=783023219, @sec=1629275643>,
[...]
Fluent parse error, original data now in message field {:error=>#<TypeError: can't convert nil into an exact number>
[...]
[ERROR][logstash.inputs.tcp ] <redacted>:37340: closing due: org.logstash.MissingConverterException: Missing Converter handling for full class name=org.jruby.gen.RubyObject3, simple name=RubyObject3
Logstash information:
JVM: Bundled JDK
OS version: CentOS 8
Description of the problem including expected versus actual behavior:
We are logging from OpenShift (which uses fluentd) to Logstash through a pipeline with a tcp input and the fluent codec:
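A minimal sketch of such a pipeline (the port number and the stdout output are illustrative assumptions, not our actual configuration):

```
input {
  tcp {
    port  => 24224      # port fluentd forwards to (assumed)
    codec => fluent     # decode fluentd's msgpack forward protocol
  }
}
output {
  stdout { codec => rubydebug }   # placeholder output for illustration
}
```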
In the latest update of the OpenShift Logging Operator fluentd now uses TCP keep alive by default. (See https://issues.redhat.com/browse/LOG-1186.)
Since this change we see random parsing/decoding errors on the Logstash tcp input with the fluent codec, probably because the events are no longer cleanly separated on the connection.
If we disable the fluentd keep-alive setting, all logs are processed correctly again. However, this is not our favoured solution, since it would mean disabling the OpenShift Logging Operator entirely and managing the logging settings manually.
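For reference, the workaround amounts to turning keep-alive off in fluentd's forward output. A sketch only: the match pattern, host, and port are assumptions, and in our setup this configuration is normally managed by the operator:

```
<match **>
  @type forward
  keepalive false               # disable the connection reuse introduced by LOG-1186
  <server>
    host logstash.example.com   # assumed Logstash endpoint
    port 24224
  </server>
</match>
```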