logstash-plugins / logstash-input-tcp


Random decoding errors on TCP input with fluent codec when client uses keepalive #180

Open lukasertl opened 3 years ago

lukasertl commented 3 years ago

Logstash information:

  1. Logstash version: 7.14
  2. Logstash installation source: rpm
  3. How is Logstash being run: systemd

JVM: Bundled JDK

OS version: CentOS 8

Description of the problem including expected versus actual behavior:

We are shipping logs from OpenShift (which uses fluentd) to Logstash through a pipeline that has a tcp input with the fluent codec:

input {
    tcp {
        port  => 50522
        codec => fluent {
            nanosecond_precision => true
        }
        id    => "input-openshift"
    }
}

In the latest update of the OpenShift Logging Operator, fluentd now uses TCP keepalive by default (see https://issues.redhat.com/browse/LOG-1186).
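
For context, the behaviour in question maps to the keepalive parameter of fluentd's forward output. A minimal sketch of such an output section is shown below; the match pattern, host, and timeout are illustrative placeholders, and the exact configuration the Logging Operator generates may differ:

<match **>
  @type forward
  # Reuse one TCP connection across buffer flushes instead of opening a new
  # connection per flush; this is what the Logging Operator now enables by default.
  keepalive true
  keepalive_timeout 20s
  <server>
    host logstash.example.com
    port 50522
  </server>
</match>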

Since this change we have been seeing random parsing/decoding errors on the Logstash tcp input with the fluent codec, presumably because events arriving on a single long-lived connection are no longer cleanly separated.

If we disable the fluentd keep-alive setting, all logs are processed correctly again. However, this is not our favoured solution, since it would mean disabling the OpenShift Logging Operator entirely and managing the logging settings manually.
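
To make that workaround concrete, disabling keepalive corresponds to something like the following (again with placeholder match pattern and host); with a managed Logging Operator we cannot simply patch this in, which is why we would rather see the codec handle keepalive connections:

<match **>
  @type forward
  # Explicitly disabling keepalive falls back to one connection per flush,
  # which makes the decoding errors on the Logstash side disappear for us.
  keepalive false
  <server>
    host logstash.example.com
    port 50522
  </server>
</match>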

Provide logs (if relevant):

Example error messages:

Fluent parse error, original data now in message field {:error=>#<LogStash::Error: Unknown event type>, :data=>[#<LogStash::Codecs::Fluent::EventTime:0x7258c40c @nsec=783023219, @sec=1629275643>,
[...]
Fluent parse error, original data now in message field {:error=>#<TypeError: can't convert nil into an exact number>
[...]
[ERROR][logstash.inputs.tcp      ] <redacted>:37340: closing due: org.logstash.MissingConverterException: Missing Converter handling for full class name=org.jruby.gen.RubyObject3, simple name=RubyObject3