We've had an issue on Google Kubernetes Engine, on a node with
kernel version 4.14.138+, where liveness probes fail intermittently.
We've traced the problem to the poll() system call sometimes
failing in the nc command used in the liveness probe, whereupon
nc returns an empty response, even though ZooKeeper clearly sends
back imok over the TCP connection.
Netcat uses select(), poll(), read(), where poll() sometimes
reports an error because ZooKeeper has already closed the TCP
connection. Socat uses select(), read(), which works here.
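
For illustration, here is a minimal sketch of a probe that does not depend on nc at all. It is not the probe we actually run, and it does not reproduce the system call sequence of either nc or socat. It only shows the behaviour we want: send the request, then keep reading until ZooKeeper closes the connection, treating that close as the end of the response rather than as a failure. The function name zk_ruok is made up for this example, and it assumes the probe sends ZooKeeper's ruok command to the default client port 2181 on localhost.

```python
import socket


def zk_ruok(host="localhost", port=2181, timeout=5.0):
    """Send `ruok` to ZooKeeper and return True if the reply is `imok`.

    Hypothetical helper; host, port and timeout are assumed defaults.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        chunks = []
        while True:
            # recv() returns b"" once the server has closed the connection.
            # That is the normal end of the reply, not an error.
            data = sock.recv(64)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks) == b"imok"
```

An exec-style liveness probe could call this function and exit non-zero when it returns False.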