nforgeio / neonKUBE

Public NeonKUBE Kubernetes distribution related projects
https://neonkube.io
Apache License 2.0

Docker stops logging to Fluentd after reboot #126

Closed jefflill closed 7 years ago

jefflill commented 7 years ago

By default, the Docker Fluentd logging driver makes 10 attempts (waiting 1 second after each) to transmit container logs to Fluentd before giving up permanently. Apparently, this caused Docker to crash back in April 2017 (https://github.com/moby/moby/issues/32567), although that no longer seems to happen.

Here's another relevant Docker issue: https://github.com/moby/moby/issues/34804

I've seen this problem surface after rebooting a cluster. It must take more than 10 seconds for Docker to start the neon-log-host container after the Docker daemon launches, so the driver exhausts its retries before Fluentd is reachable.

The solution is to configure a much larger number of retries using the fluentd-max-retries option. I'm going to set this to 1 billion, which, combined with the 1-second retry interval, gives a reconnect window of roughly 31.7 years (10^9 seconds). That is effectively infinite.
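As a quick check of the arithmetic, here is a minimal Python sketch that converts a retry count into a total reconnect window. It assumes a fixed wait between attempts with no backoff, which is a simplification. The function name `retry_window_years` is made up for illustration.

```python
# Assumption: every retry waits the same fixed interval (no backoff).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds


def retry_window_years(max_retries: int, retry_wait_seconds: float = 1.0) -> float:
    """Total time spent retrying, in years, for a fixed retry interval."""
    return max_retries * retry_wait_seconds / SECONDS_PER_YEAR


# Default of 10 retries at 1 second each: only 10 seconds of slack after boot.
print(f"{10 * 1.0:.0f} seconds")

# 1 billion retries at 1 second each.
print(f"{retry_window_years(1_000_000_000):.1f} years")  # prints "31.7 years"
```

Under that assumption, 1 billion retries covers about three decades, far longer than any node stays up between reboots.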

I'm also going to set a 5 MB RAM buffer limit for queued logs via the fluentd-buffer-limit option so that logs queued while Fluentd is unreachable can't exhaust available memory.
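For reference, the two options could be expressed as a Docker daemon configuration fragment roughly like the one this sketch generates. This is not the actual NeonKUBE setup code: the helper name `fluentd_log_config` is hypothetical, the address is Docker's documented default for the Fluentd driver, and the `"5m"` size format is an assumption that should be checked against the documentation for your Docker version.

```python
import json


def fluentd_log_config(address: str = "localhost:24224",
                       max_retries: int = 1_000_000_000,
                       buffer_limit: str = "5m") -> dict:
    """Build a daemon.json-style fragment for the Fluentd logging driver.

    Assumptions: `address` and the "5m" buffer format are illustrative only.
    """
    return {
        "log-driver": "fluentd",
        "log-opts": {
            "fluentd-address": address,
            # log-opts values in daemon.json must be strings, so the
            # retry count is quoted rather than emitted as a number.
            "fluentd-max-retries": str(max_retries),
            "fluentd-buffer-limit": buffer_limit,
        },
    }


print(json.dumps(fluentd_log_config(), indent=2))
```

The same options can also be passed per container with `--log-opt`, which is handy for trying the settings out before changing the daemon-wide default.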

jefflill commented 7 years ago

It looks like the Docker Fluentd log options weren't the only problem here. I also have to restart the neon-log-host containers to get the events flowing again after a cluster restart.