Docker stops logging to Fluentd after reboot

By default, the Docker Fluentd logging driver makes 10 attempts (waiting 1 second after each) to send container logs to Fluentd before giving up permanently. Apparently this caused Docker to crash back in April 2017 (https://github.com/moby/moby/issues/32567), although that no longer seems to happen.

Here's another relevant Docker issue: https://github.com/moby/moby/issues/34804

I've seen this problem surface after rebooting a cluster. Presumably Docker takes more than 10 seconds to start the neon-log-host container after the Docker daemon launches, so the logging driver exhausts its retries before Fluentd is reachable.

The solution is to configure a much larger number of retries using the fluentd-max-retries option. I'm going to set this to 1 billion, which combined with the 1-second retry interval gives a reconnect window of roughly 31.7 years (about 11,574 days), which is effectively infinite.
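As a quick check of the retry arithmetic, here is a minimal sketch. It assumes retries are spaced exactly 1 second apart and ignores the time spent on each connection attempt, so the real window would be somewhat longer. The function name `retry_window` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def retry_window(max_retries, retry_wait_seconds=1.0):
    """Return the total retry window as (days, years).

    Assumes a fixed wait between attempts and no time spent connecting.
    """
    total_seconds = max_retries * retry_wait_seconds
    days = total_seconds / SECONDS_PER_DAY
    years = days / DAYS_PER_YEAR
    return days, years


# 1 billion retries at the 1-second interval described above.
days, years = retry_window(1_000_000_000)
print(f"{days:,.0f} days, about {years:.1f} years")
```

With the default of 10 retries the same function gives a window of only 10 seconds, which is why a slow container start after a reboot is enough to lose logging.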

I'm also going to set a 5MB RAM buffer limit for queued logs via the fluentd-buffer-limit option, so that logs queued while Fluentd is unreachable can't consume all available memory.
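For illustration, the two options might be combined as below. This is only a sketch under stated assumptions: the post doesn't say where the options are applied, so this assumes they are set as daemon-wide defaults in Docker's daemon.json (they could equally be passed per container with --log-opt). The "5MB" value format is also an assumption and should be checked against the documentation for the Docker version in use. The helper name `fluentd_log_settings` is hypothetical.

```python
import json


def fluentd_log_settings(max_retries=1_000_000_000, buffer_limit="5MB"):
    """Build a daemon.json-style fragment for the Fluentd logging driver.

    Assumption: options are applied daemon-wide. Values under "log-opts"
    are written as strings, which is what daemon.json expects.
    """
    return {
        "log-driver": "fluentd",
        "log-opts": {
            "fluentd-max-retries": str(max_retries),
            # Assumed size format; verify for your Docker version.
            "fluentd-buffer-limit": buffer_limit,
        },
    }


settings = fluentd_log_settings()
print(json.dumps(settings, indent=2))
```

The printed JSON would be merged into any existing daemon configuration rather than replacing it, and the daemon would need a restart for new defaults to take effect.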

nforgeio / neonKUBE #126