kubernetes-retired / kube-aws

[EOL] A command-line tool to declaratively manage Kubernetes clusters on AWS
Apache License 2.0

Docker container log limit #134

Closed tarvip closed 7 years ago

tarvip commented 7 years ago

If I am correct, then by default there is no size limit for docker container log files (/var/lib/docker/containers/<container_id>/<container_id>-json.log). What is the preferred way to limit docker container log file size?

At the moment I'm using following drop-in in docker service:

    - name: docker.service
      drop-ins:
        - name: 20-docker-opts.conf
          content: |
            [Service]
            Environment="DOCKER_OPTS=--log-opt max-size=50m"
mumoshu commented 7 years ago

@tarvip Good catch and thanks for sharing your findings! 👍

I believe nodes created via kube-aws don't have docker log rotation enabled and there's no recommended setup discussed/agreed yet. So let's discuss it here.

Several things:

dzavalkinolx commented 7 years ago

We also need some solution for logrotate / log truncation to be implemented in kube-aws. We have some containers with 5GB log files. Since we relay logs from containers to AWS Elasticsearch, there is no need to keep those log files. Shipping logs from containers and the CoreOS host is a best practice anyway.

Btw, we can share a cloudformation template to create an AWS Elasticsearch instance, plus fluentd and curator images, if you want to provide a whole logging package as part of the kube-aws production-ready project. Currently it can't be considered production-ready because of this logrotate issue (and a couple of other issues, like no etcd backup).

redbaron commented 7 years ago

@mumoshu not saying that there is no need to discuss log handling, but this PR solves the immediate problem, so maybe have it merged first?

mumoshu commented 7 years ago

@redbaron Which PR are you wishing to be merged?

mumoshu commented 7 years ago

@dzavalkinolx Thanks for your feedback 👍

I'd appreciate it if you could share the template for log aggregation stack plus docker images. Everyone reading this issue would benefit from that and I'm even thinking of linking your work from kube-aws doc.

Regarding log rotation, it can be enabled in two ways without modifying kube-aws:

redbaron commented 7 years ago

@mumoshu, oops, I thought I was responding to a PR. I was referring to the DOCKER_OPTS solution proposed in the first message here - it is quick, reliable, and works for most cases.

mumoshu commented 7 years ago

@redbaron Ah, got it!

I'm not yet sure if it is the recommended setup for log rotation, as it targets only logs from docker containers, but not those from rkt pods, the kubelet, or the apiserver.

So, how about supporting it via a new, more general experimental feature that configures not log rotation specifically but DOCKER_OPTS as a whole, enabled via cluster.yaml like the below?

    experimental:
      docker:
        opts: "--log-opt max-size=50m"
redbaron commented 7 years ago

Yes, it is not a generic solution, but it covers at least some potential pitfalls. IMHO kube-aws should provide sane defaults. `experimental.docker.opts` helps and might be used to override defaults, though.

mumoshu commented 7 years ago

@redbaron Thanks, it makes sense. So,

Are you ok with:

    experimental:
      docker:
        opts: "--log-opt max-size=<kube-aws default>"

Or would you rather have:

    experimental:
      docker:
        log:
          maxSize: #<kube-aws default>

as default?

Personally, I tend to prefer the former because it would keep the cluster.yaml spec compact; there's no need to add various keys like docker.log.maxFile, docker.log.labels, docker.log.env, etc. over time, hence my previous comment.

redbaron commented 7 years ago

Neither. The first one ships with an experimental feature enabled by default, which makes it not an experimental feature :) The second one is just asking for an explosion of all kinds of options.

I'd add --log-driver=json-file --log-opt max-size=50m to the docker opts inside a template, which is then completely overwritten by experimental.docker.opts if it is set/enabled.

mumoshu commented 7 years ago

Thanks! Then, how about:

    # docker:
    #   Uncomment and modify `opts` to completely override DOCKER_OPTS for the docker
    #   daemon run on worker and controller nodes. By default, DOCKER_OPTS contains
    #   the following for basic log rotation:
    #   opts: "--log-driver=json-file --log-opt max-size=50m"

in cluster.yaml, while having the below in e.g. cloud-config-worker/controller:

    - name: docker.service
      drop-ins:
        - name: 20-docker-opts.conf
          content: |
            [Service]
            Environment="DOCKER_OPTS={{.Docker.Opts}}"
redbaron commented 7 years ago

I like it. What is .Docker.Log.Opts though? Wasn't it .Experimental.Docker.Opts?

    [Service]
    Environment="DOCKER_OPTS={{ or .Experimental.Docker.Opts "--log-driver=json-file --log-opt max-size=50m" }}"
mumoshu commented 7 years ago

@redbaron Ah! Sorry for the typo in the ref. I meant .Docker.Opts.

Let me also clarify that Experimental is for features which can be turned off almost completely when not used; I'm assuming that Docker.Opts is not one of those, because DOCKER_OPTS (= custom docker opts other than the defaults) will be dynamically set regardless of whether we set docker.opts in cluster.yaml after the change.

mumoshu commented 7 years ago

And basically, we've preferred to code defaults into Go code rather than hard-coding them in cloud-config templates.

mumoshu commented 7 years ago

Today, I'm retracting my original answer in https://github.com/coreos/kube-aws/issues/134#issuecomment-271565227

I generally recommend using the log-rotation feature integrated into docker for rotating docker container logs, so as not to lose log messages emitted during rotations. I don't recommend logrotate for this purpose anymore.

AFAIK, fluentd, for example, supports the rotate_wait option to detect a rotation of a log file and keep tailing the old log file for a while, so as not to lose log messages emitted during the rotation.

Log rotation must be done outside of fluentd. To allow fluentd to correctly follow both the old and the new log files with the rotate_wait option enabled, the rotation must be done as follows:

// See https://github.com/fluent/fluentd/blob/48526e4/lib/fluent/plugin/in_tail.rb#L508-L531 for how it is implemented in fluentd.
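For illustration, a minimal fluentd `in_tail` source using rotate_wait might look like the below. This is only a sketch: the path, pos_file location, and tag are assumptions for the example, not kube-aws's actual configuration.

```
<source>
  @type tail
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/fluentd-docker.pos
  tag docker.*
  format json
  rotate_wait 5   # keep tailing the rotated (old) file for 5 more seconds
</source>
```

With rotate_wait set, fluentd follows the renamed old file for the given number of seconds before switching entirely to the new file at the original path.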

As of today, the only way to achieve docker container log rotation in that manner is to use the log-rotation feature integrated into docker, not logrotate.

// See https://github.com/docker/docker/pull/11485/files#diff-1b468ac6508c0a9669b0bae94937b14bR157 for how it is implemented in docker

FYI, using logrotate for rotating docker container logs means you must enable the copytruncate option. Without it, a docker container completely stops writing log messages after the log file is rotated (moved), because docker has no way to "notice" that the inode of the log file has changed. The issue https://github.com/docker/docker/issues/7333 seems to have discussed allowing docker to "notice" the log rotation via a signal (which could be sent from e.g. a lastaction ~ endscript block in a logrotate config), but it was never implemented.
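For illustration, a minimal logrotate stanza using copytruncate might look like this. The path glob and sizes here are assumptions for the sketch, not a recommended kube-aws default:

```
/var/lib/docker/containers/*/*-json.log {
    copytruncate   # copy the file then truncate it in place, so docker keeps writing to the same inode
    maxsize 50M
    rotate 5
    missingok
    notifempty
}
```

Note that copytruncate can still drop the messages written between the copy and the truncate, which is the weakness discussed above.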

mumoshu commented 7 years ago

Hi @repeatedly, how are you? 😄 Sorry for suddenly bothering you, but would you mind confirming whether my assumptions regarding fluentd's rotate_wait option are correct?

I assume that a log rotation must be done as described below, to allow fluentd not to drop log messages emitted during a rotation, i.e. between the time an old log file is renamed and a new log file is created:

  • the log-rotator moves (or renames) the log file to another location, and then
  • the log-rotator optionally creates a new one in the original place
  • the application/container keeps writing to the moved, old log file for a while, but eventually reopens it to start writing to the new log file located at the original path

Does my assumption seem correct to you?
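The move-then-create sequence above can be sketched in shell. The file names and the temp dir here are purely illustrative:

```shell
#!/bin/sh
# Illustrative move-then-create rotation in a temp dir.
dir=$(mktemp -d)
echo "written before rotation" > "$dir/app.log"

mv "$dir/app.log" "$dir/app.log.1"  # rename: a writer holding the open fd keeps writing to the old inode
: > "$dir/app.log"                  # create a fresh, empty file at the original path

# A tailer with rotate_wait keeps reading app.log.1 for a while, then
# follows the new app.log once the writer reopens it.
ls "$dir"
```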

redbaron commented 7 years ago

Don't know the current state, but not so long ago rotate_wait in fluentd was fundamentally broken: https://github.com/fluent/fluentd/issues/544

If your service manages to rotate logs more than once in the rotate_wait period, fluentd loses its mind: it starts to create duplicates, consumes CPU, and becomes a source of problems itself.

mumoshu commented 7 years ago

@redbaron Thanks for the info! Then, at the least, we must be sure that max-size is set to a large enough value 😢

whereisaaron commented 7 years ago

Here I was, naively thinking the logging documentation saying that container logs were rotated applied to Kubernetes in general. Turns out it really only applies to the GCE versions, and my clusters have no log rotation :cry:.

After the 'k8s-app' services were containerized, GCE still arranged for them to log to /var/log, while the rest of the containers log to /var/log/containers. They currently use logrotate to rotate both sets of logs with the logrotate config below, including the copytruncate option.

https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L95-L136

    {
        rotate 5
        copytruncate
        missingok
        notifempty
        compress
        maxsize 10M
        daily
        dateext
        dateformat -%Y%m%d-%s
        create 0644 root root
    }

Now that Docker has logging drivers, including built-in support for rotation, I agree with @mumoshu that it would be desirable to use that, assuming it doesn't break tail watchers like fluentd.

People may want to skip local disk log files entirely and use the Docker logging drivers for Fluentd, Splunk, AWS CloudWatch, or GCP to stream logs directly to those services (kubectl logs won't work, of course, but you can pull the same logs directly from your log service).

I lost the back-and-forth above. It sounds like fluentd and Docker log rotation are not compatible right now? And so copying the logrotate config used by other deployments (with copytruncate) is probably the current best option?

mumoshu commented 7 years ago

It sounds like fluentd and Docker log rotation are not compatible right now?

@whereisaaron I'm not sure I'm following you correctly, but: yes and no. Probably the most important part is that docker's built-in log rotation is for the json-file logging driver.

Therefore,

If for any reason you stick with the fluentd logging driver of Docker, yes; the fluentd logging driver by definition wouldn't be used in combination with docker's built-in log rotation (it doesn't know how the destination fluentd handles log messages, hence it is unable to rotate them; docker's built-in log rotation is intended to be used with the json-file driver, or it is even implemented inside the json-file logging driver, if I remember correctly).

If you stick with the default json-file logging driver of Docker while enabling the built-in log rotation as I've suggested, no. It would just work with fluentd, and it is better than logrotate (copytruncate) + fluentd.
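For reference, docker's built-in rotation with its default JSON-file logging driver can also be configured via daemon.json on docker versions that support it, along these lines. The max-file value here is an illustrative choice, not something agreed on in this thread:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
```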

whereisaaron commented 7 years ago

Hi @mumoshu, I was saying people either need a local file rotation system that is compatible with fluentd and similar, or they could use Docker streaming logging drivers and not need to worry about log rotation at all. However, please ignore that I said any of that. On reflection, I'd rather kube-aws stick close to the current k8s core team approach and support local logs with rotation via logrotate, rather than pioneer different approaches.

From what I've read, Docker-specific logging and driver options are not well liked for k8s use, as they are Docker-specific rather than a k8s abstraction. It also appears that most deployment tools/playbooks are sticking with logrotate until a better cluster-wide, container-agnostic option is decided on. E.g. journald could potentially be used if its performance were improved.

Logrotate with copytruncate appears to still be the current 'best practice' approach as of 1.6.x, including for supporting fluentd. I've read discussions about other approaches, but nothing concrete has happened yet, AFAIK. As you mentioned, CoreOS already has logrotate installed; kube-aws would just have to enable it and drop in the logrotate.d file(s) with a copytruncate config as above. Is there a compelling reason not to do it that way for now?
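For illustration, such a logrotate.d drop-in could be delivered via cloud-config roughly as follows. The path and options mirror the GCE config above and are assumptions for the sketch, not a tested kube-aws setting:

```yaml
write_files:
  - path: /etc/logrotate.d/docker-containers
    permissions: "0644"
    content: |
      /var/log/containers/*.log {
          rotate 5
          copytruncate
          missingok
          notifempty
          compress
          maxsize 10M
          daily
      }
```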

mumoshu commented 7 years ago

Thanks for the feedback @whereisaaron!

The reasons why I'd rather not enable the logrotate pre-installed in CoreOS by default are:

  1. It isn't the "right" thing to do (no one wants to miss production log messages when it's avoidable!) according to my theory in https://github.com/coreos/kube-aws/issues/134#issuecomment-272994605, and
  2. I'd prefer to run logrotate as a daemonset, so that I have more control over its resource usage, re-configuration, etc.

However, I'm ok with adding an easy way to enable logrotate and/or docker's built-in log rotation, and updating the documentation/comments in cluster.yaml to be clear enough, while not enabling them by default, so that at least no one accidentally goes without log rotation.

whereisaaron commented 7 years ago

Thanks @mumoshu, I agree with you; I don't like the copytruncate workaround, it isn't the "right" way to do it. Yet from your, @redbaron's, and my research, it is (currently) the best available option. It is also considerably better than no rotation at all 😃

I suggest we just copy the upstream solution for now, then replace it as soon as a better option is available. I don't mind whether it is the default or not.

redbaron commented 7 years ago

Well, my personal view is that --log-opt max-size=50m is the best way to go. copytruncate is terrible.

whereisaaron commented 7 years ago

@redbaron if we go with --log-opt max-size=50m, will that work for log tail watchers like fluentd? You were saying rotate_wait has a blocking bug right now? Or does setting --log-opt max-file=1 work around that problem?

I'd be happy with anything that works for both kubectl logs and fluentd.

redbaron commented 7 years ago

@whereisaaron, to be honest, I have zero interest in fluentd. If you need to tail logs, rsyslog has never failed me.

whereisaaron commented 7 years ago

Kubernetes and fluentd are the projects CNCF selected and hosts for cloud-native apps, so there is obviously going to be a lot of focus on using them together. rsyslog doesn't support k8s metadata yet, nor CloudWatch for that matter. There was an effort to re-implement fluentd's k8s support as an rsyslog plugin, but that seems to have stalled about a year ago.

redbaron commented 7 years ago

Really? Even if it were blessed by the Pope, a log forwarder which blows up right in your face simply because an app managed to write more data than expected, and which takes as much or more CPU time to process a log entry than the app spent processing the request, has no value to me.

But that is not the point. Writing log files and rotating them is a legitimate way to handle logs; whether log tailers can tail rotated logs correctly is not the concern of kube-aws or kubernetes.

I agree, though, that if kube-aws allows the log driver to be configurable, it would be more flexible.

PS. rsyslog supports external modifiers: http://www.rsyslog.com/doc/v8-stable/configuration/modules/mmexternal.html; writing one in Go which augments data with k8s metadata is an easy thing to do.

whereisaaron commented 7 years ago

Surely you jest - I can't believe a ruby application would be inefficient 😛

whereisaaron commented 7 years ago

FYI @mumoshu @redbaron, yesterday the GCI deployment of the core project was switched from logrotate for container logs to docker log-driver rotation.

They noted the same down sides we noted:

mumoshu commented 7 years ago

Thanks for the info @whereisaaron 👍

mumoshu commented 7 years ago

Hi Sander, thanks for taking a look into kube-aws!

Although I'm personally relying on fluentd + GCP Stackdriver Logging for my production cluster, I'm neutral about the various distributed logging solutions integrated with kube-aws.

Also, I'm happy to integrate a "standard" distributed logging as long as:

I believe that in your proposed setup, journald and journald-uploader are configured accordingly on every kube-aws node, and fluentd is managed outside of the kube-aws cluster, right? If so, although I'm not an expert in this area, AFAIK journald still writes to disk until SystemMaxFileSize is met. Is there any way to disable writing to the disk?

On Fri, May 5, 2017 at 19:17, Sander Schoot Uiterkamp notifications@github.com wrote:

We are investigating if we should move to kube-aws. For logging we are now moving towards journald + journald-uploader to fluentd (+http input) and it would be great if kube-aws supports this.

The downside of file logging is that there is no cluster-wide view of the amount of logs present, and still the risk of lots of containers spamming a disk full. Journald is needed either way because of the OS system logs.
