Closed: tarvip closed this issue 7 years ago
@tarvip Good catch and thanks for sharing your findings!
I believe nodes created via kube-aws don't have docker log rotation enabled, and there's no recommended setup discussed/agreed on yet. So let's discuss it here.
Several things:
We also need some solution for logrotate / log truncation to be implemented in kube-aws. We have some containers with 5 GB log files. Since we relay logs from containers to AWS Elasticsearch, there is no need to keep those log files. Sending logs from containers and the CoreOS host is a best practice anyway.
Btw, we can share a CloudFormation template to create an AWS Elasticsearch instance, plus fluentd and curator images, if you want to provide a whole logging package as part of the kube-aws production-ready project. Currently it can't be considered production-ready because of this logrotate issue (and a couple of other issues, like no etcd backup).
@mumoshu I'm not saying that there is no need to discuss log handling, but this PR solves an immediate problem, so maybe have it merged first?
@redbaron Which PR are you wishing to be merged?
@dzavalkinolx Thanks for your feedback!
I'd appreciate it if you could share the template for log aggregation stack plus docker images. Everyone reading this issue would benefit from that and I'm even thinking of linking your work from kube-aws doc.
Regarding log rotation, it can be enabled in two ways without modifying kube-aws:
@mumoshu, oops, I thought I was responding to a PR. I was referring to the `DOCKER_OPTS` solution proposed in the first message here: it is quick, reliable, and works for most cases.
@redbaron Ah, got it!
I'm not yet sure if it is the recommended setup for log rotation, as it targets logs only from docker containers, but not those from rkt pods, kubelet, or the apiserver.
So, how about supporting it via a new, more general experimental feature to configure not log rotation but `DOCKER_OPTS`, which would be enabled via cluster.yaml like the below?
```yaml
experimental:
  docker:
    opts: "--log-opt max-size=50m"
```
Yes, it is not a generic solution, but it covers at least some potential pitfalls. IMHO kube-aws should provide sane defaults. Use of `experimental.docker.opts` helps and might be used to override defaults, though.
@redbaron Thanks, it makes sense. So,
Are you ok with:
```yaml
experimental:
  docker:
    opts: "--log-opt max-size=<kube-aws default>"
```
Or would you rather like to have:
```yaml
experimental:
  docker:
    log:
      maxSize: #<kube-aws default>
```
as default?
Personally I tend to prefer the former because it would keep the cluster.yaml spec compact; no need to add various keys like `docker.log.maxFile`, `docker.log.labels`, `docker.log.env`, etc. over time, hence my previous comment.
Neither. The first one ships with the experimental feature enabled by default, which makes it not an experimental feature :) The second one is just asking for an explosion of all kinds of options.
I'd add `--log-driver=json-file --log-opt max-size=50m` to docker opts inside a template, which is then completely overwritten by `experimental.docker.opts` if it is set/enabled.
Thanks! Then, how about:
```yaml
#docker:
#  # Uncomment and modify to completely override DOCKER_OPTS for the docker daemon run in worker and controller nodes. DOCKER_OPTS contains the following for basic log rotation by default.
#  opts: "--log-driver=json-file --log-opt max-size=50m"
```
in cluster.yaml while having the below in e.g. cloud-config-worker/controller:
```yaml
- name: docker.service
  drop-ins:
    - name: 20-docker-opts.conf
      content: |
        [Service]
        Environment="DOCKER_OPTS={{.Docker.Opts}}"
```
I like it, but what is `.Docker.Log.Opts` though? Wasn't it `.Experimental.Docker.Opts`?
```
[Service]
Environment="DOCKER_OPTS={{ or .Experimental.Docker.Opts "--log-driver=json-file --log-opt max-size=50m" }}"
```
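As an aside, the same defaults could also be set via docker's standard daemon configuration file instead of `DOCKER_OPTS`; a minimal sketch (this is docker's stock `daemon.json` mechanism, not something kube-aws currently generates):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
```

Placed at `/etc/docker/daemon.json`, this applies the same rotation limits without touching the systemd unit.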
@redbaron Ah! Sorry for the typo in the ref. I meant `.Docker.Opts`.
Let me also clarify that `Experimental` is for a feature which can be turned off almost completely when not used; I'm assuming that `Docker.Opts` is not one of those, because `DOCKER_OPTS` (= custom docker opts other than defaults) will be dynamically set regardless of whether we set `docker.opts` or not in cluster.yaml after the change.
And basically we've been preferring to code defaults into Go code rather than hard-coding them in cloud-config templates.
Today, I'm saying no to my original answer in https://github.com/coreos/kube-aws/issues/134#issuecomment-271565227.
I generally recommend using the log-rotation feature integrated into docker for rotating docker container logs, so as not to lose log messages emitted while a rotation is in progress. I don't recommend logrotate for this purpose anymore.
AFAIK, fluentd for example supports the `rotate_wait` option to detect a rotation of a log file and keep tailing the old log file for a while, so as not to lose log messages emitted during the rotation.
The log rotation itself must be done outside of fluentd.
To allow fluentd to correctly follow both the old and the new log files with the `rotate_wait` option enabled, a log rotation must be done like this:
// See https://github.com/fluent/fluentd/blob/48526e4/lib/fluent/plugin/in_tail.rb#L508-L531 for how it is implemented in fluentd.
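For reference, `rotate_wait` is configured on fluentd's `in_tail` source; a minimal sketch (the path, tag, and pos_file location are illustrative, not from kube-aws):

```
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  format json
  # keep watching a rotated file for this many seconds after rotation
  rotate_wait 5
</source>
```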
As of today, the only way to achieve docker container log rotation in that manner is to use the log-rotation feature integrated into docker, not logrotate.
// See https://github.com/docker/docker/pull/11485/files#diff-1b468ac6508c0a9669b0bae94937b14bR157 for how it is implemented in docker
FYI, using logrotate for rotating docker container logs means you must enable the `copytruncate` option. Without it, a docker container completely stops writing log messages after the log file is rotated (moved), because docker has no way to "notice" that the inode of the log file has changed.
The issue https://github.com/docker/docker/issues/7333 seems to have discussed allowing docker to "notice" the log rotation via a signal (which could be sent from e.g. a `lastaction` ~ `endscript` block in a logrotate config), but it was never implemented.
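For illustration only: had docker supported such a reopen signal, a logrotate config could have notified it like this (this is hypothetical; docker never implemented the signal, per the issue above, and the pid file path is just logrotate's usual pattern):

```
/var/lib/docker/containers/*/*.log {
    daily
    rotate 5
    missingok
    lastaction
        # hypothetical: tell the daemon to reopen its log files after rotation
        kill -USR1 "$(cat /var/run/docker.pid)"
    endscript
}
```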
Hi @repeatedly, how are you?
Sorry for suddenly bothering you, but would you mind confirming whether my assumptions regarding fluentd's `rotate_wait` option are correct?
I assume that a log rotation must be done as described in the following, to allow fluentd not to drop log messages emitted during a rotation, i.e. between when an old log file is renamed and when a new log file is created:
- the log-rotator moves (or renames) the log file to another location, and then
- the log-rotator optionally creates a new one in the original place
- the application/container keeps writing to the moved, old log file for a while but eventually reopens to start writing to the new log file located at the original path
Does my assumption seem correct to you?
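The three steps above can be sketched as a small shell session (the paths are illustrative temporaries, not actual container log locations):

```shell
#!/bin/sh
# Illustrative move-then-create rotation, matching the three steps above.
set -e
dir=$(mktemp -d)
log="$dir/app.log"

printf 'before-rotation\n' > "$log"

# 1. move (rename) the log file to another location; a writer holding an
#    open fd keeps appending to this old inode for a while
mv "$log" "$log.1"

# 2. create a new, empty file at the original path
: > "$log"

# 3. the writer eventually reopens the original path and writes there
printf 'after-rotation\n' >> "$log"

old_contents=$(cat "$log.1")
new_contents=$(cat "$log")
printf '%s / %s\n' "$old_contents" "$new_contents"
# prints "before-rotation / after-rotation"
```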
Don't know the current state, but not so long ago rotate_wait in fluentd was fundamentally broken: https://github.com/fluent/fluentd/issues/544
If your service manages to rotate logs more than once within the rotate_wait period, fluentd loses its mind, starts to create duplicates, consumes CPU, and becomes a source of problems by itself.
@redbaron Thanks for the info!
Then at least, we must be sure that `maxsize` is set to a value large enough.
Here I was, naively thinking the logging documentation saying that container logs were rotated applied to Kubernetes in general. Turns out that it really only applies to the GCE versions, and my clusters have no log rotation :cry:.
After the 'k8s-app' services were containerized, in GCE they still arranged for them to log to `/var/log`, while the rest of the containers log to `/var/log/containers`. They currently use logrotate to rotate both sets of logs with the logrotate config below, including the `copytruncate` option.
https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L95-L136
```
{
    rotate 5
    copytruncate
    missingok
    notifempty
    compress
    maxsize 10M
    daily
    dateext
    dateformat -%Y%m%d-%s
    create 0644 root root
}
```
Now that Docker has logging drivers, including built-in support for rotation, I agree with @mumoshu that it would be desirable to use that, assuming it doesn't break tail watchers like fluentd.
People may want to skip local disk log files entirely and use the Docker logging drivers for Fluentd, Splunk, AWS CloudWatch, or GCP to stream logs directly to those services (`kubectl logs` won't work of course, but you can pull the same logs directly from your log service).
I lost the back and forth above. It sounds like fluentd and Docker log rotation are not compatible right now? And so copying the logrotate config used by other deployments (with `copytruncate`) is probably the current best option?
> It sounds like fluentd and Docker log rotation are not compatible right now?
@whereisaaron I'm not sure if I'm following you correctly, but yes and no. Probably the most important part is that docker's built-in log rotation is for the `json-file` logging driver.
Therefore,
If for any reason you stick with the fluentd logging driver of Docker, yes; the fluentd logging driver for Docker by definition wouldn't be used in combination with docker's built-in log rotation (it doesn't know how the destination fluentd handles log messages, hence it is unable to rotate. docker's built-in log rotation is intended to be used with the `json-file` driver, or it is even implemented inside the `json-file` logging driver, if I remember correctly).
If you stick with the default `json-file` logging driver of Docker while enabling the built-in log rotation as I've suggested, no. It would just work with fluentd and is better than logrotate (copytruncate) + fluentd.
Hi @mumoshu, I was saying people either need a local file rotation system that is compatible with fluentd and similar tools, or they could use Docker streaming logging drivers and not need to worry about log rotation at all. However, please ignore that I said any of that. On reflection I'd rather kube-aws stick close to the current k8s core team approach to this, and support local logs with rotation via logrotate, rather than pioneer different approaches.
Reading around, using Docker-specific logging and driver options is not well liked for k8s use, as it is Docker-specific rather than a k8s abstraction. It also appears that most deployment tools/playbooks are sticking with logrotate until a better cluster-wide and container-agnostic option is decided on; e.g. potentially journald could be used if its performance were improved.
Logrotate with `copytruncate` appears to still be the current 'best practice' approach as of 1.6.x, including for supporting fluentd. I read discussions about other approaches, but nothing concrete has happened yet AFAIK. As you mentioned, CoreOS already has logrotate installed. kube-aws would just have to enable it and drop in the logrotate.d file(s) with a `copytruncate` config as above. Is there a compelling reason not to do it that way for now?
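For instance, such a drop-in under `/etc/logrotate.d/` covering docker's container log files might look like the following sketch; the glob targets the default `json-file` log location, and the values mirror the GCE config quoted earlier (this is an illustration, not an agreed kube-aws config):

```
/var/lib/docker/containers/*/*-json.log {
    rotate 5
    copytruncate
    missingok
    notifempty
    compress
    maxsize 10M
    daily
}
```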
Thanks for the feedback @whereisaaron!
The reasons why I'd rather not enable the logrotate pre-installed in CoreOS by default are:
However, I'm ok with adding an easy way to enable logrotate and/or docker's built-in log rotation, and making the documentation/comments in cluster.yaml kind enough, while not enabling them by default, so that at least nobody accidentally misses log rotation.
Thanks @mumoshu, I agree with you; I don't like the `copytruncate` workaround, it isn't the "right" way to do it. Yet from your, @redbaron's, and my research it is (currently) the best available option. It is also considerably better than no rotation at all.
I suggest we just copy the upstream solution for now, then replace it as soon as a better option is available. I don't mind if it is default or not.
Well, my personal view is that `--log-opt max-size=50m` is the best way to go. `copytruncate` is terrible.
@redbaron if we go with `--log-opt max-size=50m`, will that work for log tail watchers like fluentd? You were saying `rotate_wait` was a blocking bug right now? Or does setting `--log-opt max-file=1` work around that problem?
I'd be happy with anything that works for both `kubectl logs` and fluentd.
@whereisaaron, to be honest I have zero interest in fluentd. If you need to tail logs, rsyslog has never failed me.
Kubernetes and fluentd are the projects CNCF selected and host for cloud native apps, so there is obviously going to be a lot of focus on using them together. rsyslog doesn't support k8s metadata yet, nor CloudWatch for that matter. There was an effort to re-implement fluentd k8s support as an rsyslog plugin but that seems to have stalled about a year ago.
Really, even if it were blessed by the Pope: a log forwarder which blows up right in your face simply because the app managed to write more data than expected, and which takes the same or more CPU time to process a log entry than the app spent processing the request, has no value to me.
But that is not the point. Writing log files and rotating them is a legitimate way to handle logs; whether log tailers can tail logs correctly or not is not the concern of kube-aws or kubernetes.
I agree, though, that if kube-aws allows the log driver to be configurable, it would be more flexible.
PS. rsyslog supports external modifiers: http://www.rsyslog.com/doc/v8-stable/configuration/modules/mmexternal.html; writing one in Go which augments data with k8s metadata is an easy thing to do.
Surely you jest - I can't believe a ruby application would be inefficient.
FYI @mumoshu @redbaron, yesterday the GCI deployment of the core project was switched from `logrotate` for container logs to using the docker log driver rotation.
They noted the same downsides we noted:
- `copytruncate` sucks

Thanks for the info @whereisaaron!
Hi Sander, thanks for taking a look into kube-aws!
Although I'm personally relying on fluentd + GCP Stackdriver Logging for my production cluster, I'm neutral on the various distributed logging solutions integrated with kube-aws.
Also, I'm happy to integrate a "standard" distributed logging solution as long as it doesn't break `docker logs` and `kubectl logs` (AFAIK, a docker logging driver other than `json-file` breaks `docker logs` and `kubectl logs`. Am I correct?)
I believe that in your proposed setup, journald and journald-uploader are configured accordingly on every kube-aws node, and fluentd is managed outside of the kube-aws cluster, right? If so, although I'm not an expert in this area, AFAIK journald still writes to disk until SystemMaxFileSize is met. Is there any way to disable writing to the disk?
On Fri, May 5, 2017 at 19:17, Sander Schoot Uiterkamp notifications@github.com wrote:
We are investigating whether we should move to kube-aws. For logging we are now moving towards journald + journald-uploader to fluentd (+ http input), and it would be great if kube-aws supported this.
The downside of file logging is that there is no cluster-wide view of the amount of logs present, and still the risk of filling a disk with lots of containers. Journald is needed either way because of the OS system logs.
If I am correct, then by default there is no limit for docker container log files (`/var/lib/docker/containers/<container_id>/<container_id>-json.log`). What is the preferred way to limit docker container log files? At the moment I'm using the following drop-in in the docker service:
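For context, such a drop-in typically looks like the following; this is an illustrative sketch (the file name is arbitrary), not necessarily the poster's actual file:

```
# /etc/systemd/system/docker.service.d/30-log-opts.conf (illustrative name)
[Service]
Environment="DOCKER_OPTS=--log-driver=json-file --log-opt max-size=50m --log-opt max-file=3"
```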