Open LucioFranco opened 4 years ago
Can you expand on the benefits over the current approach and why we would consider this?
The biggest reason I think we should support this is that some third-party kube tools default to journald, which would harm the UX of using the Vector source.
Reference from gitter:
It varies: Kubeadm says that docker and also kubernetes will default to journald if it is present. Kops sets json-file explicitly unless overridden in the cluster spec
I still think we should suggest json-file for kube, which works with our current source, but we may want to consider supporting others.
cc @ktff
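To make the json-file vs. journald distinction concrete, here is a minimal sketch of how one might check which logging driver a node's Docker daemon is configured with. It assumes the driver is set through the "log-driver" key of the Docker daemon configuration file (commonly /etc/docker/daemon.json) and ignores drivers set via command-line flags or distro packaging, which is how some tools end up on journald. The function names are hypothetical and this is not part of Vector.

```python
import json

# Docker falls back to json-file when no driver is configured in daemon.json.
DEFAULT_DRIVER = "json-file"


def docker_log_driver(daemon_json_text: str) -> str:
    """Return the logging driver named in the text of a daemon.json file.

    Assumption: the driver is set via the "log-driver" key; a driver passed
    as a daemon command-line flag is not visible here.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    return config.get("log-driver", DEFAULT_DRIVER)


def works_with_file_source(driver: str) -> bool:
    """Only json-file leaves container logs on disk for a file-based source."""
    return driver == "json-file"


if __name__ == "__main__":
    driver = docker_log_driver('{"log-driver": "journald"}')
    print(driver, works_with_file_source(driver))
```

On a real node you would pass in the contents of the daemon configuration file instead of the literal string used above.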
Source for the quote above:
https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs
On machines with systemd, the kubelet and container runtime write to journald. If systemd is not present, they write to .log files in the /var/log directory. System components inside containers always write to the /var/log directory, bypassing the default logging mechanism. They use the klog logging library. You can find the conventions for logging severity for those components in the development docs on logging
I have yet to find specifics on what, if any, logging configuration is done for pods in the case of journald.
So to be clear, are we talking about collecting logs only for Kubernetes system components that are not in containers, and for third-party kube tools that write to journald? Containers always write to files, and the source above is about system components, although it's somewhat badly worded, which also makes me somewhat unsure.
Ah, you're right, it is just for system components. @rrichardson, how does this fit? I was under the assumption you wanted to collect regular containers via journald.
Effectively, this is what's going on:
kubelet and docker write their own logs to journald, or to /var/log if there's no journald. I've explained this in more detail in the k8s integration RFC at https://github.com/timberio/vector/pull/2222 (rendered).
I'm closing this issue since it seems to be ill-posed: k8s doesn't pass Pod logs to journald.
Hello! I know this issue is already closed, but I'd like you to look again at pod logs in journald. If the docker logging driver is set to journald, all pod logs go to journald and not to files. I have k8s clusters with this setup, and it's very handy to collect all logs from one place. If you don't want to collect logs from journald in the kubernetes source, maybe you could make a transform that adds kubernetes metadata to logs, like journalbeat does? And here is a PR in kubernetes that allows reading logs from journald.
@MOZGIII I'm curious what you think.
If you need any info from a kubernetes cluster with journald as the logging driver, I can post it here.
Interesting! I wasn't aware of that PR being there!
I'm still under the impression that this is a very niche use case. Non-file drivers are now (but really since 2017) supported by kubectl logs, but the way it works might introduce significant overhead (there's a reason this was not the go-to implementation in the first place). I would say we don't want to support this as an out-of-the-box configuration; however, we do want to provide tools to make Vector usable in setups like this. People who run those setups generally know what they're doing, and it won't be a hassle for them to tweak Vector for their needs too.
I think the way we will cover this is by adding a pod metadata annotation transform in addition to the source. This will be usable in a lot of niche cases - the main goal for us being sidecar deployment model support, but we should make it flexible enough to cover this use case as well. One other similar case I intend to support with that transform is sending logs directly to Vector from the Docker daemon over the network, rather than via files, for instance via the splunk log driver.
Does this sound like a viable solution to you?
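To illustrate what a metadata-annotating step could look like for the journald case, here is a rough sketch, not Vector's implementation. It relies on two assumptions: the Docker journald log driver attaches a CONTAINER_NAME field to each journal entry, and the kubelet's Docker integration names containers k8s_<container>_<pod>_<namespace>_<pod uid>_<attempt>. The function name and the output keys are hypothetical.

```python
from typing import Optional


def pod_metadata_from_entry(entry: dict) -> Optional[dict]:
    """Derive pod metadata from a journald entry written by the Docker driver.

    Assumption: CONTAINER_NAME follows the kubelet naming convention
    k8s_<container>_<pod>_<namespace>_<pod uid>_<attempt>. None of those
    components may contain an underscore, so a plain split is enough.
    """
    parts = entry.get("CONTAINER_NAME", "").split("_")
    if len(parts) != 6 or parts[0] != "k8s":
        return None  # not a container started by the kubelet
    _, container, pod, namespace, uid, attempt = parts
    return {
        "container_name": container,
        "pod_name": pod,
        "pod_namespace": namespace,
        "pod_uid": uid,
        "restart_attempt": attempt,
    }


def annotate(entry: dict) -> dict:
    """Return a copy of the entry with a 'kubernetes' key when it applies."""
    metadata = pod_metadata_from_entry(entry)
    if metadata is None:
        return dict(entry)
    return {**entry, "kubernetes": metadata}


if __name__ == "__main__":
    sample = {
        "MESSAGE": "GET /healthz 200",
        "CONTAINER_NAME": "k8s_nginx_web-5d4f_default_1234-abcd_0",
    }
    print(annotate(sample)["kubernetes"])
```

This only recovers what is encoded in the container name. Labels and annotations would still require a lookup against the Kubernetes API, which is where most of the cost and complexity sits.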
@enkov I'm curious why journald was picked. It has some shortcomings with Kubernetes, and I mostly wonder what the deciding factor was.
Technically, if we accept even higher overhead and load skews, we can use the kubelet/kube-apiserver log access interface; however, I highly doubt this would be usable in high-load scenarios. We could explore this as a way to support any kubelet-supported driver, but I'd put it on hold until we implement other, more promising solutions.
We chose journald as a logging solution because we can collect all logs from one place (system logs and logs from pods). The second reason is log size and log rotation: with journald we automatically get log rotation, and the binary logs are pretty small. The third, I think, is the simplicity of setup: we just deploy journalbeat as a daemon set and that's it. As I wrote, I hope Vector will be able to add metadata from kubernetes as journalbeat does.
We'll be covering this use case via the kubernetes_annotator transform: https://github.com/timberio/vector/issues/5077
@enkov thanks for your input in https://github.com/timberio/vector/issues/2199#issuecomment-720038172. We agree, if Journald is supported in Kubernetes then Vector should support it as well, but there are a few complexities with supporting this that we discovered in #5317:
kubernetes_logs source. This method is significantly simpler and faster compared to issuing requests against the Kubernetes metadata API and indexing the response.
Needless to say, we have more planning work to do before we can properly support that. We will be putting this project on hold until we can gather more demand and requirements.
If you're a user that needs support for Journald in Kubernetes, please chime in on this issue. Letting us know why you chose journald for k8s would be helpful.
We should consider supporting fetching logs from journald instead of via the file source for the kubernetes transform. This should be somewhat simple, in the sense that we don't need to change message parsing, only where the events come from.