Support `journald` as a Kubernetes log source

LucioFranco commented 4 years ago

We should consider supporting fetching logs from journald, in addition to the file source, for the kubernetes transform. This should be fairly simple, in the sense that we don't need to change message parsing, only where the events come from.

binarylogic commented 4 years ago

Can you expand on the benefits over the current approach and why we would consider this?

LucioFranco commented 4 years ago

The biggest reason I think we should support this is that some third-party kube tools default to journald, which would harm the UX of using the Vector source.

Reference from gitter:

It varies: Kubeadm says that docker and also kubernetes will default to journald if it is present. Kops sets json-file explicitly unless overridden in the cluster spec

I still think we should suggest json-file for kube, which works with our current source, but we may want to consider supporting others.

cc @ktff

rrichardson commented 4 years ago

Source for the quote above:
https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs

On machines with systemd, the kubelet and container runtime write to journald. If systemd is not present, they write to .log files in the /var/log directory. System components inside containers always write to the /var/log directory, bypassing the default logging mechanism. They use the klog logging library. You can find the conventions for logging severity for those components in the development docs on logging

I have yet to find specifics on what, if any, logging configuration is done for pods in the case of journald.

ktff commented 4 years ago

So, to be clear, we are talking about collecting logs only from Kubernetes system components that don't run in containers, and from third-party kube tools that write to journald? Containers always write to files, and the source quoted above is about system components, although it's somewhat badly worded, which also makes me somewhat unsure.

LucioFranco commented 4 years ago

Ah, you're right, it is just for system components. @rrichardson, how does this fit? I was under the assumption that you wanted to collect logs from regular containers via journald.

MOZGIII commented 4 years ago

> Source for the quote above: https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs
>
> On machines with systemd, the kubelet and container runtime write to journald. If systemd is not present, they write to .log files in the /var/log directory. System components inside containers always write to the /var/log directory, bypassing the default logging mechanism. They use the klog logging library. You can find the conventions for logging severity for those components in the development docs on logging
>
> I have yet to find specifics on what, if any, logging configuration is done for pods in the case of journald.

Effectively, this is what's going on:

- kubelet and docker write their own logs to journald, or /var/log if there's no journald;
- containers write logs according to the Kubernetes logging interface, to the files on disk.

I've explained this in more detail at the k8s integration RFC at https://github.com/timberio/vector/pull/2222 (rendered).

I'm closing this issue since it seems to be ill-posed: k8s doesn't pass Pod logs to journald.

enkov commented 4 years ago

Hello! I know that this issue is already closed, but I want you to look again at pod logs in journald. If the docker logging driver is set to journald, all pod logs go to journald and not to files. I have k8s clusters with this setup, and it's very handy to collect all logs from one place. If you don't want to collect logs from journald in the kubernetes source, maybe you can make a transform that adds kubernetes metadata to logs, like journalbeat does? And here is a PR in kubernetes that allows reading logs from journald.
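
To illustrate the kind of enrichment described above, here is a minimal sketch of deriving pod metadata from a journald record. It assumes the record carries the `CONTAINER_NAME` field that Docker's journald logging driver attaches, and that the container was named by the Docker-based Kubernetes runtime using the `k8s_<container>_<pod>_<namespace>_<pod uid>_<attempt>` scheme. That naming scheme is an implementation detail rather than a stable interface, and the function name `pod_metadata_from_journal` is hypothetical; this is not how journalbeat or Vector implement it.

```python
import re
from typing import Optional

# Assumption: containers started by the Docker-based Kubernetes runtime are named
# k8s_<container>_<pod>_<namespace>_<pod uid>_<attempt>. None of the components
# may contain an underscore, so "_" can be used as the separator.
_K8S_CONTAINER_NAME = re.compile(
    r"^k8s_(?P<container>[^_]+)_(?P<pod>[^_]+)_(?P<namespace>[^_]+)"
    r"_(?P<uid>[^_]+)_(?P<attempt>\d+)$"
)


def pod_metadata_from_journal(record: dict) -> Optional[dict]:
    """Return pod metadata parsed from a journald record, or None.

    `record` is a plain dict of journald fields; CONTAINER_NAME is the field
    set by Docker's journald logging driver.
    """
    match = _K8S_CONTAINER_NAME.match(record.get("CONTAINER_NAME", ""))
    if match is None:
        # Not a Kubernetes-managed container (or an unexpected naming scheme).
        return None
    return {
        "namespace": match.group("namespace"),
        "pod_name": match.group("pod"),
        "pod_uid": match.group("uid"),
        "container_name": match.group("container"),
        "attempt": int(match.group("attempt")),
    }


example = {
    "CONTAINER_NAME": "k8s_nginx_web-5d4f_default_0a1b2c3d-1111-2222-3333-444455556666_0",
    "MESSAGE": "hello",
}
metadata = pod_metadata_from_journal(example)
```

Anything beyond what the container name encodes (labels, annotations, node name) would still have to come from the Kubernetes API.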

binarylogic commented 4 years ago

@MOZGIII I'm curious what you think.

enkov commented 4 years ago

If you need any info from kubernetes cluster with journald as logging driver I can post here.

MOZGIII commented 4 years ago

Interesting! I wasn't aware of that PR being there!

I'm still under the impression that this is a very niche use case. Non-file drivers are now (but really since 2017) supported by kubectl logs, but the way it works might introduce significant overhead (there's a reason this was not the go-to implementation in the first place). I would say we don't want to support this as an out-of-the-box configuration; however, we do want to provide tools to make Vector usable in setups like this. People who run those setups generally know what they're doing, and it won't be a hassle for them to tweak Vector for their needs too.

I think the way we will cover this is by adding a pod metadata annotation transform in addition to the source. This will be usable in a lot of niche cases. The main goal for us is sidecar deployment model support, but we should make it flexible enough to cover this use case as well. One other similar case I intend to support with that transform is sending logs directly to Vector from the Docker daemon over the network, rather than via files, for instance via the splunk log driver.

Does this sound like a viable solution to you?

@enkov I'm curious why journald was picked. It has some shortcomings with Kubernetes, and I mostly wonder what the deciding factor was.

Technically, if we accept even higher overhead and load skews, we can use the kubelet/kube-apiserver log access interface; however, I have strong doubts that this would be usable in high-load scenarios. We could explore this as a way to support any kubelet-supported driver, but I'd put it on hold until we implement other, more promising solutions.
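
For context, the kube-apiserver side of that log access interface is the pod `log` subresource that `kubectl logs` reads from. The sketch below only builds the request URL, assuming the standard `/api/v1/namespaces/<namespace>/pods/<pod>/log` path and its `container`, `follow`, and `tailLines` query parameters. The helper name `pod_log_url` and the example server address are hypothetical, and authentication and the actual HTTP request are left out.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def pod_log_url(
    api_server: str,
    namespace: str,
    pod: str,
    container: Optional[str] = None,
    follow: bool = False,
    tail_lines: Optional[int] = None,
) -> str:
    """Build the URL of the pod log subresource on the kube-apiserver."""
    path = "/api/v1/namespaces/{}/pods/{}/log".format(
        quote(namespace, safe=""), quote(pod, safe="")
    )
    params = {}
    if container is not None:
        params["container"] = container
    if follow:
        # Keeps the connection open and streams new log lines.
        params["follow"] = "true"
    if tail_lines is not None:
        params["tailLines"] = str(tail_lines)
    query = urlencode(params)
    return api_server.rstrip("/") + path + ("?" + query if query else "")


url = pod_log_url(
    "https://kube.example:6443", "default", "web-0", container="nginx", tail_lines=10
)
```

A collector built this way needs one streaming request per container, all routed through the API server, which is where the overhead and load-skew concerns come from.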

enkov commented 4 years ago

We chose journald as a logging solution because we can collect all logs from one place (system logs and logs from pods). The second reason is log size and log rotation: with journald we automatically get log rotation, and the binary logs are pretty small. The third, I think, is the simplicity of setup. We just deploy journalbeat as a daemon set and that's it. As I wrote, I hope Vector will be able to add metadata from kubernetes as journalbeat does.

MOZGIII commented 4 years ago

We'll be covering this use case via the kubernetes_annotator transform: https://github.com/timberio/vector/issues/5077

binarylogic commented 3 years ago

@enkov thanks for your input in https://github.com/timberio/vector/issues/2199#issuecomment-720038172. We agree, if Journald is supported in Kubernetes then Vector should support it as well, but there are a few complexities with supporting this that we discovered in #5317:

- We'll need a separate transform that can enrich log data regardless of the source. For context, we currently don't have this because all of the relevant metadata is currently extracted from log file paths in the kubernetes_logs source. This method is significantly simpler and faster compared to issuing requests against the Kubernetes metadata API and indexing the response.
- Introducing a transform for enrichment introduces two separate code paths for this functionality, something we want to avoid. Before we proceed, we will need to agree on how to consolidate this.
- The k8s metadata can be very large, unique, and complex, which creates questions around data selection, cleaning, and so on and how to enable users to achieve this.
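
As a rough illustration of why path-based extraction is cheap, the sketch below pulls pod identity out of a log file path. It assumes the kubelet's `/var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<restart count>.log` layout. The function name `pod_metadata_from_path` is hypothetical, and this is not Vector's actual implementation of the kubernetes_logs source.

```python
from pathlib import PurePosixPath
from typing import Optional


def pod_metadata_from_path(path: str) -> Optional[dict]:
    """Extract pod identity from a kubelet pod log path, or return None.

    Assumed layout:
    /var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<n>.log
    """
    parts = PurePosixPath(path).parts
    if len(parts) != 7 or parts[:4] != ("/", "var", "log", "pods"):
        return None
    pod_dir = parts[4].split("_")
    if len(pod_dir) != 3 or not parts[6].endswith(".log"):
        return None
    namespace, pod_name, pod_uid = pod_dir
    return {
        "namespace": namespace,
        "pod_name": pod_name,
        "pod_uid": pod_uid,
        "container_name": parts[5],
    }


info = pod_metadata_from_path(
    "/var/log/pods/default_web-0_0a1b2c3d-1111-2222-3333-444455556666/nginx/0.log"
)
```

No API request is needed for these fields. Events that arrive from journald carry no such path, which is why a source-independent enrichment transform would need another way to identify the pod.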

Needless to say, we have more planning work to do before we can properly support that. We will be putting this project on hold until we can gather more demand and requirements.

If you're a user that needs support for Journald in Kubernetes, please chime in on this issue. Letting us know why you chose journald for k8s would be helpful.

vectordotdev / vector

Support `journald` as a Kubernetes log source #2199