Improve journald support

tallclair commented 3 months ago

/kind feature

Problem

Running Kubelet as a systemd service logging to journald is not ideal today:

Jun 17 12:46:23 gke-test-default-r55g kubelet[118528]: I0617 12:46:23.668491  118528 kubelet.go:2447] "SyncLoop (probe)" probe="liveness" status="unhealthy" pod="default/test"
Jun 17 12:46:24 gke-test-default-r55g kubelet[118528]: E0617 12:46:24.369353  118528 kubelet.go:1875] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[data], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition" pod="default/test"

Logs redundant information with the journal log:
- date / time (0617 12:46:23.668491)
- PID (118528)
Severity level is not parsed by journald

The redundant information can be suppressed with the skip_headers flag, but that also suppresses the severity & code location.

Proposal

The simplest option would be to add a journald option that suppresses the redundant headers, and sets the severity prefix in the journald format (https://www.freedesktop.org/software/systemd/man/latest/sd-daemon.html).

Alternatively, klog could write to journald directly, e.g. something like https://git.frostfs.info/TrueCloudLab/zapjournald, but I don't know enough about the tradeoffs here. I suggest starting with the first option.

pohly commented 3 months ago

With structured, contextual logging, Kubernetes follows the logr model which only has "info" and "error" severity. But I suppose that difference would be nice to represent.

Do you care about writing JSON output to the journald log?

tallclair commented 3 months ago

Do you care about writing JSON output to the journald log?

Good question. I think I'd still expect the redundant information to be suppressed in this case, but I think some follow-up investigation is needed to see how the logs are typically ingested.

dgrisonnet commented 3 months ago

/triage accepted

huww98 commented 2 months ago

Journald itself also stores structured logs (please see journalctl -o json), I think we can map logr fields into journald fields, logr severity to journald priority (to get colorful output with journalctl) if we write to journald socket directly. I can help with this, if you agree. I've merged journald support to ceph before.

But different from logr, journalctl does not show extra fields after message with its default output format. So maybe we should still format the message ourself. Maybe we can also replicate some common fields (or just all the fields) to journald fields for efficient filtering? e.g. how about enabling journalctl -u kubelet POD=default/test or journalctl CODE_FILE=kubelet.go

pohly commented 2 months ago

if we write to journald socket directly

This could be useful, in particular when it ensures that multi-line log entries are treated as one entry. I'm a bit worried that this might happen in text mode with multi-line text values. The text logger does a single os.Stderr.Write, but that does not necessarily mean that journald can read the entire entry with a single read. Does it assume "one line = one entry" or use SOCK_DGRAM?

Maybe we can also replicate some common fields (or just all the fields) to journald fields for efficient filtering?

On caveat is that we only have recommendations for common fields, but no schema and no enforcement of those recommendations other than code reviews. Having said that, some fields like "pod" are used fairly consistently, so exposing that in journald could be useful.

Either way, this changes externally visible behavior of kubelet and potentially other components. We also need to consider how to roll it out. In other words, do we need a KEP? @tallclair, what do you think? Is it worth asking in a SIG Node meeting as this is most useful for kubelet?

huww98 commented 2 months ago

The text logger does a single os.Stderr.Write, but that does not necessarily mean that journald can read the entire entry with a single read. Does it assume "one line = one entry" or use SOCK_DGRAM?

journald has its own protocol. we can write the total length of the message before the message text, so that multi-line log can be recognized as one entry.

pohly commented 2 months ago

Suppose we don't use that protocol. How does systemd capture the stderr output from a process, i.e. what are the message boundaries in that case?

huww98 commented 2 months ago

I think it will just use \n as the separator.

pohly commented 2 months ago

Then we shouldn't remove the klog header while using stderr because the message boundaries could get lost.

tallclair commented 1 month ago

Yes, I can follow up with with sig-node.

kubernetes / klog

Improve journald support #411

Problem

Proposal