vmware / kube-fluentd-operator

Auto-configuration of Fluentd daemon-set based on Kubernetes metadata
Other
319 stars 102 forks source link

journald logs from systemd missing #141

Closed paulbarbu closed 3 years ago

paulbarbu commented 4 years ago

Hello, I installed the operator and I can see my application logs inside the cluster. However I cannot see the journal logs (kubelet, etc) inside the cluster. I'm using the EFK stack (elastic search, fluentd, kibana).

Inside the fluentd container I have access to the /var/log/journal directory (drwxr-sr-x 3 root 101 4096 May 21 12:37 journal). There are no noticeable messages in the fluentd's logs. Where should I look next to get the journald logs in my EFK stack?

ccbur commented 3 years ago

Same here. We are evaluating kube-fluentd-operator for an AKS installation, but we are not able to figure out how to get something from systemd. The nodes are based on Ubuntu 18.04.5 LTS and we see the journal files mounted to the daemon set, but nothing special in fluent logs:

# kubectl logs -n logging logging-log-router-4m9lx -c fluentd | grep -E "systemd|journal"
2021-03-18 07:40:18 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2021-03-18 07:40:19 +0000 [info]: adding rewrite_tag_filter rule: unit [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000000003a23548 @keys="unit">, /^(.+)$/, "", "systemd.$1"]
    @type systemd
    @id in_systemd_docker
    path "/var/log/journal"
    tag "systemd.unit"
      path "/var/log/logging-log-router-fluentd-journald-cursor.json"
  <match systemd.unit>
      tag "systemd.$1"
  <filter systemd.kubelet.service>
2021-03-18 07:40:22 +0000 [info]: adding match pattern="systemd.unit" type="rewrite_tag_filter"
2021-03-18 07:40:22 +0000 [info]: #0 adding rewrite_tag_filter rule: unit [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x000000000448a158 @keys="unit">, /^(.+)$/, "", "systemd.$1"]
2021-03-18 07:40:22 +0000 [info]: adding filter pattern="systemd.kubelet.service" type="parser"
2021-03-18 07:40:24 +0000 [info]: adding source type="systemd"
# kubectl exec -it -n logging logging-log-router-4m9lx -c fluentd -- ls -lR /var/log/journal
/var/log/journal:
total 8
drwxr-sr-x 2 root 101 4096 Mar 17 23:45 405c877943554aec979deee6c8473be1
drwxr-sr-x 2 root 101 4096 Dec 16 00:11 794301a5a9f54c468ceefd980ca82704

/var/log/journal/405c877943554aec979deee6c8473be1:
total 942112
-rw-r----- 1 root 101  25165824 Mar 18 07:51 system.journal
-rw-r----- 1 root 101 134217728 Mar  1 04:26 system@c4ebc8ed66a74cbf9445527217313a1d-00000000001fa95c-0005bc37aeaf4b7f.journal
-rw-r----- 1 root 101 134217728 Mar  4 00:53 system@c4ebc8ed66a74cbf9445527217313a1d-000000000021c719-0005bc7204a0df08.journal
-rw-r----- 1 root 101 134217728 Mar  6 21:12 system@c4ebc8ed66a74cbf9445527217313a1d-000000000023e305-0005bcab6631997b.journal
-rw-r----- 1 root 101 134217728 Mar  9 14:01 system@c4ebc8ed66a74cbf9445527217313a1d-0000000000260088-0005bce4a95c016e.journal
-rw-r----- 1 root 101 134217728 Mar 12 09:51 system@c4ebc8ed66a74cbf9445527217313a1d-000000000028176d-0005bd1afce84b73.journal
-rw-r----- 1 root 101 134217728 Mar 15 05:12 system@c4ebc8ed66a74cbf9445527217313a1d-00000000002a3199-0005bd53d85de39f.journal
-rw-r----- 1 root 101 134217728 Mar 17 23:45 system@c4ebc8ed66a74cbf9445527217313a1d-00000000002c4b39-0005bd8c4a52408c.journal

/var/log/journal/794301a5a9f54c468ceefd980ca82704:
total 84236
-rw-r----- 1 root 101 16777216 Dec 16 00:25 system.journal
-rw-r----- 1 root 101 26345472 Dec 16 00:11 system@d09efdd4a33442d7ad79cc8601e7b060-0000000000000001-0005b689a566ef69.journal
-rw-r----- 1 root 101 26345472 Dec 16 00:11 system@d09efdd4a33442d7ad79cc8601e7b060-0000000000006f1d-0005b689b7645c2f.journal
-rw-r----- 1 root 101  8388608 Dec 16 00:25 user-1000.journal
-rw-r----- 1 root 101  8388608 Dec 16 00:11 user-1000@f8eddf04badd452b9787aaf8eed3b278-00000000000003d7-0005b689a68cdb13.journal
# kubectl exec -it -n logging logging-log-router-4m9lx -c fluentd -- cat /var/log/logging-log-router-fluentd-journald-cursor.json
{}

Any idea?

paulbarbu commented 3 years ago

@ccbur For the short testing session we did, I ended up switching to https://github.com/fluent/fluentd-kubernetes-daemonset And I think I also got the journald log to show up if journald was writing to /var/log/journal, still the memory is foggy, but this much I can gather from my notes back then.

ccbur commented 3 years ago

I looked also into https://github.com/fluent/fluentd-kubernetes-daemonset and https://github.com/fluent-plugin-systemd/fluent-plugin-systemd and what issues there are regarding systemd, but it's more or less all around user permissions and systemd versions. As far as I can tell those two issues should not be the case with kube-fluentd-operator as it's already starting with root permissions (which I don't like) and our nodes are using the LTS with an older version of systemd (237 I think) which should be compatible. :-(

Cryptophobia commented 3 years ago

Systemd version is 239 in photon:3.0 which is the base image that we use inside the container that is running fluentd. Could this be the problem?

systemctl --version
systemd 239

@ccbur, can you verify the version of systemd inside the log-router fluentd container matches the above version. Also could you upgrade to systemd version 239 on the nodes so we can isolate that is not causing the issues?

I can't find any open issues related to photon, systemd, journald so I don't believe we are hitting anything like that. Linked library is installed on the photon base image:

ldconfig -p | grep libsystemd
    libsystemd.so.0 (libc6,x86-64) => /lib/libsystemd.so.0
    libsystemd.so (libc6,x86-64) => /lib/libsystemd.so

On photon:4.0 the latest systemd is:

systemctl --version
systemd 247 (v247.3-1.ph4)
+PAM -AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 -IDN -PCRE2 default-hierarchy=hybrid

We will probably move to photon:4.0 soon. Why did you say that systemd version matters? Are there any open issues we should be aware of @ccbur, beside this one https://github.com/fluent-plugin-systemd/fluent-plugin-systemd/issues/87?

ccbur commented 3 years ago

I had https://github.com/fluent-plugin-systemd/fluent-plugin-systemd/issues/87 in my mind regarding systemd incompatibilities, which does not seem to be the case here. Yes, our fluentd container is based on 239:

# kubectl exec -it -n logging logging-log-router-6zvms -c fluentd -- systemctl --version
systemd 239
+PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS -ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 -IDN -PCRE2 default-hierarchy=hybrid

Currently I don't have access to the nodes, so I can only relate to the Ubuntu LTS version which should be based on Systemd 237. Sorry, an upgrade is currently not possible.

But I tried to use journalctl directly and got a lot of errors about missing lz4 compression when using the debug log level:

# kubectl exec -it -n logging logging-log-router-6zvms -c fluentd -- sh -c 'SYSTEMD_LOG_LEVEL=debug journalctl -D /var/log/journal'
Considering root directory '/var/log/journal'.
Root directory /var/log/journal added.
Considering directory '/var/log/journal/d53103135086498b844ed77884083388'.
Directory /var/log/journal/d53103135086498b844ed77884083388 added.
Journal effective settings seal=no compress=no compress_threshold_bytes=8B
Journal file /var/log/journal/d53103135086498b844ed77884083388/system@060d8c1345de42588d15d12f6bece8a7-0000000000000001-0005bdd9c27b98d3.journal uses incompatible flag lz4-compressed disabled at compilation time.
Failed to open journal file /var/log/journal/d53103135086498b844ed77884083388/system@060d8c1345de42588d15d12f6bece8a7-0000000000000001-0005bdd9c27b98d3.journal: Protocol not supported
Journal effective settings seal=no compress=no compress_threshold_bytes=8B
Journal file /var/log/journal/d53103135086498b844ed77884083388/system@4e16a584ed354d0f918bb38b61976b63-0000000000a88368-0005bda6d909a9f4.journal uses incompatible flag lz4-compressed disabled at compilation time.
Failed to open journal file 
...
Failed to open journal file /var/log/journal/794301a5a9f54c468ceefd980ca82704/user-1000.journal: Protocol not supported
Journal file /var/log/journal/d53103135086498b844ed77884083388/system@060d8c1345de42588d15d12f6bece8a7-0000000000000001-0005bdd9c27b98d3.journal uses an unsupported feature, ignoring file.
Use SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/d53103135086498b844ed77884083388/system@060d8c1345de42588d15d12f6bece8a7-0000000000000001-0005bdd9c27b98d3.journal to see the details.
Journal filter: none
Successfully forked off '(pager)' as PID 6157.
PR_SET_MM_ARG_START failed, proceeding without: Operation not permitted
-- No entries --
Directory /var/log/journal/d53103135086498b844ed77884083388 removed.
Directory /var/log/journal/794301a5a9f54c468ceefd980ca82704 removed.
Root directory /var/log/journal removed.
mmap cache statistics: 0 hit, 16 miss

I'm not familiar with Photon OS, but maybe it's the same missing lz4 dependency like in this Homebrew issue: https://github.com/Homebrew/linuxbrew-core/issues/19506

I looked into https://github.com/vmware/photon/blob/3.0/SPECS/systemd/systemd.spec and there is no

Requires:         lz4

Is it possible to upgrade the systemd package in photon:3.0 to the same package like in photon:4.0?

Cryptophobia commented 3 years ago

lz4 is probably the issue here then. Compression is enabled on the nodes and when photon's systemd does not have the compression library it cannot read the files...

We will be upgrading to photon:4.0 for next release: https://github.com/vmware/kube-fluentd-operator/issues/184

Cryptophobia commented 3 years ago

PR is already made and tested that it builds: https://github.com/vmware/kube-fluentd-operator/pull/186

L4Z flag is enabled on systemd 247 in photon:4.0

This will be included the next release.