mesosphere / dcos-kubernetes-quickstart

Quickstart guide for Kubernetes on DC/OS
https://mesosphere.com/kubernetes/
Apache License 2.0

Fluentd not able to tail logs #72

Closed bjornm82 closed 5 years ago

bjornm82 commented 6 years ago

Create new cluster: Mesosphere DC/OS Version 1.11.0

Apply Kubernetes cloud provider support (probably not relevant, though): https://aws.amazon.com/blogs/opensource/cloud-provider-support-kubernetes-dcos/

Kubernetes version: v1.9.4

Starting Fluentd daemonset: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch.yaml

Fluentd pod log output:

2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [warn]: #0 'type' is deprecated parameter name. use '@type' instead.
2018-03-16 19:26:34 +0000 [info]: adding source type="tail"
2018-03-16 19:26:34 +0000 [info]: #0 starting fluentd worker pid=15 ppid=1 worker=0
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/kube-dns-754f9cd4f5-pnfnk_kube-system_dnsmasq-e75753d29a5e4879f712525b268575a6c9fbaf55a17d677037980d0e2bd75495.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/metrics-server-54974fd587-cdjc7_kube-system_metrics-server-270577277bc1213532129c467f07c6bf6413c7f25b7d219717e9422cc437117d.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/fluentd-cxkfr_kube-system_fluentd-2b138775674b9b090d1f2492bec0732346a89bf8b2a3758c6fd4d8d9abcb19a9.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/kubernetes-dashboard-5cfddd7d5b-cj8c7_kube-system_kubernetes-dashboard-171e2225ecb06d685a38cd61fe04f380cabf91c4bea77c4739117155a3dbe674.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/kube-dns-754f9cd4f5-pnfnk_kube-system_kubedns-eb34aa08266b0e7b317fb7e21f1857ac727ee98354559d291ec425d08601d3ad.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [warn]: #0 /var/log/containers/kube-dns-754f9cd4f5-pnfnk_kube-system_sidecar-c6c1709f49fe1c16630531ed199a027a1b48fab093ead6f339015d6936b6aec2.log unreadable. It is excluded and would be examined next time.
2018-03-16 19:26:34 +0000 [info]: #0 fluentd worker is now running worker=0

Earlier today the logs ended with an error about not being able to follow a symlink. This looks like the same issue as https://github.com/kubernetes/kubernetes/issues/39225, although that thread is rather old.
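
One way to see where such a link stops resolving is to follow it one hop at a time instead of all at once. The sketch below is only an illustration, not something taken from the Fluentd or Kubernetes tooling: `trace_symlink` and `max_hops` are hypothetical names, and it assumes it runs somewhere the reported paths are visible (for example inside the Fluentd container).

```python
import os


def trace_symlink(path, max_hops=16):
    """Follow a symlink chain one hop at a time.

    Returns a list of (path, status) tuples, where status is
    "symlink", "exists", "missing" or "too many hops".
    """
    hops = []
    current = path
    for _ in range(max_hops):
        if os.path.islink(current):
            hops.append((current, "symlink"))
            target = os.readlink(current)
            # A relative target is interpreted relative to the link's directory.
            if not os.path.isabs(target):
                target = os.path.join(os.path.dirname(current), target)
            current = target
        elif os.path.exists(current):
            hops.append((current, "exists"))
            return hops
        else:
            # The chain ends at a path that is not present in this filesystem view.
            hops.append((current, "missing"))
            return hops
    hops.append((current, "too many hops"))
    return hops


if __name__ == "__main__":
    import sys

    for hop, status in trace_symlink(sys.argv[1]):
        print(f"{status:>13}  {hop}")
```

Running it against one of the paths from the warnings above would show whether the first link, or a later one, points at a location that is missing from where Fluentd runs.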

bjornm82 commented 6 years ago

Some additional information since yesterday:

When sshing into the master node:

Last login: Sat Mar 17 07:08:21 UTC 2018 from 10.0.6.72 on pts/0
Container Linux by CoreOS stable (1235.12.0)
Update Strategy: No Reboots
Failed Units: 2
  format-var-lib-ephemeral.service
  update-engine.service

$ systemctl status format-var-lib-ephemeral.service

● format-var-lib-ephemeral.service - AWS Setup: Formats the /var/lib ephemeral drive
   Loaded: loaded (/etc/systemd/system/format-var-lib-ephemeral.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2018-03-17 07:45:10 UTC; 16s ago
  Process: 28019 ExecStart=/bin/bash -c (blkid -t TYPE=ext4 | grep xvdb) || (/usr/sbin/mkfs.ext4 -F /dev/xvdb) (code=exited, status=1/F
 Main PID: 28019 (code=exited, status=1/FAILURE)

Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: Starting AWS Setup: Formats the /var/lib ephemeral drive...
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal bash[28019]: mke2fs 1.42.13 (17-May-2015)
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal bash[28019]: The file /dev/xvdb does not exist and no size was specified.
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: format-var-lib-ephemeral.service: Main process exited, code=exited
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: Failed to start AWS Setup: Formats the /var/lib ephemeral drive.
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: format-var-lib-ephemeral.service: Unit entered failed state.
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: format-var-lib-ephemeral.service: Failed with result 'exit-code'.


Searching for all .log files

> find / -name '*.log' -ls

This returns a list of paths where the logs should be located, but that location contains only broken symlinks:
cd /var/lib/mesos/slave/volumes/roles/kubernetes-role/82e9efc0-aeb3-457a-8174-57490b4615c1/new/log/containers/

> ls -alh
<img width="1430" alt="screen shot 2018-03-17 at 8 54 40 am" src="https://user-images.githubusercontent.com/8833427/37553079-e9810df4-29c0-11e8-9774-5b04b06c1d05.png">
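
For anyone who wants to reproduce the check without reading `ls` output by eye, here is a minimal Python sketch that lists every symlink under a directory whose target cannot be resolved. The function name `find_broken_symlinks` is made up for this example, and the directory to scan is whatever path applies on your node.

```python
import os


def find_broken_symlinks(root):
    """Return sorted (link_path, raw_target) pairs for dangling symlinks under root."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk reports symlinks to directories in dirnames and all other
        # symlinks (including dangling ones) in filenames, so check both.
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            # os.path.exists follows the link, so it is False for a dangling one.
            if os.path.islink(full) and not os.path.exists(full):
                broken.append((full, os.readlink(full)))
    return sorted(broken)


if __name__ == "__main__":
    import sys

    for link, target in find_broken_symlinks(sys.argv[1]):
        print(f"{link} -> {target}")
```

The printed targets show which host path each link expects, which is the location that would need to be present for the logs to be readable.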

Does the above mean that /var/lib should be created by AWS as directed by Kubernetes? If so, could the problem lie in the setup and the Kubernetes permissions?

pires commented 6 years ago

This is a limitation on our end and, at the moment, there's no quick, safe resolution we can think of. We will come back to this in the future.

pires commented 6 years ago

@bjornm82 this is being worked on as we speak. We expect it to be released with DC/OS 1.12.

bjornm82 commented 6 years ago

Thanks @pires !

blublinsky commented 6 years ago

There are several problems here. The most fundamental one is how to configure fluentd to access Kubernetes logs. On straight Kubernetes the configuration looks as follows: volumes:

pires commented 6 years ago

@blublinsky

What does straight kubernetes mean?

Anyway, regarding a planned resolution, yes, as mentioned above it's being worked on and we expect to release it as part of DC/OS 1.12.

blublinsky commented 6 years ago

I meant native Kubernetes deployed on bare metal.

pires commented 6 years ago

As mentioned before, this will be released as part of our DC/OS 1.12 release and Kubernetes package 2.x.

hectorj2f commented 5 years ago

This issue should be fixed in 2.0.0-1.12.1: https://docs.mesosphere.com/services/kubernetes/2.0.0-1.12.1/