Closed bjornm82 closed 5 years ago
Some additional information since yesterday:
When sshing into the master node:
Last login: Sat Mar 17 07:08:21 UTC 2018 from 10.0.6.72 on pts/0
Container Linux by CoreOS stable (1235.12.0)
Update Strategy: No Reboots
Failed Units: 2
format-var-lib-ephemeral.service
update-engine.service
$ systemctl status format-var-lib-ephemeral.service
● format-var-lib-ephemeral.service - AWS Setup: Formats the /var/lib ephemeral drive
   Loaded: loaded (/etc/systemd/system/format-var-lib-ephemeral.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2018-03-17 07:45:10 UTC; 16s ago
  Process: 28019 ExecStart=/bin/bash -c (blkid -t TYPE=ext4 | grep xvdb) || (/usr/sbin/mkfs.ext4 -F /dev/xvdb) (code=exited, status=1/F
 Main PID: 28019 (code=exited, status=1/FAILURE)

Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: Starting AWS Setup: Formats the /var/lib ephemeral drive...
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal bash[28019]: mke2fs 1.42.13 (17-May-2015)
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal bash[28019]: The file /dev/xvdb does not exist and no size was specified.
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: format-var-lib-ephemeral.service: Main process exited, code=exited
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: Failed to start AWS Setup: Formats the /var/lib ephemeral drive.
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: format-var-lib-ephemeral.service: Unit entered failed state.
Mar 17 07:45:10 ip-10-0-0-206.us-west-2.compute.internal systemd[1]: format-var-lib-ephemeral.service: Failed with result 'exit-code'.
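The mke2fs message in the output above says that /dev/xvdb does not exist, which would explain why the unit exits with status 1. A minimal sketch for checking this by hand, assuming a POSIX shell on the node; `check_device` is a hypothetical helper name and is not part of the unit file:

```shell
# Hypothetical helper: report whether a path is a block device.
# Prints "present", "not-a-block-device", or "missing".
check_device() {
  dev="$1"
  if [ -b "$dev" ]; then
    echo "present"
  elif [ -e "$dev" ]; then
    echo "not-a-block-device"
  else
    echo "missing"
  fi
}

# Example (run on the node): check_device /dev/xvdb
```

If this prints "missing" for /dev/xvdb, the instance probably has no ephemeral drive attached under that name, and the mkfs.ext4 step in the unit cannot succeed.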
Searching for all .log files:
> find / -name '*.log' -ls
This returns a list of paths where the logs should be located, but that location only contains broken symlinks (their targets do not exist):
cd /var/lib/mesos/slave/volumes/roles/kubernetes-role/82e9efc0-aeb3-457a-8174-57490b4615c1/new/log/containers/
> ls -alh
<img width="1430" alt="screen shot 2018-03-17 at 8 54 40 am" src="https://user-images.githubusercontent.com/8833427/37553079-e9810df4-29c0-11e8-9774-5b04b06c1d05.png">
Does the above mean that /var/lib should be created by AWS, as directed by Kubernetes? If so, could the problem lie in the setup or in the Kubernetes permissions?
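To confirm which of the symlinks described above are actually broken, and where they point, something like the following can help. This is a sketch only, assuming a POSIX shell with `find` and `readlink` available on the node; `list_broken_symlinks` is a hypothetical helper name:

```shell
# Hypothetical helper: print every symlink under a directory whose
# target does not exist, together with the target it points to.
list_broken_symlinks() {
  dir="$1"
  find "$dir" -type l | while IFS= read -r link; do
    # [ -e ] follows the link, so it is false when the target is missing.
    if [ ! -e "$link" ]; then
      printf '%s -> %s\n' "$link" "$(readlink "$link")"
    fi
  done
}

# Example (run on the node), using the directory from the listing above:
# list_broken_symlinks /var/lib/mesos/slave/volumes/roles/kubernetes-role/82e9efc0-aeb3-457a-8174-57490b4615c1/new/log/containers/
```

The printed targets show which host path the log collector would need to have mounted in order to follow the links.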
This is a limitation on our end and, at the moment, there's no quick, safe resolution we can think of. We will come back to this in the future.
@bjornm82 this is being worked on as we speak. We expect it to be released with DC/OS 1.12.
Thanks @pires !
There are several problems here. The most fundamental one is how to configure fluentd to access Kubernetes logs. On straight Kubernetes, the configuration looks as follows: volumes:
@blublinsky
What does straight kubernetes mean?
Anyway, regarding a planned resolution, yes, as mentioned above it's being worked on and we expect to release it as part of DC/OS 1.12.
I meant native kubernetes deployed on bare metal
As mentioned before, this will be released as part of our DC/OS 1.12 release and Kubernetes package 2.x.
This issue should be fixed in 2.0.0-1.12.1:
https://docs.mesosphere.com/services/kubernetes/2.0.0-1.12.1/
Create new cluster: Mesosphere DC/OS Version 1.11.0
Apply Kubernetes cloud provider support (I don't think it is relevant, though): https://aws.amazon.com/blogs/opensource/cloud-provider-support-kubernetes-dcos/
Kubernetes version: v1.9.4
Starting Fluentd daemonset: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch.yaml
Logs fluentd pod output:
As earlier today, the logs end with an error about a symlink that cannot be followed. This looks like the same issue as described at https://github.com/kubernetes/kubernetes/issues/39225, although that thread is rather old.