hypriot / cluster-lab

Hypriot Cluster Lab
http://blog.hypriot.com
MIT License

sed command failing with vagrant and Docker 1.11 #46

Open Govinda-Fichtner opened 8 years ago

Govinda-Fichtner commented 8 years ago

While starting the cluster-lab with vagrant up after a vagrant destroy, I get the following log output:

==> follower2: Setting up hypriot-cluster-lab-src (0.2.12-1) ...
==> follower2: Created symlink from /etc/systemd/system/multi-user.target.wants/cluster-lab.service to /etc/systemd/system/cluster-lab.service.
==> follower2: cp:
==> follower2: cannot stat ‘/etc/systemd/system/docker.service’
==> follower2: : No such file or directory
==> follower2: sed: can't read /etc/systemd/system/docker.service: No such file or directory
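
To narrow down an error like the one above, it can help to check where a docker.service unit file actually exists on the node. The following is a minimal sketch, not part of cluster-lab: the helper name first_existing_file is made up for illustration, and the /lib/systemd/system path is an assumption about where the Docker package may install its unit.

```shell
#!/bin/sh
# Print the first path from the argument list that exists as a regular file.
# Returns non-zero if none of the candidates exist.
# (Hypothetical helper for illustration; not part of cluster-lab.)
first_existing_file() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Look in the admin override directory first, then in the assumed
# package default location.
first_existing_file \
    /etc/systemd/system/docker.service \
    /lib/systemd/system/docker.service \
    || echo "docker.service not found in the checked locations" >&2
```

If only the second path is printed, the cp and sed calls that expect the file under /etc/systemd/system would fail in the way shown in the log.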

A docker info against Swarm results in the following output:

root@follower1:/home/vagrant# DOCKER_HOST=tcp://192.168.200.1:2378 docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 3
 (unknown): 192.168.200.45:2375
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: Cannot connect to the docker engine endpoint
  └ UpdatedAt: 2016-06-08T05:03:41Z
 (unknown): 192.168.200.1:2375
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: Cannot connect to the docker engine endpoint
  └ UpdatedAt: 2016-06-08T04:58:31Z
 (unknown): 192.168.200.26:2375
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: Cannot connect to the docker engine endpoint
  └ UpdatedAt: 2016-06-08T05:01:01Z
Plugins:
 Volume:
 Network:
Kernel Version: 4.2.0-30-generic
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
Name: e62a0f42529d
Docker Root Dir:
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support

A docker info against the local Docker installation results in:

root@follower1:/home/vagrant# docker info
Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.11.2
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 4.2.0-30-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 992.9 MiB
Name: follower1
ID: FJYP:QGBI:QQRC:DCXS:OEOW:36JV:JMPV:DTFV:6B6K:C4XO:PEQO:LJYE
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

A cluster-lab health shows:

root@follower1:/home/vagrant# cluster-lab health

Internet Connection
  [PASS]   eth1 exists
  [PASS]   eth1 has an ip address
  [PASS]   Internet is reachable
  [PASS]   DNS works

Networking
  [PASS]   eth1.200 exists
  [PASS]   eth1.200 has correct IP from vlan network
  [PASS]   Cluster leader is reachable
  [PASS]   eth1.200 has exactly one IP
  [PASS]   eth1.200 has no local link address
  [PASS]   Avahi process exists
  [PASS]   Avahi is using eth1.200

Docker
  [PASS]   Docker is running
  [FAIL]   Docker is configured to use Consul as key-value store
  [FAIL]   Docker is configured to listen via tcp at port 2375
  [FAIL]   Docker listens on 192.168.200.26 via tcp at port 2375 (Docker-Engine)

Consul
  [PASS]   Consul Docker image exists
  [PASS]   Consul Docker container is running
  [PASS]   Consul is listening on port 8300
  [PASS]   Consul is listening on port 8301
  [PASS]   Consul is listening on port 8302
  [PASS]   Consul is listening on port 8400
  [PASS]   Consul is listening on port 8500
  [PASS]   Consul is listening on port 8600
  [PASS]   Consul API works
  [PASS]   Cluster-Node is pingable with IP 192.168.200.26
  [PASS]   Cluster-Node is pingable with IP 192.168.200.45
  [PASS]   Cluster-Node is pingable with IP 192.168.200.1
  [PASS]   No Cluster-Node is in status 'failed'
  [FAIL]   Consul is able to talk to Docker-Engine on port 7946 (Serf)

Swarm
  [PASS]   Swarm-Join Docker container is running
  [PASS]   Swarm-Manage Docker container is running
  [PASS]   Number of Swarm and Consul nodes is equal which means our cluster is healthy
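
The failing checks in a report like the one above can be pulled out with a small filter. This is only a sketch: failed_checks is a hypothetical helper name, and it assumes the report keeps the "[FAIL]" prefix format shown here.

```shell
#!/bin/sh
# Read a health report on stdin and print only the descriptions of the
# checks marked [FAIL], with the marker and leading whitespace stripped.
# (Hypothetical helper for illustration; not part of cluster-lab.)
failed_checks() {
    sed -n 's/^[[:space:]]*\[FAIL][[:space:]]*//p'
}

# Small demonstration using two lines taken from the report above.
printf '%s\n' \
    '  [PASS]   Docker is running' \
    '  [FAIL]   Docker is configured to listen via tcp at port 2375' \
    | failed_checks
# prints: Docker is configured to listen via tcp at port 2375
```

On a node, the same filter could be fed directly with `cluster-lab health | failed_checks`.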

It seems the Docker daemon was not configured correctly by the cluster-lab.

I guess the problem is related to the following lines: https://github.com/hypriot/cluster-lab/blob/master/package/usr/local/lib/cluster-lab/docker_lib#L79-L81

@firecyberice What do you think?

mjgorman commented 8 years ago

@Govinda-Fichtner @firecyberice I might have a fix for this. Testing now.

firecyberice commented 8 years ago

@Govinda-Fichtner can you please add the misconfigured /etc/systemd/system/docker.service file?

mjgorman commented 8 years ago

@firecyberice @Govinda-Fichtner The issue was in the script that copies the service file: it was putting it in /lib instead of /etc. See #49 for the fix.

root@leader:~# cluster-lab health

Internet Connection
  [PASS]   eth1 exists
  [PASS]   eth1 has an ip address
  [PASS]   Internet is reachable
  [PASS]   DNS works

Networking
  [PASS]   eth1.200 exists
  [PASS]   eth1.200 has correct IP from vlan network
  [PASS]   Cluster leader is reachable
  [PASS]   eth1.200 has exactly one IP
  [PASS]   eth1.200 has no local link address
  [PASS]   Avahi process exists
  [PASS]   Avahi is using eth1.200
  [PASS]   Avahi cluster-leader.service file exists

DNSmasq
  [PASS]   dnsmasq process exists
  [PASS]   /etc/dnsmasq.conf backup file exists

Docker
  [PASS]   Docker is running
  [PASS]   Docker is configured to use Consul as key-value store
  [PASS]   Docker is configured to listen via tcp at port 2375
  [PASS]   Docker listens on 192.168.200.1 via tcp at port 2375 (Docker-Engine)

Consul
  [PASS]   Consul Docker image exists
  [PASS]   Consul Docker container is running
  [PASS]   Consul is listening on port 8300
  [PASS]   Consul is listening on port 8301
  [PASS]   Consul is listening on port 8302
  [PASS]   Consul is listening on port 8400
  [PASS]   Consul is listening on port 8500
  [PASS]   Consul is listening on port 8600
  [PASS]   Consul API works
  [PASS]   Cluster-Node is pingable with IP 192.168.200.38
  [PASS]   Cluster-Node is pingable with IP 192.168.200.1
  [PASS]   No Cluster-Node is in status 'failed'
  [PASS]   Consul is able to talk to Docker-Engine on port 7946 (Serf)

Swarm
  [PASS]   Swarm-Join Docker container is running
  [PASS]   Swarm-Manage Docker container is running
  [PASS]   Number of Swarm and Consul nodes is equal which means our cluster is healthy
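
For readers hitting the same cp/sed error, one way to guard against a missing service file is to make sure a copy exists under /etc before any edit is attempted. The sketch below is not the change made in #49; ensure_unit_copy is a hypothetical helper, and both paths are passed in as arguments so nothing is assumed about where Docker installs its unit.

```shell
#!/bin/sh
# Ensure a unit file exists at the destination ($2), copying it from the
# source ($1) if necessary. Fails with a clear message when neither exists.
# (Hypothetical helper for illustration; not the actual fix from #49.)
ensure_unit_copy() {
    src=$1
    dst=$2
    if [ -f "$dst" ]; then
        return 0                      # already in place, leave it untouched
    fi
    if [ ! -f "$src" ]; then
        echo "missing source unit file: $src" >&2
        return 1
    fi
    # Create the target directory if needed, then copy the unit file.
    mkdir -p "$(dirname "$dst")" && cp "$src" "$dst"
}

# Possible usage (paths are assumptions, run as root):
#   ensure_unit_copy /lib/systemd/system/docker.service \
#                    /etc/systemd/system/docker.service
# Any later sed edit of the /etc copy should only run if this succeeds.
```

Returning a non-zero status instead of continuing means a later sed call is never pointed at a file that does not exist.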