rodoufu opened this issue 4 years ago (status: Open)
It may interest @foxundermoon and @stowns
The same problem:
$ docker info
Containers: 32
Running: 16
Paused: 0
Stopped: 16
Images: 20
Server Version: 18.06.1-ce
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: error
NodeID:
Error: manager stopped: can't initialize raft node: WAL error cannot be repaired: unexpected EOF
Is Manager: false
Node Address: 10.ххх.ххх.ххх
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: ххх
Total Memory: хххGiB
Name: sks06mpbl001
ID: OHEB:MSQ4:YYOF:PEWL:KOCU:I3BU:XWM4:3R3Y:NI54:ZHIS:L2LW:6GC2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
HTTP Proxy: http://127.0.0.1:3128
No Proxy: localhost,127.0.0.0/8,<host....>,<another-host>
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://<host_registry>/
Live Restore Enabled: false
Any help?
Free up space on the host machine and run
sudo systemctl restart docker
That helped.
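A minimal sketch of the "free space, then restart" step, assuming a POSIX shell with `df` and `awk`. `DOCKER_ROOT`, `MIN_KIB`, `free_kib` and `check_space` are hypothetical names introduced here, and the 1 GiB threshold is arbitrary; the idea is simply not to restart the daemon while the filesystem holding Docker's data is still nearly full.

```shell
#!/bin/sh
# Hypothetical helper, not part of Docker: check free space before restarting.
# DOCKER_ROOT is an assumption; it defaults to /var/lib/docker, but use the
# "Docker Root Dir" reported by `docker info` if your daemon uses another data root.
DOCKER_ROOT=${DOCKER_ROOT:-/var/lib/docker}

# Arbitrary threshold (in KiB): require at least 1 GiB free.
MIN_KIB=${MIN_KIB:-1048576}

# Print the available space in KiB for the filesystem containing path $1.
# -P forces one line per filesystem; column 4 of line 2 is "Available".
free_kib() {
  df -Pk "$1" 2>/dev/null | awk 'NR==2 { print $4 }'
}

# Report free space for path $1; succeed only if it is at least MIN_KIB.
# Returns 2 if the free space could not be read at all.
check_space() {
  avail=$(free_kib "$1")
  if [ -z "$avail" ]; then
    echo "cannot read free space for $1" >&2
    return 2
  fi
  echo "$1: ${avail} KiB free (need ${MIN_KIB})"
  [ "$avail" -ge "$MIN_KIB" ]
}
```

For example, `check_space "$DOCKER_ROOT" && sudo systemctl restart docker`. If the daemon uses a non-default data root (one report in this thread uses /mnt/data/varlib_docker/docker), set `DOCKER_ROOT` accordingly.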
The same problem:
[root@itserver4 docker-deploy]# docker info
Client:
Debug Mode: false
Server:
Containers: 95
Running: 38
Paused: 0
Stopped: 57
Images: 102
Server Version: 19.03.9
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: error
NodeID:
Error: manager stopped: can't initialize raft node: WAL error cannot be repaired: unexpected EOF
Is Manager: false
Node Address: 10.116.200.4
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-1127.8.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.16GiB
Name: itserver4
ID: EDKT:BAQ2:G2UL:JVBH:ZRRW:23BB:HVB3:PFSX:DDP2:HMYV:SCNH:ZHTZ
Docker Root Dir: /mnt/data/varlib_docker/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://gpkhi0nk.mirror.aliyuncs.com/
Live Restore Enabled: false
WARNING: API is accessible on http://0.0.0.0:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer to the 'Docker daemon attack surface' section in the documentation for more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
I found the cause: the disk space was exhausted.
Yes, I described this in the issue description: it happened after the machine ran out of space. But even after I freed some space, the error persisted.
Old thread, but same issue here: when the disk of my swarm manager ran out of space, I got the error message above. Even after freeing space and rebooting, the issue persists, and I have not found a solution yet.
@jory3 Did you eventually find a solution?
@shrinidhi-live Unfortunately not. In the end I set up the cluster again and restored a Portainer backup.
Hi, I am also running into this issue constantly.
We have a dev cluster with 20 nodes, 3 of them managers. On (one of) the manager nodes we run some workloads that, for some reason, fill the disk every couple of days. When this happens, the node state changes to Down and it cannot recover. I freed up disk space and restarted the whole server, but the node will not rejoin the swarm cluster.
I had to remove the node from the swarm and also make the node itself leave the swarm (on both sides; the node still believed it was in a swarm!?). Then I joined the node to the cluster again.
This has happened to me a couple of times, so it's reproducible. I believe it happened on more than one node, with the same issue.
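The remove / leave / rejoin workaround described above can be written down as a checklist. This is a minimal dry-run sketch, not a tested recovery tool: it only prints the commands, labelled with the host each must run on, because they are destructive and span two machines. The node name and manager address are hypothetical arguments; 2377 is Docker's default swarm management port. The demote step applies because the failing node here is a manager (Docker requires demoting a manager before forcibly removing it); make sure the remaining managers still have quorum before doing so.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper): print the recovery steps for a swarm
# node that will not rejoin after its disk filled up. Nothing is executed.
# $1 = the broken node's name as shown by `docker node ls` (assumption)
# $2 = address of a healthy manager (assumption)
print_rejoin_plan() {
  node=$1
  manager=$2
  # A manager must be demoted to worker before it can be force-removed.
  echo "[healthy manager] docker node demote $node"
  echo "[healthy manager] docker node rm --force $node"
  # The broken node still believes it is in a swarm, so it must leave too.
  echo "[broken node]     docker swarm leave --force"
  # -q prints only the join token for the manager role.
  echo "[healthy manager] docker swarm join-token -q manager"
  echo "[broken node]     docker swarm join --token <token-from-previous-step> $manager:2377"
}

# Print the plan only when called with exactly two arguments.
if [ $# -eq 2 ]; then
  print_rejoin_plan "$1" "$2"
fi
```

For a worker node, skip the `demote` line and request a `worker` join token instead.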
The problem started after the machine ran out of space. Now I can neither leave the swarm nor create new containers.
manager node
WARNING: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support. Running without d_type support will not be supported in future releases.
WARNING: bridge-nf-call-ip6tables is disabled
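The first warning asks for an XFS filesystem formatted with ftype=1. A minimal sketch for checking this before deciding to reformat, assuming `xfs_info` (from xfsprogs) is installed and given the mount point of the filesystem that holds Docker's root dir; `has_ftype1` is a hypothetical helper name. Reformatting with `mkfs.xfs -n ftype=1` destroys the existing data, so back it up first.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `xfs_info` output (read on stdin) reports
# ftype=1, which overlay2 needs for d_type support.
has_ftype1() {
  grep -q 'ftype=1'
}

# Usage: sh check_ftype.sh <xfs-mount-point>
# Older xfs_info versions only accept a mount point, not an arbitrary subdirectory.
if [ -n "${1:-}" ]; then
  if xfs_info "$1" | has_ftype1; then
    echo "$1: ftype=1, d_type supported"
  else
    echo "$1: ftype=0, reformat with mkfs.xfs -n ftype=1 (destroys data)"
  fi
fi
```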
I've tried to leave the swarm but it hasn't worked:
Similar and unsolved: https://github.com/docker/classicswarm/issues/2819