Closed: diegito closed this issue 1 year ago
After some time, I managed to reproduce it, although not in a consistent way. It seems that `docker logs` on very long logs causes the docker daemon to freeze (`docker ps` does not work; `docker info` works, with the output above).
What I have on this docker daemon is an ELK stack (Elasticsearch, Logstash and Kibana) with 3 logspout instances pushing logs from 3 swarm machines into logstash, which does the filtering.
It happened that logstash was crashing a few times, so to inspect it I performed `docker logs logstash` and an endless amount of logs started pouring out, such that I had to kill `docker logs` with `ctrl+c`. I also tried `docker logs logstash --tail 100`, but in that instance the docker daemon simply hung without giving me a response.
I'll leave it there for a couple more hours in case anybody reads this message and wishes me to run more commands on it before I restart it.
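For reference, the failing sequence described above, roughly (container name as in the report):

```shell
docker info                      # still responds
docker logs logstash             # endless output; interrupted with ctrl+c
docker logs logstash --tail 100  # daemon hangs, no response
docker ps                        # now hangs as well
```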
I am also seeing this behavior in 1.13.1, and as OP notes, the issue appears to be related to large log files, but I'm not 100% sure. I was investigating an issue where logs were no longer being written to the log file in /var/lib/docker/containers/
Would appreciate some advice on how I could work around this issue!
Some recommendations from the top of my head:

- Configure the `max-file` and `max-size` options, otherwise there's no limit present, and logs will grow without bound (potentially leading to running out of disk space); see https://docs.docker.com/engine/admin/logging/json-file/#options. Be aware that logging options are only applied when starting a new container, so setting these options as a daemon configuration only affects containers created after the options were set.

I'm not sure if my issue is the exact same bug, but it has the same symptoms, and I have a way to reliably reproduce it. Hopefully this information will be of some help.
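A minimal sketch of the `max-size`/`max-file` recommendation above as a daemon-wide default in `/etc/docker/daemon.json` (the sizes are illustrative; tune them for your workload):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```

Per container, the same effect can be had with `docker run --log-opt max-size=10m --log-opt max-file=3 …`; either way, as noted above, the options only apply to containers created afterwards.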
`docker run -d neo4j`
(The actual image doesn't matter too much; replace neo4j with your image of choice, as long as it stays running in the background and uses a fair amount of memory.)

If I open the Hyper-V Manager, the MobyLinuxVM is only using ~6-15% CPU while the daemon is unresponsive, and the host is running at about 10-20% CPU. Host disk runs up to 100%, and is probably the main limit causing the problem. If your Docker is unresponsive, I recommend checking the host disk utilization.

The problem starts when the sum of MEM % exceeds 100. If you exceed 100% far enough, a hard reboot will be required.
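The reproduction above can be sketched as follows (the loop count and image are illustrative; this deliberately overcommits the VM's memory, so run it on a disposable machine):

```shell
# Start several memory-hungry background containers until the combined
# MEM % in `docker stats` exceeds 100; at that point the daemon becomes
# unresponsive and host disk utilization climbs toward 100%.
for i in 1 2 3 4 5; do
  docker run -d --name "repro-$i" neo4j
done
docker stats --no-stream
```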
```
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:34 2018
 OS/Arch:           windows/amd64
 Experimental:      false

Error response from daemon: A blocking operation was interrupted by a call to WSACancelBlockingCall.
```
```
PS C:\Users\ryanculp\workspace\Stampede\Docker Images\target\stampede\tak> docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:34 2018
 OS/Arch:           windows/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:29:02 2018
  OS/Arch:          linux/amd64
  Experimental:     false
```
```
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.93-linuxkit-aufs
Operating System: Docker for Windows
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.934GiB
Name: linuxkit-00155dec1142
ID: ZVKP:OHOH:RVAK:5KM5:2NGV:BF33:E5VP:CCOJ:QOCE:4PYQ:U3SY:ZAIH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 22
 Goroutines: 46
 System Time: 2018-09-05T17:07:33.6394803Z
 EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
```
Let me close this ticket for now, as it looks like it went stale.
As suggested in #26768, I'm creating a new issue for this. Unfortunately I was not able to reproduce it locally to get more information.
I ran into the blocking issue a couple of times, but I can't seem to reproduce it. I can check `docker info` or `docker version`, but the docker CLI becomes unresponsive when trying `docker ps`. The version is 1.13.1.

Steps to reproduce: I noticed that one of my services was not getting a response from an InfluxDB instance inside a docker container, so I went to that container to check the logs. I was able to `curl` the container and get some sort of response. `docker ps` and other commands worked fine. I also restarted the container successfully with `docker restart`. I performed `docker logs`, but the logs were so many that I decided to `ctrl+c` and use the `docker logs --tail` option. This hung indefinitely until I decided to `ctrl+c` that as well. Next thing I know, `curl`ing the container was no longer working, and not even `docker ps` was working. The container I was exploring was the influxdb container. The Docker daemon logs don't show anything interesting.

Docker logs
Docker version
service docker status
`service docker restart` worked in unblocking the daemon.
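For reference, the unblocking step, assuming a sysvinit/upstart-style host (on systemd hosts the equivalent would be `systemctl restart docker`; either way this restarts the daemon, so expect running containers to be affected unless live restore is enabled):

```shell
# Check the daemon, then restart it; the stuck CLI calls
# (docker ps, docker logs --tail) recover after the restart.
sudo service docker status
sudo service docker restart
```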