microsoft / mssql-docker

Official Microsoft repository for SQL Server in Docker resources
MIT License

sqlsrv goes zombie and requires hard reset of host machine #181

Closed andrewnicols closed 6 years ago

andrewnicols commented 7 years ago

We use mssql-docker for our continuous integration testing. Our current setup requires the containers to be running constantly, and we have a number of databases present.

We currently use CTP-2.1 - we can't use RC1/2 due to #126, and we can't use 2017-GA due to #180.

We are finding that sqlsrv periodically hangs, and our tests can no longer connect to the server. If we try to stop or kill the Docker container, the command fails and one of the container's processes ends up in a zombie state. The only fix we have found is a hard reset of the host machine: a standard reboot hangs during shutdown because of the zombie process.

I can attempt to collect any debugging information required, so please let me know what you need. We typically see this after the machine has been running for approximately one week.

The run command we use is:

docker run --detach --name sqlsrv --network nightly -e ACCEPT_EULA=Y -e SA_PASSWORD=Passw0rd! microsoft/mssql-server-linux:ctp-2.1
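For anyone collecting diagnostics for the hang described above, here is a minimal sketch for finding the stuck process on the host. It assumes a procps-style `ps`; the helper name `stuck_processes` is hypothetical and not part of Docker or SQL Server. It keeps the header row plus any process whose state starts with `Z` (zombie) or `D` (uninterruptible sleep), the two states that do not react to `kill`.

```shell
# Hypothetical helper: reads `ps -eo pid,ppid,stat,comm` output on stdin and
# keeps the header plus processes in state Z (zombie) or D (uninterruptible sleep).
stuck_processes() {
    awk 'NR == 1 || $3 ~ /^[ZD]/'
}
```

On the Docker host this would be run as `ps -eo pid,ppid,stat,comm | stuck_processes`; the PPID column shows which parent process is involved.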
edsonmedina commented 7 years ago

+1

I'm getting the same issue intermittently on different machines.

$ docker version
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:18 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:56 2017
 OS/Arch:      linux/amd64
 Experimental: false
gmist commented 7 years ago

I have the same problem.

# docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.12.6
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 12
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host overlay bridge null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-93-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 7
Total Memory: 6.812 GiB
Name: serg
ID: 4KX3:W3HN:CPVF:GA67:TDL5:HGLT:YO4B:MYC2:R46V:R75S:CMLW:IHXU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

# docker version
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.2
 Git commit:   78d1802
 Built:        Tue Jan 31 23:35:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.2
 Git commit:   78d1802
 Built:        Tue Jan 31 23:35:14 2017
 OS/Arch:      linux/amd64
andrewnicols commented 7 years ago

Guys, I'd suggest also posting the output from docker info as this contains additional information about your setup.

I'm currently trialling a switch from the aufs storage driver to overlay2 to see whether the issue persists. The switch has also finally allowed me to use a newer version of the image.
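To check which storage driver a host is using before and after such a switch, here is a small sketch that pulls the `Storage Driver:` line out of `docker info` output. The function name `storage_driver` is made up for illustration.

```shell
# Hypothetical helper: reads `docker info` output on stdin and prints the
# value of the "Storage Driver:" line (for example aufs or overlay2).
storage_driver() {
    awk -F': *' '/^Storage Driver:/ { print $2; exit }'
}
```

Typical use would be `docker info | storage_driver`.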

andrewnicols commented 7 years ago

Here are my docker info and docker version details from an affected node:

jenkins@banana:~$ docker info
Containers: 13
 Running: 5
 Paused: 0
 Stopped: 8
Images: 17
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 334
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-97-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.34GiB
Name: banana
ID: LHSX:VYJH:M3PA:JLVT:V2VQ:LPHC:ZD6Y:YVXA:B2WE:UZ7L:FKSA:FBGW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
jenkins@banana:~$ docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:23:31 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:19:04 2017
 OS/Arch:      linux/amd64
 Experimental: false
neilwightman commented 7 years ago

+1

$ docker version

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:06:06 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:06:06 2017
 OS/Arch:      linux/amd64
 Experimental: false

$ docker info

Containers: 11
 Running: 0
 Paused: 0
 Stopped: 11
Images: 11
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 159
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 4.4.0-96-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 11.73GiB
Name: int020506
ID: PRR7:KRBA:ARMC:7IKQ:YU36:OAEO:WJMS:UBML:4HNM:DZWF:PCKN:D3XF
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 docker.bbn.intergral.com:5050
 docker.bbn.intergral.com:5555
 docker.bbn.intergral.com:5000
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
kabalin commented 6 years ago

+1 Same thing on Docker 17.03.1-ce:


Containers: 6
 Running: 2
 Paused: 0
 Stopped: 4
Images: 86
Server Version: 17.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 138
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.6 GiB
Name: mozart
ID: ZBWN:V5TW:XYHP:QKZT:UM5T:ZQAZ:34WH:Q67P:3V6B:HTCY:UGML:7NM3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
kabalin commented 6 years ago

It seems a similar issue is also discussed here: https://github.com/Microsoft/mssql-docker/issues/171

jest commented 6 years ago

This may be related to the fix described in https://support.microsoft.com/en-us/help/4093805/fix-can-t-stop-sql-server-linux-docker-container-via-docker-stop . Try 2017-CU5.

seanamosw commented 6 years ago

I was intermittently hitting the same issue with 2017-CU8. However, I then found this: https://github.com/testcontainers/testcontainers-java-module-mssqlserver/issues/7

After changing the Docker storage driver from aufs to overlay2 on our Ubuntu build agents, it hasn't happened again in 2 days of regular use. I see that everyone who has reported in this issue is using aufs.

gmist commented 6 years ago

Changing to overlay2 has solved this issue.

twright-msft commented 6 years ago

Thanks for confirming this was the issue. We are going to update the docs to state that overlay2 is our recommended/tested/supported storage driver. Closing this one out.
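For hosts still on aufs, here is a minimal sketch of selecting overlay2 through the Docker daemon configuration file. It assumes the default path `/etc/docker/daemon.json`, that no such file exists yet, and a kernel new enough for overlay2 (Docker's documentation asks for Linux 4.0 or later on most distributions). The function name `write_overlay2_config` is hypothetical. Images and containers created under the old driver are not accessible after the switch, so anything still needed should be pushed or exported first.

```shell
# Hypothetical helper: writes a daemon.json that selects the overlay2 storage
# driver. Takes an optional target path (default: /etc/docker/daemon.json) and
# refuses to overwrite an existing file, since other settings may live there.
write_overlay2_config() {
    target="${1:-/etc/docker/daemon.json}"
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target; add the key by hand" >&2
        return 1
    fi
    printf '{\n  "storage-driver": "overlay2"\n}\n' > "$target"
}
```

Run it as root, then restart the daemon (for example `systemctl restart docker` on systemd hosts) and confirm the change with `docker info`.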