Open jest opened 7 years ago
+1
I got stuck too. Here is what I did:
version: '3'
services:
  db:
    image: microsoft/mssql-server-linux
    environment:
      ACCEPT_EULA: Y
      SA_PASSWORD: "xyz"
This runs successfully. I then published the port on my host:
    #...
    ports:
      - "1433:1433"
And updated the container:
$ docker-compose up -d db
ERROR: for my_db_1 UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=70)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
Now it can be neither stopped nor killed:
$ docker-compose stop db
# Same timeout error
$ docker-compose kill db
# Same timeout error
$ docker-compose kill -s TERM db
# Gets stuck
I can't even stop the Docker service anymore (I had to restart the computer). Current version:
Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May 4 22:04:27 2017
 OS/Arch:      linux/amd64
It worked after a computer restart but happened again after a docker-compose up refresh.
A running mssql container doesn't seem to like being updated with up.
I've been having this issue too, the container freezes and a computer restart is the only way to stop it.
+1 Same here
+1 Same here.
+1 same here
We think what's going on here is that SQL Server is gracefully shutting down so that when it starts back up there is no recovery time needed. We could change it to a fast shut down on CTRL+C but that could mean a longer period of time for recovery on start up depending on what was going on in the database(s) prior to the CTRL+C.
I used to run into this issue too, but then I stopped running docker-compose up and docker run interactively. In other words use -d on docker run and docker-compose up so that the containers are always started in the background and you can get your terminal prompt back. Then you can stop your containers with docker stop or docker-compose down.
Also, when you use CTRL+C to a docker-compose up interactive, you can hit CTRL+C again to force stopping immediately. I don't recommend that in anything except for a dev/test environment where you just don't care.
A few questions to better help us understand how to improve here:
Sorry, but this is not a graceful shutdown. No matter how much time is given with docker stop -t time, SQL Server never stops within this time period.
It has nothing to do with CTRL+C, I never used it. My containers are started with docker-compose up -d and stopped with docker-compose stop -t time.
Please also note what I have written about sending TERM signal to one of forked processes: it leads to gracefully stopped container within 1 second! If I were to guess, I'd go for checking correct signal handling in the parent process.
I didn't use Ctrl+C either (the first one is supposed to shut down gracefully anyway).
I'm also using docker-compose stop (after up -d) with really nothing big going on in the database (tested with only one database and 4 empty tables, actually).
@twright-msft I used to start/stop many different services (like nginx, apache, php, python, mysql, postgresql, redis, memcached, etc...), they always stop gracefully within 4sec max. I might still have a preference for the fast startup/slow shutdown but the slow should be within 4sec. Anyway, as @jest says, sometimes it never stops, I already tried leaving the graceful stop running for at least 30min.
I've seen both cases, both intermittently. Sometimes they work, sometimes they don't.
Using CTRL+C just hangs forever (I've waited more than 10 minutes) and the container becomes non-responsive (can't exec into it).
Using docker-compose stop/down instead gives me a timeout. The container never dies.
This makes it useless until it's fixed.
OK, I dug a bit and here's a solution.
The problem is this line in the Dockerfile:
CMD /opt/mssql/bin/sqlservr
According to the Docker docs, this "shell syntax" causes the Docker daemon to run the container with the command:
/bin/sh -c /opt/mssql/bin/sqlservr
This makes the shell (/bin/sh) the "PID 1" process and causes a lot of problems, including with signal handling and child reaping. The issue on tini describes it pretty well.
The solution is to modify the Dockerfile and either make sqlservr "PID 1" itself using the other CMD syntax:
CMD ["/opt/mssql/bin/sqlservr"]
or, better yet, use some other "process manager", like the mentioned tini:
# with tini next to Dockerfile...
COPY tini /
RUN chmod +x /tini
ENTRYPOINT ["/tini", "--"]
CMD ["/opt/mssql/bin/sqlservr"]
As a workaround until new images are available, use command: [ "/opt/mssql/bin/sqlservr" ] in your docker-compose.yml to overwrite the image's CMD.
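For anyone not using Compose, the same override can be sketched with plain docker run, since arguments placed after the image name replace the image's CMD in exec form. This is only a minimal sketch: the helper name run_sqlservr_direct and its two parameters are made up for illustration, and the environment variables and port are the ones from the compose file at the top of this thread. Later comments report that this does not cure every hang.

```shell
# Hypothetical helper (name and parameters are illustrative, not from the image docs).
# Starts the image with sqlservr itself as the container command, bypassing
# the image's shell-form CMD (/bin/sh -c ...).
# $1 = container name, $2 = SA password
run_sqlservr_direct() {
    docker run -d --name "$1" \
        -e ACCEPT_EULA=Y \
        -e SA_PASSWORD="$2" \
        -p 1433:1433 \
        microsoft/mssql-server-linux \
        /opt/mssql/bin/sqlservr
}
```

Called as run_sqlservr_direct my_db 'xyz', it should behave like the compose service with the command: override applied.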
@twright-msft Any idea how this will be solved? Do you need a PR?
Any news on this? We're using the container for testing in a CI pipeline and have to restart our server practically every day because of this. Neither overwriting the command with CMD ["/opt/mssql/bin/sqlservr"] nor adding tini as suggested helps with the problem.
We're likely going to switch to this in a near-future release: CMD ["/opt/mssql/bin/sqlservr"]
We'll see if that helps fix it for at least some people.
Well, for us it didn't. Any more ideas?
Probably a different issue?
The workaround command: [ "/opt/mssql/bin/sqlservr" ] did not work for me either.
I use the following workaround in our CI environment:
- command: [ "/opt/mssql/bin/sqlservr" ] in docker compose
- docker exec <container-name> kill 1 || :
Did you destroy the old containers and create new ones with the command: workaround? Once created, a container can't change the command it executes. What does docker inspect -f '{{ .Config.Cmd }}' <container-name> say?
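The check asked for here can be sketched as a small helper that recreates the service and then prints the command the new container was created with. The function name recreate_and_show_cmd and the COMPOSE variable are hypothetical; the Docker commands are the ones already used in this thread, plus docker-compose rm -s -f and docker-compose ps -q.

```shell
# Compose binary; this thread uses docker-compose, override if yours differs.
COMPOSE="${COMPOSE:-docker-compose}"

# Hypothetical helper: recreate a service so that a changed `command:` takes
# effect, then print the command the new container was actually created with.
# $1 = service name from docker-compose.yml (e.g. db)
recreate_and_show_cmd() {
    $COMPOSE rm -s -f "$1" &&   # stop and remove the old container
    $COMPOSE up -d "$1" &&      # create a fresh one from the current config
    docker inspect -f '{{ .Config.Cmd }}' "$($COMPOSE ps -q "$1")"
}
```

With the workaround in place the last line should print [/opt/mssql/bin/sqlservr]; if it still shows /bin/sh -c in front, the container is running the image's original shell-form CMD.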
@jest
The output is [/opt/mssql/bin/sqlservr]
And yes, since it is only a CI environment, I destroy everything completely on each build:
docker exec <mssql-container-name> kill 1 || :
docker-compose stop
docker-compose rm -f
docker-compose build
docker-compose up -d
We started running into issues with the MS SQL Server containers hanging around on our Jenkins instance after builds completed (or didn't). It eventually got bad enough that the servers would lock up and de-provisioning them would take up to 30 minutes.
The solution for killing process 1 seems to solve the issue for us: https://github.com/Microsoft/mssql-docker/issues/171#issuecomment-362193062
Update: overriding the command within a Dockerfile, or through specifying it when running, did not solve the problem of zombie processes and MS SQL Server.
We are seeing a problem very similar to #181, which has the same behaviour as the issue described in this ticket, after using a SQL Server instance (CU2, CU4, GA tested) for a short period of time and then trying to shut it down. I'm going to put the odds of it hanging at 50/50 every time we spin up a new container. Sending the TERM or KILL signals to the container or sqlservr processes does not solve the issue for us; the processes refuse to die unless the system is de-provisioned.
Note that we are not using Docker Compose on our build servers, and we are seeing this issue when running the containers through the Docker engine directly.
@kevin-brown So this issue is not the one you are experiencing. This issue is about wrong image's CMD construction, where signals are not propagated to child processes.
Sending signals directly to child processes is the same as correcting CMD in Dockerfile.
@kevin-brown we are facing probably the same issue and we use tini but no luck. Do you believe that "-g" option on tini to kill the whole process group could make a difference? We are going to try it
So this issue is not the one you are experiencing. This issue is about wrong image's CMD construction, where signals are not propagated to child processes.
We're seeing signs of the signals not propagating when we send them to the Docker images, and attempt to send them directly to the process. The behaviour we're seeing in #181 is making it really difficult to verify the signals are making it to sqlservr because if it hangs for too long it completely locks up Docker and the host system.
I'm willing to accept that there are two different issues at play in #171 and #181, but the fact that both of them deal with zombie processes forming within the container gives me hope that there may be a common solution to both issues.
we are facing probably the same issue and we use tini but no luck. Do you believe that "-g" option on tini to kill the whole process group could make a difference? We are going to try it
We have not yet tried using tini to work around this issue, but if you're not currently killing the right process (but instead are killing a parent process) that might work.
Anyone having problems with CTRL+C that are not solved by correcting ENTRYPOINT (as described in comment https://github.com/Microsoft/mssql-docker/issues/171#issuecomment-346133376), please test 2017-CU5. According to https://support.microsoft.com/en-us/help/4093805/fix-can-t-stop-sql-server-linux-docker-container-via-docker-stop it's solved there.
@jest CU5 seems to fix this for me. But with CU6 the same problem occurs again.
I am using CU12 and I am seeing the same issue
I've faced some issues with this as well. I have been using a version which does not spawn mssql inside a shell (i.e. a sufficiently recent version that contains addd8374e7ff488a916e4ed1ec634b364b649209), but I still can't shut down the container. docker kill hangs and I can't even restart the daemon; I can only restart the machine.
The logs indicate that a signal was received, but it apparently entered some weird state afterwards.
[...]
2019-06-18 09:05:39.68 spid6s Always On: The availability replica manager is going offline because SQL Server is shutting down. This is an informational message only. No user action is required.
2019-06-18 09:05:39.68 spid6s SQL Server is terminating in response to a 'stop' request from Service Control Manager. This is an informational message only. No user action is required.
2019-06-18 09:05:39.78 spid22s Service Broker manager has shut down.
2019-06-18 09:05:43.43 Logon Error: 18451, Severity: 14, State: 1.
2019-06-18 09:05:43.43 Logon Login failed for user 'NT AUTHORITY\SYSTEM'. Only administrators may connect at this time. [CLIENT: 127.0.0.1]
2019-06-18 09:05:48.61 Logon Error: 18451, Severity: 14, State: 1.
2019-06-18 09:05:48.61 Logon Login failed for user 'NT AUTHORITY\SYSTEM'. Only administrators may connect at this time. [CLIENT: 127.0.0.1]
2019-06-18 09:10:53.49 Logon Error: 18451, Severity: 14, State: 1.
2019-06-18 09:10:53.49 Logon Login failed for user 'NT AUTHORITY\SYSTEM'. Only administrators may connect at this time. [CLIENT: 127.0.0.1]
A normal shutdown, by contrast, looks like the following:
[...]
2019-06-18 10:33:05.68 spid6s Always On: The availability replica manager is going offline because SQL Server is shutting down. This is an informational message only. No user action is required.
2019-06-18 10:33:05.68 spid6s SQL Server is terminating in response to a 'stop' request from Service Control Manager. This is an informational message only. No user action is required.
2019-06-18 10:33:06.11 spid23s Service Broker manager has shut down.
2019-06-18 10:33:11.29 spid6s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.
Using mcr.microsoft.com/mssql/server:2022-latest and I seem to face the same issue. SQL Server gets stuck on stopping the container. When I wait long enough (sometimes minutes) I get some timeout errors on the stop and then all of a sudden the container is also stopped.
So, after diving into the Tini and PID 1 issues, I gotta say, I'm blown away by how easy the latest solution is. You've got two options to choose from:
1. With docker run, add --init.
   --init : Run an init inside the container that forwards signals and reaps processes
2. With Docker Compose, add init: true under the services section of your compose file. For example:
services:
  db:
    image: mcr.microsoft.com/azure-sql-edge:latest
    init: true
Just a heads up, when you use --init or init: true, an extra process is launched within the container that acts as the PID 1 process. This process takes care of managing the child processes inside the container and making sure that signals are forwarded correctly. This helps to ensure that the container shuts down gracefully and all the child processes are cleaned up properly.
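For the docker run option, a minimal sketch could look like the following. The helper name run_with_init and its parameters are made up for illustration; the default image is the 2022 tag mentioned a few comments up, and MSSQL_SA_PASSWORD is assumed to be the password variable that image expects (the older image in this thread used SA_PASSWORD).

```shell
# Hypothetical helper: start SQL Server with Docker's built-in init as PID 1,
# so that signals are forwarded and child processes are reaped.
# $1 = container name, $2 = SA password, $3 = image (optional)
run_with_init() {
    docker run --init -d --name "$1" \
        -e ACCEPT_EULA=Y \
        -e MSSQL_SA_PASSWORD="$2" \
        "${3:-mcr.microsoft.com/mssql/server:2022-latest}"
}
```

A plain docker stop on the resulting container then delivers TERM to the init process rather than to a shell or directly to sqlservr.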
On Linux, using the Docker CLI, it is not possible to gracefully stop the container. Running docker stop causes the daemon to send the TERM signal to the container process, which ignores it; only the KILL signal causes the server to stop. However, this is abrupt, and the next time the container is started it rolls forward the logs.
However, I noticed that the main container process forks additional sqlservr processes, and if I send the TERM signal to one of those processes, the whole container shuts down gracefully and immediately, and no log replaying is performed on the next startup.
It looks like a problem with process and signal management.
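One way to tell the two outcomes apart is to look at the container's exit code after docker stop: 137 is 128 + 9, meaning the process was ended by SIGKILL rather than exiting on its own. The sketch below is only an illustration; the helper name stop_and_report is hypothetical, and the 10-second default mirrors Docker's own default grace period.

```shell
# Hypothetical helper: stop a container and report how it ended.
# $1 = container name, $2 = grace period in seconds (default 10)
stop_and_report() {
    docker stop -t "${2:-10}" "$1" >/dev/null || return 1
    code=$(docker inspect -f '{{ .State.ExitCode }}' "$1")
    if [ "$code" -eq 137 ]; then
        # 137 = 128 + 9: TERM was not honoured in time, Docker sent KILL
        echo "killed (SIGKILL after timeout)"
    else
        echo "exited on its own with code $code"
    fi
}
```

For example, stop_and_report my_db_1 60 gives the server a full minute before Docker falls back to KILL.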