testcontainers / testcontainers-dotnet

A library to support tests with throwaway instances of Docker containers for all compatible .NET Standard versions.
https://dotnet.testcontainers.org
MIT License

[Bug]: MsSql health check does not complete on newest container image #1220

Closed CCThorstenSauter closed 1 week ago

CCThorstenSauter commented 1 month ago

Testcontainers version

3.9.0

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host arch

x64

.NET version

8.0.303

Docker version

Client:
 Version:           25.0.5
 API version:       1.44
 Go version:        go1.21.10
 Git commit:        d260a54c81efcc3f00fe67dee78c94b16c2f8692
 Built:             Sun May 12 07:25:43 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          25.0.5
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.10
  Git commit:       e63daec8672d77ac0b2b5c262ef525c7cf17fd20
  Built:            Sun May 12 07:25:43 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.10
  GitCommit:        4e1fe7492b9df85914c389d1f15a3ceedbb280ac
 runc:
  Version:          1.1.12
  GitCommit:        51d5e94601ceffbbd85688df1c928ecccbfa4685
 docker-init:
  Version:          0.19.0
  GitCommit:

Docker info

Client:
 Version:    25.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx

Server:
 Containers: 29
  Running: 5
  Paused: 0
  Stopped: 24
 Images: 15
 Server Version: 25.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 4e1fe7492b9df85914c389d1f15a3ceedbb280ac
 runc version: 51d5e94601ceffbbd85688df1c928ecccbfa4685
 init version:
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.15.153.1-microsoft-standard-WSL2
 Operating System: Rancher Desktop WSL Distribution
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 15.58GiB
 Name: CCD-0024
 ID: 398be532-db59-47b3-bcf7-d989f4f09517
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

What happened?

When using the MsSql package with the newest container image mcr.microsoft.com/mssql/server:2022-latest with a digest of sha256:c1aa8afe9b06eab64c9774a4802dcd032205d1be785b1fd51e1c0151e7586b74, the health check specified in the waiting strategy never completes, even though the logs of the SQL server container show it being ready, leading to a timeout.

This behavior is not present when using a slightly older container image version, e.g. mcr.microsoft.com/mssql/server:2022-CU13-ubuntu-22.04 with a digest of sha256:c4369c38385eba011c10906dc8892425831275bb035d5ce69656da8e29de50d8.

Relevant log output

[testcontainers.org 00:00:00.38] Searching Docker registry credential in CredHelpers
[testcontainers.org 00:00:00.38] Searching Docker registry credential in CredsStore
[testcontainers.org 00:00:00.38] Searching Docker registry credential in Auths
[testcontainers.org 00:00:00.38] Docker registry credential https://index.docker.io/v1/ found
[testcontainers.org 00:00:01.50] Docker image testcontainers/ryuk:0.6.0 created
[testcontainers.org 00:00:01.58] Docker container 8d1b2fa17535 created
[testcontainers.org 00:00:01.64] Start Docker container 8d1b2fa17535
[testcontainers.org 00:00:01.96] Wait for Docker container 8d1b2fa17535 to complete readiness checks
[testcontainers.org 00:00:01.96] Docker container 8d1b2fa17535 ready
[testcontainers.org 00:00:01.97] Searching Docker registry credential in Auths
[testcontainers.org 00:00:01.97] Searching Docker registry credential in Auths
[testcontainers.org 00:00:01.97] Searching Docker registry credential in CredHelpers
[testcontainers.org 00:00:01.97] Searching Docker registry credential in CredsStore
[testcontainers.org 00:00:01.97] Docker registry credential mcr.microsoft.com not found
[testcontainers.org 00:00:18.94] Docker image mcr.microsoft.com/mssql/server:2022-latest created
[testcontainers.org 00:00:18.96] Docker container 4a3b482d21c9 created
[testcontainers.org 00:00:18.97] Start Docker container 4a3b482d21c9
[testcontainers.org 00:00:19.20] Wait for Docker container 4a3b482d21c9 to complete readiness checks
[testcontainers.org 00:00:19.20] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:20.27] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:21.32] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:22.42] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:23.58] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:24.69] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
...
[testcontainers.org 00:03:37.79] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9

Additional information

No response

binaryevents commented 1 month ago

We see the same problem at the moment.

intrepid-developer commented 1 month ago

This is also affecting us when it's running inside our GitHub Actions for CI/CD. It's currently preventing us from doing any releases.

szl-spyro commented 1 month ago

I confirm that our tests using TestContainers and MsSQL stopped passing today 🤕

pascalberger commented 1 month ago

Looking at the image, it seems that the path for sqlcmd has changed from /opt/mssql-tools/bin/sqlcmd to /opt/mssql-tools18/bin/sqlcmd. Not sure whether this was intentional.

intrepid-developer commented 1 month ago

FYI someone has reported it on MSSQL-Docker: https://github.com/microsoft/mssql-docker/issues/892

kiview commented 1 month ago

As mentioned in Slack, we likely need to adapt the default wait strategy (see https://github.com/testcontainers/testcontainers-dotnet/blob/develop/src/Testcontainers.MsSql/MsSqlBuilder.cs#L132-L145).

Users can provide their own wait strategy configuration as a workaround.

jonathaneckman commented 1 month ago

This started blocking our Azure DevOps pipeline yesterday.

Fireblade954 commented 1 month ago

After combining @pascalberger's comment with @kiview's suggestion, I first ran into certificate issues:

Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : SSL Provider: [error:0A000086:SSL routines::certificate verify failed:self-signed certificate].
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : Client unable to establish connection. For solutions related to encryption errors, see https://go.microsoft.com/fwlink/?linkid=2226722.

But I got it working for now by also adding the -C option:

.WithWaitStrategy(
    Wait.ForUnixContainer()
        .UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;")
)

jonathaneckman commented 1 month ago

This works when run locally, but still times out when run in an Azure DevOps pipeline:

new MsSqlBuilder()
        .WithImage("mcr.microsoft.com/mssql/server:2022-latest")
        .WithEnvironment("ACCEPT_EULA", "Y")
        .WithPortBinding(11143, 1433)
        .WithWaitStrategy(
            Wait.ForUnixContainer()
                .UntilCommandIsCompleted(
                    "/opt/mssql-tools/bin/sqlcmd",
                    "-C",
                    "-Q",
                    "SELECT 1;"
                )
        )
        .Build();

This times out in both:

new MsSqlBuilder()
        .WithImage("mcr.microsoft.com/mssql/server:2022-latest")
        .WithEnvironment("ACCEPT_EULA", "Y")
        .WithPortBinding(11143, 1433)
        .WithWaitStrategy(
            Wait.ForUnixContainer()
                .UntilCommandIsCompleted(
                    "/opt/mssql-tools18/bin/sqlcmd",
                    "-C",
                    "-Q",
                    "SELECT 1;"
                )
        )
        .Build();

tscrip commented 1 month ago

I have been able to replicate this locally by deleting the cached 2022-latest container image. After it downloads the latest image, it hangs indefinitely.

Adding .WithWaitStrategy(Wait.ForUnixContainer().UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;")) resolved the issue. Thanks @Fireblade954!

jonathaneckman commented 1 month ago

@tscrip that did it. We missed a test project so had a false negative. Thanks!

eerhardt commented 1 month ago

This is also affecting .NET Aspire - https://github.com/dotnet/aspire/issues/5057

HofmeisterAn commented 1 month ago

After reading all these comments, I would like to point out that we recommend pinning the image version. Using the latest tag does not automatically update the cached image on your development machine; it will use the version it pulled weeks ago. Meanwhile, the ephemeral CI pipeline pulls the actual latest version because it is not cached (this may lead to different behaviors on developer machines and in the CI pipeline).
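As a minimal sketch of pinning, the builder can be given an explicit tag instead of 2022-latest. The tag below is the one the issue description reports as still working with the default health check; whether it suits a given project is an assumption to verify.

```csharp
using Testcontainers.MsSql;

// Pin the image to a fixed tag so developer machines and CI pull the same version.
// The tag is the one named in the issue description as unaffected by the sqlcmd path change.
var container = new MsSqlBuilder()
    .WithImage("mcr.microsoft.com/mssql/server:2022-CU13-ubuntu-22.04")
    .Build();
```

With a pinned tag, the default wait strategy keeps calling /opt/mssql-tools/bin/sqlcmd, which that image still provides according to the report above.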

Since it looks like the new path will remain (https://github.com/microsoft/mssql-docker/issues/892#issuecomment-2249029917), we can update the default wait strategy for the new version. Overriding the wait strategy, as @Fireblade954 suggested, or pinning the version are workarounds to avoid this issue.

We can probably do something similar to what we are doing in the MongoDB module to determine which binary (path) is available.
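One way to picture the "determine which binary is available" idea is a custom wait strategy that simply tries both known locations. This is only a sketch: the class name UntilSqlcmdSucceeds is hypothetical and not part of Testcontainers, and it is not necessarily how the module itself will be changed. It relies on the IWaitUntil interface and IContainer.ExecAsync, and on the observation from the log above that executing a missing binary yields a failed check rather than an exception.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using DotNet.Testcontainers.Configurations;
using DotNet.Testcontainers.Containers;

// Hypothetical wait strategy (not part of Testcontainers): reports ready as soon as
// sqlcmd at any of the known locations runs "SELECT 1;" successfully.
public sealed class UntilSqlcmdSucceeds : IWaitUntil
{
    // Both paths come from this thread: the new mssql-tools18 directory first,
    // then the directory used by older images.
    public static readonly IReadOnlyList<string> CandidatePaths = new[]
    {
        "/opt/mssql-tools18/bin/sqlcmd",
        "/opt/mssql-tools/bin/sqlcmd",
    };

    public async Task<bool> UntilAsync(IContainer container)
    {
        foreach (var sqlcmd in CandidatePaths)
        {
            // -C trusts the self-signed server certificate, as suggested earlier in the thread.
            var result = await container.ExecAsync(new[] { sqlcmd, "-C", "-Q", "SELECT 1;" })
                .ConfigureAwait(false);

            if (result.ExitCode == 0)
            {
                return true;
            }
        }

        // Neither binary succeeded yet; Testcontainers polls the strategy again.
        return false;
    }
}
```

It would be registered with .WithWaitStrategy(Wait.ForUnixContainer().AddCustomWaitStrategy(new UntilSqlcmdSucceeds())). Trying both paths on every poll costs one extra exec per attempt but avoids caching state between calls.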

jwyza-pi commented 1 month ago

BTW, this will also break the ExecScriptAsync method, as it also uses sqlcmd. Additionally, the new tools now default to requiring encryption, which means you need to pass -C to sqlcmd to tell it to trust the server certificate.
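Until the module is updated, a query can be run by calling sqlcmd directly through ExecAsync. The helper below is a hypothetical sketch, not a Testcontainers API: the class name SqlcmdWorkaround is made up, the "sa" user name is an assumption about the container's default login, and the password has to be supplied by the caller.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using DotNet.Testcontainers.Containers;

// Hypothetical helper (not part of Testcontainers) that runs a query with the
// sqlcmd binary from the new tools directory.
public static class SqlcmdWorkaround
{
    // Builds the sqlcmd argument list:
    //   -C trusts the self-signed server certificate,
    //   -b makes sqlcmd exit with a non-zero code when the batch fails,
    //   -U / -P pass the credentials ("sa" is an assumption), -Q runs the query and exits.
    public static IList<string> BuildCommand(string password, string sql) => new[]
    {
        "/opt/mssql-tools18/bin/sqlcmd", "-C", "-b", "-U", "sa", "-P", password, "-Q", sql,
    };

    // Executes the query inside the running container and returns exit code and output.
    public static Task<ExecResult> RunAsync(IContainer container, string password, string sql) =>
        container.ExecAsync(BuildCommand(password, sql));
}
```

A caller would check the returned ExecResult.ExitCode and inspect Stderr when it is non-zero.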

randsu commented 1 month ago

This works for us:

.WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(1433))

Xor-el commented 1 month ago

This works for us:

.WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(1433))

This won't always work as the container might be ready but MSSQL might not be ready to receive requests.
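To illustrate the difference, wait conditions can be chained so that the port check is followed by an actual query. This is a sketch that assumes the newer image, where sqlcmd lives under /opt/mssql-tools18; it reuses only calls that already appear in this thread.

```csharp
using DotNet.Testcontainers.Builders;
using Testcontainers.MsSql;

// First wait for the port to be available, then for SQL Server to answer a query.
// The sqlcmd path and the -C flag are the ones reported as working above.
var container = new MsSqlBuilder()
    .WithImage("mcr.microsoft.com/mssql/server:2022-latest")
    .WithWaitStrategy(
        Wait.ForUnixContainer()
            .UntilPortIsAvailable(1433)
            .UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;"))
    .Build();
```

The port check alone passes as soon as something is listening on 1433; the query only succeeds once SQL Server accepts logins.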

randsu commented 1 month ago

This works for us: .WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(1433))

This won't always work as the container might be ready but MSSQL might not be ready to receive requests.

That's true, although it fails very rarely, at least for us, and it usually works regardless of whether the old or the new image from Microsoft is used.

I have now rewritten it to use .WithWaitStrategy(Wait.ForUnixContainer().UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;")) and downloaded the new image locally.