testcontainers / testcontainers-dotnet

A library to support tests with throwaway instances of Docker containers for all compatible .NET Standard versions.
https://dotnet.testcontainers.org
MIT License

[Bug]: Conflict: The container name `testcontainers-ryuk-...` is already in use #1252

Closed devboost-ska closed 1 month ago

devboost-ska commented 2 months ago

Testcontainers version

3.9.0

Using the latest Testcontainers version?

No

Host OS

Windows

Host arch

x86

.NET version

8.0.401

Docker version

docker version
Client:
 Cloud integration: v1.0.35+desktop.13
 Version:           26.1.1
 API version:       1.45
 Go version:        go1.21.9
 Git commit:        4cf5afa
 Built:             Tue Apr 30 11:48:43 2024
 OS/Arch:           windows/amd64
 Context:           default
Server: Docker Desktop 4.30.0 (149282)
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:28 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Docker info

docker info
Client:
 Version:    26.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0-desktop.1
    Path:     C:\Program Files\Docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0-desktop.2
    Path:     C:\Program Files\Docker\cli-plugins\docker-compose.exe
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.29
    Path:     C:\Program Files\Docker\cli-plugins\docker-debug.exe
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     C:\Program Files\Docker\cli-plugins\docker-dev.exe
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.23
    Path:     C:\Program Files\Docker\cli-plugins\docker-extension.exe
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.4
    Path:     C:\Program Files\Docker\cli-plugins\docker-feedback.exe
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.1.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-init.exe
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-sbom.exe
  scout: Docker Scout (Docker Inc.)
    Version:  v1.8.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-scout.exe

Server:
 Containers: 9
  Running: 4
  Paused: 0
  Stopped: 5
 Images: 20
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
 Kernel Version: 5.15.153.1-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 23.47GiB
 Name: docker-desktop
 ID: ecc1a989-367d-4943-8fe1-38e2be15f357
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=npipe://\\.\pipe\docker_cli
 Experimental: false
 Insecure Registries:
  [redacted]
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
WARNING: daemon is not using the default seccomp profile

What happened?

While running multiple test stages in parallel on Jenkins, a test (and every subsequent test that requires a Testcontainer) sometimes fails with:

Docker.DotNet.DockerApiException : Docker API responded with status code=Conflict, response={"message":"Conflict. The container name \"/testcontainers-ryuk-2a0cd295-fa67-4a24-b0b5-5f2c1df01806\" is already in use by container \"8cd59b4d1a722ec6ec3aabab5adb52933b44391b0975eee99d1bac6b52dbf522\". You have to remove (or rename) that container to be able to reuse that name."}

I expect this to never happen. The testcontainers library should either correctly handle an existing ryuk container it created, or ensure complete clean-up before starting a new one.

Relevant log output

Docker.DotNet.DockerApiException : Docker API responded with status code=Conflict, response={"message":"Conflict. The container name \"/testcontainers-ryuk-2a0cd295-fa67-4a24-b0b5-5f2c1df01806\" is already in use by container \"8cd59b4d1a722ec6ec3aabab5adb52933b44391b0975eee99d1bac6b52dbf522\". You have to remove (or rename) that container to be able to reuse that name."}

[...]

     at Docker.DotNet.DockerClient.HandleIfErrorResponseAsync(HttpStatusCode statusCode, HttpResponseMessage response, IEnumerable`1 handlers)
   at Docker.DotNet.DockerClient.MakeRequestAsync(IEnumerable`1 errorHandlers, HttpMethod method, String path, IQueryString queryString, IRequestContent body, IDictionary`2 headers, TimeSpan timeout, CancellationToken token)
   at Docker.DotNet.ContainerOperations.CreateContainerAsync(CreateContainerParameters parameters, CancellationToken cancellationToken)
   at DotNet.Testcontainers.Clients.DockerContainerOperations.RunAsync(IContainerConfiguration configuration, CancellationToken ct) in /_/src/Testcontainers/Clients/DockerContainerOperations.cs:line 213
   at DotNet.Testcontainers.Clients.TestcontainersClient.RunAsync(IContainerConfiguration configuration, CancellationToken ct) in /_/src/Testcontainers/Clients/TestcontainersClient.cs:line 307
   at DotNet.Testcontainers.Containers.DockerContainer.UnsafeCreateAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 413
   at DotNet.Testcontainers.Containers.DockerContainer.StartAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 277
   at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartNewAsync(Guid sessionId, IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, IImage resourceReaperImage, IMount dockerSocket, ILogger logger, Boolean requiresPrivilegedMode, TimeSpan initTimeout, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 219
   at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartNewAsync(Guid sessionId, IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, IImage resourceReaperImage, IMount dockerSocket, ILogger logger, Boolean requiresPrivilegedMode, TimeSpan initTimeout, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 243
   at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartDefaultAsync(IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, ILogger logger, Boolean isWindowsEngineEnabled, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 135
   at DotNet.Testcontainers.Clients.TestcontainersClient.RunAsync(IContainerConfiguration configuration, CancellationToken ct) in /_/src/Testcontainers/Clients/TestcontainersClient.cs:line 294
   at DotNet.Testcontainers.Containers.DockerContainer.UnsafeCreateAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 413
   at DotNet.Testcontainers.Containers.DockerContainer.StartAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 277
[entry from user code]

Additional information

I was not able to reproduce this directly on my local machine, but while inspecting the code I noticed that ResourceReaper.DisposeAsync() is not synchronized with ResourceReaper.GetAndStartDefaultAsync(...). This leads to a race condition: when DisposeAsync has already set _disposed = true but has not yet removed the container, calling GetAndStartDefaultAsync produces the above exception. I could verify this "manually" by applying the following diff:

diff --git a/src/Testcontainers/Containers/ResourceReaper.cs b/src/Testcontainers/Containers/ResourceReaper.cs
--- a/src/Testcontainers/Containers/ResourceReaper.cs   (revision 934d7f0c173253bb2bc9baddc4c9e41560ab13c9)
+++ b/src/Testcontainers/Containers/ResourceReaper.cs   (date 1725452154519)
@@ -167,11 +167,11 @@
         _maintainConnectionCts.Dispose();
       }

-      if (_resourceReaperContainer != null)
-      {
-        await _resourceReaperContainer.DisposeAsync()
-          .ConfigureAwait(false);
-      }
+      // if (_resourceReaperContainer != null)
+      // {
+      //   await _resourceReaperContainer.DisposeAsync()
+      //     .ConfigureAwait(false);
+      // }
     }

     /// <summary>

and then executing this test:

    [Fact]
    public async Task ResourceReaperShouldCompletelyCleanUp()
    {
      ResourceReaper reaper =
        await ResourceReaper.GetAndStartDefaultAsync(TestcontainersSettings.OS.DockerEndpointAuthConfig,
          ConsoleLogger.Instance);
      await reaper.DisposeAsync();

      ResourceReaper reaper2 =
        await ResourceReaper.GetAndStartDefaultAsync(TestcontainersSettings.OS.DockerEndpointAuthConfig,
          ConsoleLogger.Instance);
    }

devboost-ska commented 2 months ago

My proposal would be to add locking around DisposeAsync, similar to how it is done in GetAndStartDefaultAsync. I would be happy to contribute a PR. Any concerns?
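
For illustration, the locking idea might look roughly like the following minimal sketch. `ReaperSketch`, `Gate`, and the in-memory name counter are hypothetical stand-ins, not the Testcontainers `ResourceReaper` implementation. The only point is that creation and disposal wait on the same `SemaphoreSlim`, so a new instance cannot be created while the old container is still being removed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the reaper; NOT the Testcontainers ResourceReaper.
public sealed class ReaperSketch : IAsyncDisposable
{
    // A single gate shared by creation and disposal.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    private static ReaperSketch _default;

    // Simulates the daemon-side reservation of the container name.
    private static int _containersWithName;

    private bool _disposed;

    private ReaperSketch()
    {
    }

    public static int ContainersWithName => Volatile.Read(ref _containersWithName);

    public static async Task<ReaperSketch> GetAndStartDefaultAsync()
    {
        await Gate.WaitAsync().ConfigureAwait(false);
        try
        {
            if (_default != null && !_default._disposed)
            {
                return _default;
            }

            // Simulated "container create": fails if the name is still taken.
            if (Interlocked.Increment(ref _containersWithName) > 1)
            {
                Interlocked.Decrement(ref _containersWithName);
                throw new InvalidOperationException("Conflict: the container name is already in use.");
            }

            _default = new ReaperSketch();
            return _default;
        }
        finally
        {
            Gate.Release();
        }
    }

    public async ValueTask DisposeAsync()
    {
        await Gate.WaitAsync().ConfigureAwait(false);
        try
        {
            if (_disposed)
            {
                return;
            }

            _disposed = true;

            // Simulated slow container removal; the gate stays held until it finishes.
            await Task.Delay(50).ConfigureAwait(false);
            Interlocked.Decrement(ref _containersWithName);
        }
        finally
        {
            Gate.Release();
        }
    }
}
```

With the gate held for the whole disposal, a call to `GetAndStartDefaultAsync` made while `DisposeAsync` is still running waits for the removal to finish instead of hitting the name conflict.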

HofmeisterAn commented 2 months ago

I am wondering how you are running into this issue. What causes the singleton instance to be disposed of 🤔? The default resource reaper instance should be instantiated once and never disposed of. Perhaps it is failing the instantiation of the singleton instance? Is there any other exception?

devboost-ska commented 2 months ago

Thanks for getting back to me!

Seems like you are right as well. Before the above exception occurred, multiple test cases failed during setup with:

OneTimeSetUp: System.TimeoutException : The operation has timed out.
[...]
     at System.IO.Pipes.NamedPipeClientStream.ConnectInternal(Int32 timeout, CancellationToken cancellationToken, Int32 startTime)
   at System.IO.Pipes.NamedPipeClientStream.<>c.<ConnectAsync>b__21_0(Object state)
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__281_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Docker.DotNet.DockerClient.<>c__DisplayClass6_0.<<-ctor>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Net.Http.Client.ManagedHandler.ProcessRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Net.Http.Client.ManagedHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Docker.DotNet.DockerClient.PrivateMakeRequestAsync(TimeSpan timeout, HttpCompletionOption completionOption, HttpMethod method, String path, IQueryString queryString, IDictionary`2 headers, IRequestContent data, CancellationToken cancellationToken)
   at Docker.DotNet.DockerClient.MakeRequestAsync(IEnumerable`1 errorHandlers, HttpMethod method, String path, IQueryString queryString, IRequestContent body, IDictionary`2 headers, TimeSpan timeout, CancellationToken token)
   at Docker.DotNet.ContainerOperations.InspectContainerAsync(String id, CancellationToken cancellationToken)
   at DotNet.Testcontainers.Clients.DockerContainerOperations.ByIdAsync(String id, CancellationToken ct) in /_/src/Testcontainers/Clients/DockerContainerOperations.cs:line 38
   at DotNet.Testcontainers.Containers.DockerContainer.UnsafeCreateAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 417
   at DotNet.Testcontainers.Containers.DockerContainer.StartAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 277
   at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartNewAsync(Guid sessionId, IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, IImage resourceReaperImage, IMount dockerSocket, ILogger logger, Boolean requiresPrivilegedMode, TimeSpan initTimeout, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 219
   at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartNewAsync(Guid sessionId, IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, IImage resourceReaperImage, IMount dockerSocket, ILogger logger, Boolean requiresPrivilegedMode, TimeSpan initTimeout, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 243
   at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartDefaultAsync(IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, ILogger logger, Boolean isWindowsEngineEnabled, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 135
   at DotNet.Testcontainers.Clients.TestcontainersClient.RunAsync(IContainerConfiguration configuration, CancellationToken ct) in /_/src/Testcontainers/Clients/TestcontainersClient.cs:line 294
   at DotNet.Testcontainers.Containers.DockerContainer.UnsafeCreateAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 413
   at DotNet.Testcontainers.Containers.DockerContainer.StartAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 277
   [test setup code]

I assume that even though the commands timed out, one of them actually started the resource reaper container. So synchronizing the disposal will probably not help.

HofmeisterAn commented 2 months ago

> I assume that even though the commands timed out, one of them actually started the resource reaper container.

I doubt it. If TC successfully starts a resource reaper instance, there is no reason to create it again. As I mentioned, there is no need to dispose of the default instance; we never call the dispose method except in the mentioned error case (perhaps we should throw a different exception if someone tries to dispose of the default instance). The resource reaper runs longer than the test process and cleans up after itself. This is crucial to prevent resource leaks. Changing this will very likely hide the underlying issue and root cause.

The TimeoutException looks really odd. I think the first exception would be really interesting. I've never seen this behavior before.

HofmeisterAn commented 2 months ago

Please try to run the resource reaper (Ryuk) manually on your build agent and check if it fails:

docker run -v /var/run/docker.sock:/var/run/docker.sock -e RYUK_PORT=8080 -p 8080 testcontainers/ryuk:0.9.0

devboost-ska commented 2 months ago

docker run -v /var/run/docker.sock:/var/run/docker.sock -e RYUK_PORT=8080 -p 8080 testcontainers/ryuk:0.9.0
Unable to find image 'testcontainers/ryuk:0.9.0' locally
0.9.0: Pulling from testcontainers/ryuk
46b060cc2620: Pull complete
950af9946849: Pull complete
dce2d503360a: Pull complete
Digest: sha256:448beed1b3fd18e9411dd4b6a26a04f3aa0fccf229502c9665ebe8d628c7d2c5
Status: Downloaded newer image for testcontainers/ryuk:0.9.0
2024/09/05 08:59:47 Pinging Docker...
2024/09/05 08:59:47 Docker daemon is available!
2024/09/05 08:59:47 Starting on port 8080...
2024/09/05 08:59:47 Started!
2024/09/05 08:59:58 Signal received
2024/09/05 08:59:58 Removed 0 container(s), 0 network(s), 0 volume(s) 0 image(s) 

Worked without problem. The test runs usually seem to use testcontainers/ryuk:0.6.0 though, in case that makes a difference.

devboost-ska commented 2 months ago

> If TC successfully starts a resource reaper instance, there is no reason to create it again.

My hypothesis, though, is that the Docker client times out and reports to TC that the container start failed, while the Docker server was just too slow but actually did manage to start it. That way TC would not know about the started container and would try to start it again. Could it happen that way?
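
The hypothesis can be sketched with an in-memory stand-in for the daemon. This is only a simulation under stated assumptions: `FakeDaemon` and `TimeoutSketch` are hypothetical names, no Docker API is called, and it does not show that Docker actually behaves this way. It only shows how a caller that stops waiting could miss a creation that still completes, and then hit a name conflict on retry.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for a daemon; it only models name reservation.
public sealed class FakeDaemon
{
    private readonly ConcurrentDictionary<string, bool> _names =
        new ConcurrentDictionary<string, bool>();

    public bool Exists(string name) => _names.ContainsKey(name);

    // The creation takes `latency` on the "server" side and is not rolled back
    // when the caller stops waiting for the response.
    public async Task CreateAsync(string name, TimeSpan latency)
    {
        await Task.Delay(latency).ConfigureAwait(false);

        if (!_names.TryAdd(name, true))
        {
            throw new InvalidOperationException(
                $"Conflict. The container name \"{name}\" is already in use.");
        }
    }
}

public static class TimeoutSketch
{
    // Waits for the request only up to `clientTimeout`; returns false if the
    // client gave up, even though the request itself keeps running.
    public static async Task<bool> TryCreateAsync(
        FakeDaemon daemon, string name, TimeSpan latency, TimeSpan clientTimeout)
    {
        Task create = daemon.CreateAsync(name, latency);
        Task winner = await Task.WhenAny(create, Task.Delay(clientTimeout))
            .ConfigureAwait(false);

        if (winner != create)
        {
            return false;
        }

        await create.ConfigureAwait(false);
        return true;
    }
}
```

If `TryCreateAsync` returns false because the client timeout elapsed first, `daemon.Exists(name)` can still become true shortly afterwards, and a second `CreateAsync` with the same name then throws the simulated conflict.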

HofmeisterAn commented 2 months ago

> Worked without problem.

👍 That looks good. The version does not matter.

> while the docker server was just too slow to start [...] Could it happen that way?

Does the service start while running the build or test? Ephemeral agent? Can you ensure the service is in a ready state before running the tests? It does not even start the container; it fails just by trying to create the resource (aka docker container create).

The default timeout for Docker.DotNet Npipe connection appears quite small, although I have never experienced any issues before (and TC is initially able to connect to it; otherwise, you would see different errors).

You can try passing a custom endpoint authentication provider to the builder and increase the timeout to see if that resolves the issue. For example:

public sealed class CustomEndpointAuthProvider : IDockerEndpointAuthenticationConfiguration
{
    private CustomEndpointAuthProvider()
    {
    }

    public static IDockerEndpointAuthenticationConfiguration Instance { get; }
        = new CustomEndpointAuthProvider();

    public Credentials Credentials
        => null;

    public Uri Endpoint
        => new Uri("npipe://./pipe/docker_engine");

    public DockerClientConfiguration GetDockerClientConfiguration(Guid sessionId = default)
    {
        return new DockerClientConfiguration(Endpoint, Credentials, namedPipeConnectTimeout: TimeSpan.FromSeconds(10));
    }
}

public sealed class GitHub
{
    static GitHub()
    {
        // Because the endpoint uses the same address as the default configuration, we need
        // to override the selected auto-discovery endpoint. Otherwise, we will be using
        // the default (cached) provider instead of the custom one.
        // It is important to override it before any builder is instantiated.
        TestcontainersSettings.OS = new Windows(CustomEndpointAuthProvider.Instance);
    }

    [Fact]
    public async Task Issue1252()
    {
        _ = new ContainerBuilder().WithImage(CommonImages.Alpine).Build();
    }
}

benjaminsampica commented 1 month ago

We've started seeing this on Microsoft-hosted build agents with the SQL Server container specifically. I believe the embedded image tag is no longer supported in some way, so a .WithImage() is required.

FrancescoValletti commented 1 month ago

Hi,

In case it helps confirm what @benjaminsampica said: I struggled all morning with a similar problem, but I am using MsSqlBuilder.

In the local environment everything worked perfectly, and until a few days ago it also worked on Microsoft-hosted build agents.

Since yesterday it has stopped working on the agents during pipelines:

> Docker.DotNet.DockerApiException : Docker API responded with status code=Conflict, response={"message":"container 7c1d7fcc07091720d964c9d75e699cc0530e02b8a6a212b90bb60875e914d035 is not running"}

Adding WithImage() to the builder fixed the problem.

HofmeisterAn commented 1 month ago

Neither of the mentioned MSSQL issues is related to this issue. Both of you are running into: https://github.com/testcontainers/testcontainers-dotnet/pull/1265.

HofmeisterAn commented 1 month ago

Without additional information, I am unable to help. I will close the issue for now. Please refer to my comment above, and do not hesitate to reopen the issue if you have further information.