Closed devboost-ska closed 1 month ago
My proposal would be to add locking around DisposeAsync, similar to how it is done in GetAndStartDefaultAsync. I would be happy to contribute a PR. Any concerns?
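A minimal sketch of what such locking could look like, assuming a single shared SemaphoreSlim guards both creation and disposal. The class and member names (GuardedResource, GetOrCreateDefaultAsync, IsDisposed) are hypothetical and are not taken from the Testcontainers source; the cleanup step is a placeholder delay.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for illustration only; this is NOT the real ResourceReaper.
public sealed class GuardedResource : IAsyncDisposable
{
    // One lock shared by creation and disposal, so the two cannot interleave.
    private static readonly SemaphoreSlim Lock = new SemaphoreSlim(1, 1);

    private static GuardedResource _default;

    private bool _disposed;

    private GuardedResource()
    {
    }

    public bool IsDisposed => _disposed;

    public static async Task<GuardedResource> GetOrCreateDefaultAsync(CancellationToken ct = default)
    {
        await Lock.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // Because disposal holds the same lock, a half-disposed instance is never observed here.
            if (_default == null || _default._disposed)
            {
                _default = new GuardedResource();
            }

            return _default;
        }
        finally
        {
            Lock.Release();
        }
    }

    public async ValueTask DisposeAsync()
    {
        await Lock.WaitAsync().ConfigureAwait(false);
        try
        {
            if (_disposed)
            {
                return;
            }

            _disposed = true;

            // Placeholder for the asynchronous cleanup (for example, removing a container).
            await Task.Delay(10).ConfigureAwait(false);
        }
        finally
        {
            Lock.Release();
        }
    }
}
```

With this shape, a caller that requests the default instance while a disposal is in flight waits until the cleanup has finished, instead of seeing the flag set but the cleanup still pending.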
I am wondering how you are running into this issue. What causes the singleton instance to be disposed of 🤔? The default resource reaper instance should be instantiated once and never disposed of. Perhaps it is failing the instantiation of the singleton instance? Is there any other exception?
Thanks for getting back to me!
Seems like you are right as well. Before the above exception, multiple test cases failed during setup with:
OneTimeSetUp: System.TimeoutException : The operation has timed out.
[...]
at System.IO.Pipes.NamedPipeClientStream.ConnectInternal(Int32 timeout, CancellationToken cancellationToken, Int32 startTime)
at System.IO.Pipes.NamedPipeClientStream.<>c.<ConnectAsync>b__21_0(Object state)
at System.Threading.Tasks.Task.InnerInvoke()
at System.Threading.Tasks.Task.<>c.<.cctor>b__281_0(Object obj)
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
at Docker.DotNet.DockerClient.<>c__DisplayClass6_0.<<-ctor>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.Net.Http.Client.ManagedHandler.ProcessRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Net.Http.Client.ManagedHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
at Docker.DotNet.DockerClient.PrivateMakeRequestAsync(TimeSpan timeout, HttpCompletionOption completionOption, HttpMethod method, String path, IQueryString queryString, IDictionary`2 headers, IRequestContent data, CancellationToken cancellationToken)
at Docker.DotNet.DockerClient.MakeRequestAsync(IEnumerable`1 errorHandlers, HttpMethod method, String path, IQueryString queryString, IRequestContent body, IDictionary`2 headers, TimeSpan timeout, CancellationToken token)
at Docker.DotNet.ContainerOperations.InspectContainerAsync(String id, CancellationToken cancellationToken)
at DotNet.Testcontainers.Clients.DockerContainerOperations.ByIdAsync(String id, CancellationToken ct) in /_/src/Testcontainers/Clients/DockerContainerOperations.cs:line 38
at DotNet.Testcontainers.Containers.DockerContainer.UnsafeCreateAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 417
at DotNet.Testcontainers.Containers.DockerContainer.StartAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 277
at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartNewAsync(Guid sessionId, IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, IImage resourceReaperImage, IMount dockerSocket, ILogger logger, Boolean requiresPrivilegedMode, TimeSpan initTimeout, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 219
at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartNewAsync(Guid sessionId, IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, IImage resourceReaperImage, IMount dockerSocket, ILogger logger, Boolean requiresPrivilegedMode, TimeSpan initTimeout, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 243
at DotNet.Testcontainers.Containers.ResourceReaper.GetAndStartDefaultAsync(IDockerEndpointAuthenticationConfiguration dockerEndpointAuthConfig, ILogger logger, Boolean isWindowsEngineEnabled, CancellationToken ct) in /_/src/Testcontainers/Containers/ResourceReaper.cs:line 135
at DotNet.Testcontainers.Clients.TestcontainersClient.RunAsync(IContainerConfiguration configuration, CancellationToken ct) in /_/src/Testcontainers/Clients/TestcontainersClient.cs:line 294
at DotNet.Testcontainers.Containers.DockerContainer.UnsafeCreateAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 413
at DotNet.Testcontainers.Containers.DockerContainer.StartAsync(CancellationToken ct) in /_/src/Testcontainers/Containers/DockerContainer.cs:line 277
[test setup code]
I assume that even though the commands timed out, one of them actually started the resource reaper container. So synchronizing the disposal will probably not help.
> I assume that even though the commands timed out, one of them actually started the resource reaper container.
I doubt it. If TC successfully starts a resource reaper instance, there is no reason to create it again. As I mentioned, there is no need to dispose of the default instance; we never call the dispose method except in the mentioned error case (perhaps we should throw a different exception if someone tries to dispose of the default instance). The resource reaper runs longer than the test process and cleans up after itself. This is crucial to prevent resource leaks. Changing this will very likely hide the underlying issue and root cause.
The TimeoutException looks really odd. I think the first exception would be really interesting. I've never seen this behavior before.
Please try to run the resource reaper (Ryuk) manually on your build agent and check if it fails:
docker run -v /var/run/docker.sock:/var/run/docker.sock -e RYUK_PORT=8080 -p 8080 testcontainers/ryuk:0.9.0
docker run -v /var/run/docker.sock:/var/run/docker.sock -e RYUK_PORT=8080 -p 8080 testcontainers/ryuk:0.9.0
Unable to find image 'testcontainers/ryuk:0.9.0' locally
0.9.0: Pulling from testcontainers/ryuk
46b060cc2620: Pull complete
950af9946849: Pull complete
dce2d503360a: Pull complete
Digest: sha256:448beed1b3fd18e9411dd4b6a26a04f3aa0fccf229502c9665ebe8d628c7d2c5
Status: Downloaded newer image for testcontainers/ryuk:0.9.0
2024/09/05 08:59:47 Pinging Docker...
2024/09/05 08:59:47 Docker daemon is available!
2024/09/05 08:59:47 Starting on port 8080...
2024/09/05 08:59:47 Started!
2024/09/05 08:59:58 Signal received
2024/09/05 08:59:58 Removed 0 container(s), 0 network(s), 0 volume(s) 0 image(s)
Worked without problem. The test runs usually seem to use testcontainers/ryuk:0.6.0 though, in case that makes a difference.
> If TC successfully starts a resource reaper instance, there is no reason to create it again.
My hypothesis, though, is that the Docker client times out and reports to TC that the container start failed, while the Docker server was just too slow but did eventually manage to start the container. That way TC cannot know about the started container and would try to start it again. Could it happen that way?
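The hypothesized sequence can be simulated without Docker. The following is a minimal sketch with hypothetical names (TimeoutDemo, SlowServerStartAsync, ClientStartAsync). It only shows that a client-side timeout does not by itself stop work already in progress on the other side; it says nothing about how Docker or Testcontainers actually behave in this case.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical simulation; no Docker or Testcontainers API is involved.
public static class TimeoutDemo
{
    // Counts how many times the simulated server finished its work.
    public static int ContainersStarted;

    // Simulated server: slow, but it completes regardless of whether the caller is still waiting.
    public static async Task SlowServerStartAsync()
    {
        await Task.Delay(TimeSpan.FromMilliseconds(300)).ConfigureAwait(false);
        Interlocked.Increment(ref ContainersStarted);
    }

    // Simulated client: returns true if the server answered in time, false if the client gave up first.
    public static async Task<bool> ClientStartAsync(TimeSpan clientTimeout)
    {
        Task serverWork = SlowServerStartAsync();
        Task finished = await Task.WhenAny(serverWork, Task.Delay(clientTimeout)).ConfigureAwait(false);
        return finished == serverWork;
    }
}
```

Calling ClientStartAsync with a timeout shorter than the server delay reports a failure to the caller, yet the counter is still incremented shortly afterwards.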
> Worked without problem.
👍 That looks good. The version does not matter.
> while the docker server was just to slow to start [...] Could it happen that way?
Does the service start while running the build or test? Ephemeral agent? Can you ensure the service is in a ready state before running the tests? It does not even start the container; it fails just by trying to create the resource (aka docker container create).
The default timeout for Docker.DotNet Npipe connection appears quite small, although I have never experienced any issues before (and TC is initially able to connect to it; otherwise, you would see different errors).
You can try passing a custom endpoint authentication provider to the builder and increase the timeout to see if that resolves the issue. For example:
public sealed class CustomEndpointAuthProvider : IDockerEndpointAuthenticationConfiguration
{
    private CustomEndpointAuthProvider()
    {
    }

    public static IDockerEndpointAuthenticationConfiguration Instance { get; }
        = new CustomEndpointAuthProvider();

    public Credentials Credentials
        => null;

    public Uri Endpoint
        => new Uri("npipe://./pipe/docker_engine");

    public DockerClientConfiguration GetDockerClientConfiguration(Guid sessionId = default)
    {
        return new DockerClientConfiguration(Endpoint, Credentials, namedPipeConnectTimeout: TimeSpan.FromSeconds(10));
    }
}

public sealed class GitHub
{
    static GitHub()
    {
        // Because the endpoint uses the same address as the default configuration, we need
        // to override the selected auto-discovery endpoint. Otherwise, we will be using
        // the default (cached) provider instead of the custom one.
        // It is important to override it before any builder is instantiated.
        TestcontainersSettings.OS = new Windows(CustomEndpointAuthProvider.Instance);
    }

    [Fact]
    public async Task Issue1252()
    {
        _ = new ContainerBuilder().WithImage(CommonImages.Alpine).Build();
    }
}
We've started seeing this on Microsoft-hosted build agents with the SQL Server container specifically. I believe the embedded image tag is no longer supported in some way, so a .WithImage() is required.
Hi,
In case it helps confirm what @benjaminsampica said: I struggled all morning with a similar problem, but I am using MsSqlBuilder.
In the local environment everything worked perfectly, and until a few days ago it also worked on Microsoft-hosted build agents.
Since yesterday it has stopped working on agents during pipelines:
> Docker.DotNet.DockerApiException : Docker API responded with status code=Conflict, response={"message":"container 7c1d7fcc07091720d964c9d75e699cc0530e02b8a6a212b90bb60875e914d035 is not running"}
Adding WithImage() to the builder fixed the problem.
Both mentioned MSSQL issues are not related to this issue. Both of you are running into: https://github.com/testcontainers/testcontainers-dotnet/pull/1265.
Without additional information, I am unable to help. I will close the issue for now. Please refer to my comment above, and do not hesitate to reopen the issue if you have further information.
Testcontainers version
3.9.0
Using the latest Testcontainers version?
No
Host OS
Windows
Host arch
x86
.NET version
8.0.401
Docker version
Docker info
What happened?
While running multiple test stages in parallel on Jenkins, sometimes a test (and all subsequent tests requiring a Testcontainer) fails with:
I expect this to never happen. The testcontainers library should either correctly handle an existing ryuk container it created, or ensure complete clean-up before starting a new one.
Relevant log output
Additional information
I was not able to reproduce this locally directly, but while inspecting the code I noticed that ResourceReaper.DisposeAsync() is not synchronized with ResourceReaper.GetAndStartDefaultAsync(...). This leads to a race condition: when DisposeAsync has already set _disposed = true but has not yet removed the container, calling GetAndStartDefaultAsync produces the above exception. I could verify this "manually" by applying the following diff and then executing this test: