testcontainers / testcontainers-dotnet-legacy

A .net fork of testcontainers - in early development
MIT License

Timeout occurs waiting for container to start #68

Open. emzyme20 opened this issue 5 years ago.

emzyme20 commented 5 years ago

Describe the bug: I have version 0.0.2.107 of testcontainers (.NET) referenced in my project. The tests execute locally, although a little slowly. When the same code is pushed to our build server, we see timeout errors when we try to start the container.

Is there a setting we could configure to increase the timeout during configuration of the container?

My setup code is as follows:

this.Container = new DatabaseContainerBuilder<PostgreSqlContainer>()
                .Begin()
                .WithImage($"{PostgreSqlContainer.IMAGE}:{PostgreSqlContainer.DEFAULT_TAG}")
                .WithExposedPorts(PostgreSqlContainer.POSTGRESQL_PORT)
                .WithEnv(("POSTGRES_PASSWORD", "Password123"))
                .Build();

The error that I get in the build log:

System.Exception : The delegate executed asynchronously through TimeoutPolicy did not complete within the timeout.
---- Polly.Timeout.TimeoutRejectedException : The delegate executed asynchronously through TimeoutPolicy did not complete within the timeout.
-------- System.Threading.Tasks.TaskCanceledException : A task was canceled.
   at TestContainers.Core.Containers.PostgreSqlContainer.<WaitUntilContainerStarted>d__16.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TestContainers.Core.Containers.Container.<TryStart>d__41.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TestContainers.Core.Containers.Container.<Start>d__40.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
----- Inner Stack Trace -----
   at Polly.Timeout.TimeoutEngine.<ImplementationAsync>d__1`1.MoveNext() in C:\projects\polly\src\Polly.Shared\Timeout\TimeoutEngineAsync.cs:line 62
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Policy.<ExecuteAsyncInternal>d__180.MoveNext() in C:\projects\polly\src\Polly.Shared\Policy.Async.cs:line 41
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Wrap.PolicyWrapEngine.<ImplementationAsync>d__9.MoveNext() in C:\projects\polly\src\Polly.Shared\Wrap\PolicyWrapEngineAsync.cs:line 101
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Policy.<ExecuteAsyncInternal>d__180.MoveNext() in C:\projects\polly\src\Polly.Shared\Policy.Async.cs:line 41
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Policy.<ExecuteAndCaptureAsync>d__130.MoveNext() in C:\projects\polly\src\Polly.Shared\Policy.Async.ExecuteOverloads.cs:line 378
----- Inner Stack Trace -----
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Retry.RetryStateWaitAndRetryForever`1.<CanRetryAsync>d__8.MoveNext() in C:\projects\polly\src\Polly.Shared\Retry\RetryStateWaitAndRetryForeverAsync.cs:line 30
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Retry.RetryEngine.<ImplementationAsync>d__1`1.MoveNext() in C:\projects\polly\src\Polly.Shared\Retry\RetryEngineAsync.cs:line 57
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Policy.<ExecuteAsyncInternal>d__180.MoveNext() in C:\projects\polly\src\Polly.Shared\Policy.Async.cs:line 41
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Wrap.PolicyWrapEngine.<>c__DisplayClass9_0.<<ImplementationAsync>b__0>d.MoveNext() in C:\projects\polly\src\Polly.Shared\Wrap\PolicyWrapEngineAsync.cs:line 102
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Policy.<>c__DisplayClass213_1.<<TimeoutAsync>b__1>d.MoveNext() in C:\projects\polly\src\Polly.Shared\Timeout\TimeoutSyntaxAsync.cs:line 256
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Polly.Timeout.TimeoutEngine.<ImplementationAsync>d__1`1.MoveNext() in C:\projects\polly\src\Polly.Shared\Timeout\TimeoutEngineAsync.cs:line 34
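The stack trace above appears to show the startup wait as an outer Polly timeout (`TimeoutPolicy`, `PolicyWrap`) around an inner wait-and-retry-forever loop (`RetryStateWaitAndRetryForever`). Whether that timeout can be changed in 0.0.2.107 depends on the library's own options, which are not shown here. As a sketch of the same shape only, assuming nothing about the library's API, here is a plain .NET poll-until-ready loop bounded by an overall deadline. `ReadinessWait`, `WaitUntilReadyAsync` and the `isReady` probe are hypothetical names, and Polly is deliberately not used.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the TestContainers library: polls a readiness
// probe until it succeeds or an overall deadline passes.
public static class ReadinessWait
{
    public static async Task<bool> WaitUntilReadyAsync(
        Func<CancellationToken, Task<bool>> isReady,
        TimeSpan timeout,
        TimeSpan pollInterval)
    {
        // Outer bound: the whole wait is cancelled once `timeout` elapses.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // Inner loop: retry "forever" until the probe reports ready.
            while (true)
            {
                try
                {
                    if (await isReady(cts.Token).ConfigureAwait(false))
                    {
                        return true;
                    }
                }
                catch (Exception ex) when (!(ex is OperationCanceledException))
                {
                    // A failing probe (e.g. connection refused) just means "not ready yet".
                }

                // Throws OperationCanceledException if the deadline passes while waiting.
                await Task.Delay(pollInterval, cts.Token).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // Deadline reached before the probe ever succeeded.
            return false;
        }
    }
}
```

A probe could, for example, try to open a database connection and return `false` on failure. On a slow build agent you would raise `timeout` rather than `pollInterval`. If the container has already exited, as later turns out to be the case in this thread, no timeout value will help.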


isen-ng commented 5 years ago

Does it always time out on your build server? It may be because your build server has run out of disk space.

If you are able to log into your build server, you can test for this by manually starting a docker container.

For example:

docker run -dit --name=my_test postgres:11-alpine

The container should keep running until you stop it. If it exits immediately, you can still check its logs with:

docker logs my_test

It might say something along the lines of "no space left on device". If that's the case, you'll need to clean up orphaned Docker images and volumes, or simply destroy and recreate your VM.

However, if the container starts successfully, the problem is somewhere else. Postgres should not take more than 1 minute to boot up.

I've experienced this many times on my local machine. The problem goes away after I clean up orphaned images and volumes.

emzyme20 commented 5 years ago

We spin up new VMs in Azure when a build is triggered. They are created from an image that we have customised, so there is no leftover temporary data anywhere. I've logged into one of the instances and verified that Docker can run the command you mentioned:

[screenshot]

isen-ng commented 5 years ago

Something in your screenshot concerns me. After you run docker run -dit --name=my_test postgres:11-alpine, docker container ls lists no containers. This means that the container you just started stopped immediately.

Run docker logs my_test to find out why the container stopped immediately.

This is likely the cause of your problems. Since the newly created container stopped immediately, waiting for the container's service to start will time out.

emzyme20 commented 5 years ago

I think this Docker EE preview version might be the issue here. I need Linux containers to run Postgres images, but I can't run Docker Desktop on our build server; it needs Docker EE. However, to run Linux containers with Docker EE, you need to install the preview and enable experimental mode. The command line does not respond well, so I think the container is there but, as you say, something else may be going on.

[screenshot]

I am now seeing this in the event viewer on our build agent.

[screenshot]

The container IDs are different, I guess because the unit tests are attempting to spin up containers, but something is not working.

The following errors lead me to believe that this will never work in a Windows Server environment:

[screenshots]

Thanks for all your help, but I think this is something I will need to come back to if Docker EE ever starts to support Linux containers.

swissarmykirpan commented 5 years ago

@emzyme20 do you need to run windows + linux containers at the same time? If so you should be able to use LCOW - https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/linux-containers

emzyme20 commented 5 years ago

When we tried running the Docker client on the Windows Server box, every time the VM launched and attempted to run Docker, a warning message popped up that a user had to click OK on before Docker would continue starting up. This was not a workable solution for us, because Azure deallocates the boxes when no build needs to run, so this happened every time a build was initiated.

We then uninstalled it and installed Docker EE instead, which can run as a service, but I then had to install all the preview releases to get it to run Linux containers.

https://www.altaro.com/msp-dojo/linux-containers-windows-server-2019/

I do have the environment variable set for LCOW. I suspect the preview version of Docker EE that we have installed may be incomplete or buggy.

isen-ng commented 5 years ago

I may be wrong, as I have not used Docker EE, nor have I used Docker on Windows before.

The container name "/my_test" is already in use by container ...

This doesn't mean the container is running. There is a container named "/my_test", and it may be stopped (because of some error). To see all containers, including stopped ones, try this command:

docker ps -a

Does checking the container's logs return anything?

docker logs my_test

emzyme20 commented 5 years ago

Ah yes, docker ps -a does return containers. It returns the original hello-world test container that I created when I installed Docker. I did not remove it before imaging the server, so I think it's there permanently now. I am also not sure what the postgres container created an hour ago is (it could be from a failed unit test), and I can't seem to delete that one either. I can see new containers spinning up, which indicates that the TestContainers code is trying to create them.

I can also see logs from the my_test container, but I am not sure what issues it is complaining about.

[screenshot]

emzyme20 commented 5 years ago

Could it be an issue with the version of the Postgres image that you need? 9.6.8 seems quite old.

isen-ng commented 5 years ago

Regarding the problem where you cannot remove a container: the docker kill command forcefully stops a running container. The container you are trying to kill was never started (it is in the Created state), so it cannot be killed, and Docker returns an error saying "container is not running", which is expected.

You can forcefully remove a container, stopped or running, with this command:

docker rm -f lucid_beaver

Regarding the new postgres container you started, the problem is not that the Postgres version is too old. The problem is that the volume it is trying to write to does not allow your Docker service to write to it. This could be due to several reasons.

Is this server a VM, or a docker image?

If this server is a Docker image, is the volume used as the Docker service's data folder mounted read-write (RW)? It could be mounted read-only (RO).

If this server is a VM, is your /var/lib/ write-protected? The volume may be mounted in read-only mode (ro), or the user running the Docker service may not have permission to write to /var/lib.

Anyhow, I now think the issue you're facing is with setting up your build VM/image to work properly. I don't think this is a TestContainers issue anymore, because running the Docker image directly with the Docker CLI also fails.

isen-ng commented 5 years ago

Since you're running Docker EE, are you paying for an enterprise license? If so, I strongly suggest asking Docker for enterprise support about this (especially about the Docker EE preview problems you're facing).

emzyme20 commented 5 years ago

We didn't particularly want to install Docker EE; it was suggested as the only way to run Docker on Windows Server (which our build machines in Azure run). This is just a trial at the moment to see if we can get it working, as installing the Docker client for Windows did not work on Windows Server.

The machine is a VM, and as it's Windows we don't have the path /var/lib (unless I am missing something about how Docker Linux containers work on Windows?)

I run Linux containers locally on Windows 10 using Docker for Windows, and I don't have a /var/lib folder either.

isen-ng commented 5 years ago

Ah ha. A Windows server!

I'm not familiar with Docker's file layout on Windows, but you should be able to get more information about the volumes your container uses.

For example, you can try to do this:

# show all the information about a container
docker container inspect <your postgres container>

Look under "Mounts":

        "Mounts": [
            {
                "Type": "volume",
                "Name": "636d7f8fa48877406d21859a2d66b6248de2b57696829b0d8591a0fa49611789",
                "Source": "/var/lib/docker/volumes/636d7f8fa48877406d21859a2d66b6248de2b57696829b0d8591a0fa49611789/_data",
                "Destination": "/var/lib/postgresql/data",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
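If there are several containers to look at, the "Mounts" check can be scripted. The sketch below assumes the raw output of docker container inspect has been captured as a string (for example, redirected to a file) and parses it with System.Text.Json, which ships with .NET Core 3.0 and later. It returns the Destination of every mount whose RW flag is not true. `MountCheck` and `ReadOnlyMounts` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: lists mounts that Docker does not report as writable.
public static class MountCheck
{
    // `inspectJson` is assumed to be the raw output of `docker container inspect <id>`,
    // i.e. a JSON array with one object per inspected container.
    public static List<string> ReadOnlyMounts(string inspectJson)
    {
        var readOnly = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(inspectJson);

        foreach (JsonElement container in doc.RootElement.EnumerateArray())
        {
            // Skip containers without a "Mounts" array.
            if (!container.TryGetProperty("Mounts", out JsonElement mounts) ||
                mounts.ValueKind != JsonValueKind.Array)
            {
                continue;
            }

            foreach (JsonElement mount in mounts.EnumerateArray())
            {
                // A missing or false "RW" flag is reported, to err on the side of caution.
                bool writable = mount.TryGetProperty("RW", out JsonElement rw) &&
                                rw.ValueKind == JsonValueKind.True;
                if (!writable)
                {
                    string destination = mount.TryGetProperty("Destination", out JsonElement dest)
                        ? dest.GetString() ?? "(unknown)"
                        : "(unknown)";
                    readOnly.Add(destination);
                }
            }
        }

        return readOnly;
    }
}
```

An empty result means Docker reports every mount as writable, as in the example above ("RW": true). In that case a write failure inside the container is more likely caused by permissions on the host side of the volume than by the mount mode.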

Things to check


I'm starting to think that this is a Windows/docker/postgres compatibility issue. See,

A search for "windows docker postgres fix permissions" turns up more issues; one of them may help solve yours. I am out of my league here, as I don't have a Windows machine to test on.

emzyme20 commented 5 years ago

Yes, I think it's a Docker issue too. I appreciate your help, though, as I am very new to using Docker myself. Replicating something that works locally on a build server (which is a completely different setup) is never as straightforward as you think.

I am happy for you to close this issue. I'll try again to work through my docker issues :/

HofmeisterAn commented 5 years ago

Have you noticed the "Bind mounts" section in the article @swissarmykirpan provided?

These applications all require volume mapping and will not start or run correctly.

  • MySQL
  • PostgreSQL
  • WordPress
  • Jenkins
  • MariaDB
  • RabbitMQ

swissarmykirpan commented 5 years ago

@emzyme20 can you join the slack channel please - http://slack.testcontainers.org/