testcontainers / testcontainers-python

Testcontainers is a Python library that provides a friendly API to run Docker containers. It is designed to create a runtime environment to use during your automated tests.
https://testcontainers-python.readthedocs.io/en/latest/
Apache License 2.0
1.44k stars, 270 forks

Bug: You have to remove (or rename) that container to be able to reuse that name. (starting ryuk twice, dind) #597

Open tharwan opened 1 month ago

tharwan commented 1 month ago

Describe the bug

When we use testcontainers in our build pipeline, we get the following error:

Conflict ("Conflict. The container name "/testcontainers-ryuk-d5f13be0-c6ab-43c0-83ad-071f3604cb77" is already in use by container "0f610da3c88a6f4ff0d46c71b55219b08db90dd699aa75a990f041e0bccf83b6". You have to remove (or rename) that container to be able to reuse that name.")

To Reproduce

I am not sure what exactly causes the problem; locally everything runs just fine.

Runtime environment

testcontainers 4.5.0, pytest 6.2.5

alexanderankin commented 1 month ago

A workaround for this would be to disable Ryuk with TESTCONTAINERS_RYUK_DISABLED, but this is not a known issue, so if there is anything else different about your build environment that you can share to help debug this, that would be helpful. For example, if tests are running in parallel there.

tharwan commented 1 month ago

Thanks for the quick reply @alexanderankin!

I already tried setting TESTCONTAINERS_RYUK_DISABLED, but without any effect. I will check again; maybe I was just too tired and overlooked something.

Regarding our environment: the build pipeline runs on Azure DevOps with an on-premises agent, I believe also with rootless Docker. I saw that this is also a known problem, so I will confirm whether this is the case. Apart from that, I think there is nothing unusual.

tharwan commented 1 month ago

I am not certain what the root cause of the error was in the first place. If I run docker ps -a, I get nothing.

However, since the pipeline is configured in YAML, and YAML treats some strings as booleans, I end up with the environment variable being TESTCONTAINERS_RYUK_DISABLED=True instead of "true". That is why I couldn't get the workaround to work.

Maybe it would be good to have some normalisation here like .lower()
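A minimal sketch of what such normalisation could look like. This is not the library's actual parsing code: `env_flag` is a hypothetical helper, and the set of accepted truthy spellings is an assumption chosen for illustration.

```python
import os

# Assumption: these spellings count as "enabled"; the real library may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read a boolean-like environment variable, ignoring case and whitespace."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    # .strip().lower() makes "True", "TRUE" and " true " equivalent to "true".
    return raw.strip().lower() in _TRUTHY


# A YAML pipeline that turns `true` into the string "True" would still work:
pipeline_env = {"TESTCONTAINERS_RYUK_DISABLED": "True"}
ryuk_disabled = env_flag("TESTCONTAINERS_RYUK_DISABLED", environ=pipeline_env)
```

Passing `environ` explicitly is only there to make the helper easy to test; in normal use it would fall back to `os.environ`.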

tharwan commented 1 month ago

Okay, it took us quite some time to get to the bottom of this, because debugging things that only happen in build pipelines is a nightmare.

but it appears to be a DinD issue #517

This is the first time we looked into the whole setup inside the pipeline so deeply, so now we are wondering why it ever worked with previous versions of testcontainers (and more specifically what the workaround was).

In essence, „our“ code runs in a container that gets started by the pipeline agent and is assigned its own Docker network. When we then start a test container, it gets connected to the default bridge network, and the two can’t talk to each other.

alexanderankin commented 1 month ago

yeah so the workaround for dind is basically this code idea here - https://github.com/testcontainers/testcontainers-python/issues/475#issuecomment-2040182302 - we commented it out here - https://github.com/testcontainers/testcontainers-python/pull/388/files#r1366799006 - and this is what folks are seeing with DinD setup.

I am thinking of just undoing it; I don't have the time to really figure out how to resolve this, or even how to test tc.host to ensure a potential fix works for both. You'll note that the workaround mirrors the PR that changes it.

tharwan commented 1 month ago

What took us much longer to figure out was the fact that the two containers are on different networks, and that this also seems to influence the port. So with DinD, if they are on the same network, it is the internal port, but otherwise the published port.

So the workaround in the old code is not complete for our case, but we could fix it by always using the published port, because local would not be DinD.
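To make the port rule from the comment above concrete, here is a rough sketch of the decision as plain Python. It does not touch Docker or the testcontainers API: `Endpoint`, `pick_endpoint`, and all parameter names are hypothetical, and the addresses and ports in the usage lines are made-up illustration values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


def pick_endpoint(*, in_container, shares_network, container_ip,
                  internal_port, gateway_host, published_port):
    """Choose how test code should reach a started container.

    Assumed rule, taken from the discussion: inside a container (DinD) and on
    the same Docker network, talk to the container directly on its internal
    port; in every other case go through the host and the published port.
    """
    if in_container and shares_network:
        return Endpoint(container_ip, internal_port)
    return Endpoint(gateway_host, published_port)


# DinD, but the test code and the test container sit on different networks
# (the situation described in this issue): the published port is needed.
dind_split = pick_endpoint(
    in_container=True, shares_network=False,
    container_ip="172.17.0.3", internal_port=5432,
    gateway_host="172.18.0.1", published_port=32768,
)

# DinD with both containers on the same network: the internal port is used.
dind_shared = pick_endpoint(
    in_container=True, shares_network=True,
    container_ip="172.18.0.5", internal_port=5432,
    gateway_host="172.18.0.1", published_port=32768,
)
```

How `in_container` and `shares_network` would be detected is left open here; that detection is the hard part discussed in this thread.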