Closed ronshemerws closed 2 months ago
I am also experiencing this issue and have been able to reproduce it. It appears to be related to the PostStartLifecycleHook. The Kafka module uses the PostStart LifecycleHook to get the mapped port, which is used to create the Kafka startup script. However, when PostStart is called, the Docker container has not yet resolved the port mapping, causing the port not found error.
While I am not certain if there is a straightforward solution, I have found a less elegant solution to resolve this issue. Simply adding a sleep at the beginning of the Kafka PostStart hook works fine in my tests, and we can make it configurable as needed.
I will suggest a PR for review.
@mdelapenya, do you know of a way to wait until Docker resolves the port mapping before the PostStart hook?
@wilsouza , with adding a delay, did the problem go away completely for you?
My colleague @grevend is also running into this issue, on a Mac (for Linux I cannot reproduce it). And we both tried adding delays, but also retries, and neither worked consistently.
So we are now wondering if it's really the same issue, or slightly different.
If we look into the response from the MappedPort request, it's sometimes correct, but other times there is no data in the NetworkSettings.Ports section of the Docker inspect command.
Note we're still debugging further, seeing if we can find something to make it work more consistently, but so far it either works, or not (delays etc having no positive effect).
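To make the symptom concrete, here is a simplified model of the lookup. The types are stand-ins for the Docker inspect payload, not the real Docker or testcontainers-go types, and the port values are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// portBinding and inspectResult are simplified stand-ins for the
// NetworkSettings.Ports part of a Docker inspect response.
type portBinding struct {
	HostIP   string
	HostPort string
}

type inspectResult struct {
	Ports map[string][]portBinding
}

var errPortNotFound = errors.New("port not found")

// mappedPort returns the first host port bound to containerPort,
// or errPortNotFound when the mapping is missing or empty.
func mappedPort(res inspectResult, containerPort string) (string, error) {
	bindings, ok := res.Ports[containerPort]
	if !ok || len(bindings) == 0 {
		return "", errPortNotFound
	}
	return bindings[0].HostPort, nil
}

func main() {
	// Inspect response without any port data yet (the failing case).
	_, err := mappedPort(inspectResult{}, "9093/tcp")
	fmt.Println(err)

	// Inspect response once the mapping is present (illustrative values).
	ready := inspectResult{Ports: map[string][]portBinding{
		"9093/tcp": {{HostIP: "0.0.0.0", HostPort: "55001"}},
	}}
	port, _ := mappedPort(ready, "9093/tcp")
	fmt.Println(port)
}
```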
I experience the same behaviour with the Redpanda module. Also tried older versions of testcontainers with the same result. Recently updated to the latest Docker Desktop.
Thank you for sharing the details of the problems you're encountering.
@wilsouza , with adding a delay, did the problem go away completely for you?
Yes, adding a delay has completely resolved the issue for me. I'm running an internal version with this solution and haven't experienced any problems. I've also tested it on my Mac with an M1 processor without issues.
My colleague @grevend is also running into this issue, on a Mac (for Linux I cannot reproduce it). And we both tried adding delays, but also retries, and neither worked consistently.
So we are now wondering if it's really the same issue, or slightly different. If we look into the response from the MappedPort request, it's sometimes correct, but other times there is no data in the NetworkSettings.Ports section of the Docker inspect command.
Note we're still debugging further, seeing if we can find something to make it work more consistently, but so far it either works, or not (delays etc having no positive effect).
Have you tried running the version from #2552 with a custom wait time, to see if it resolves the issue on your end?
@wilsouza we tried with waits. Unfortunately, the wait needs to be pretty long to be stable (3-5 seconds). What makes that unacceptable from my perspective is that the wait isn't needed on Linux-based systems. Even if it were only a second, it would add a second to every test run, each time the Kafka container needs to be started.
We also looked further into why a retry mechanism doesn't work and found the cause. MappedPort calls the Inspect function, which always caches the first result it got:
https://github.com/testcontainers/testcontainers-go/blob/d4a21ea92ee84c058c3ad189aa328b9f5229807e/docker.go#L182
https://github.com/testcontainers/testcontainers-go/blob/d4a21ea92ee84c058c3ad189aa328b9f5229807e/docker.go#L169
That is the reason why a retry doesn't work (once it gets a wrong response, it always gets the wrong response), while an initial delay somehow works, as it gives Docker a chance to set up the port mapping.
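A minimal simulation shows the effect. It is not the library's code: fakeDocker, container, inspectCached, inspectFresh and retry are all hypothetical names, and the fake daemon is assumed to report the mapping only from its third inspect call onward.

```go
package main

import "fmt"

// fakeDocker simulates a daemon whose port mapping only shows up
// from the third inspect call onward.
type fakeDocker struct{ calls int }

func (d *fakeDocker) inspect() map[string]string {
	d.calls++
	if d.calls < 3 {
		return map[string]string{} // mapping not resolved yet
	}
	return map[string]string{"9093/tcp": "55001"}
}

type container struct {
	docker *fakeDocker
	cached map[string]string
}

// inspectCached keeps the first result forever, even if it was empty.
func (c *container) inspectCached() map[string]string {
	if c.cached == nil {
		c.cached = c.docker.inspect()
	}
	return c.cached
}

// inspectFresh asks the daemon again on every call.
func (c *container) inspectFresh() map[string]string {
	return c.docker.inspect()
}

// retry looks up the port up to attempts times using the given inspect func.
func retry(attempts int, inspect func() map[string]string) (string, bool) {
	for i := 0; i < attempts; i++ {
		if p, ok := inspect()["9093/tcp"]; ok {
			return p, true
		}
	}
	return "", false
}

func main() {
	withCache := &container{docker: &fakeDocker{}}
	_, ok := retry(5, withCache.inspectCached)
	fmt.Println("cached:", ok) // cached: false

	noCache := &container{docker: &fakeDocker{}}
	p, ok := retry(5, noCache.inspectFresh)
	fmt.Println("fresh:", ok, p) // fresh: true 55001
}
```

With the cache, the daemon is queried only once, so the retry loop keeps re-reading the same empty result. Without it, the third attempt succeeds.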
I'm happy to help with any solution, but would really like to hear @mdelapenya's perspective, because I'm wondering if caching that Inspect response is the right thing to do. I guess most fields would be stable, but next to the network there might be some other fields that change, such as the State ones? Always caching would give a high potential of returning wrong info.
And of course the question remains why the port mapping has some delay in the first place for MacOS, but maybe that's due to the nature of how Docker works there.
Maybe it relates to the fact that macOS uses a virtual machine to host Docker containers.
Maybe the port is mapped in the VM, but not yet on the macOS host machine.
For the record, I also see this behaviour on a Linux machine running Ubuntu:
Server Version: 26.1.4
API Version: 1.44
Operating System: Ubuntu 22.04.4 LTS
Total Memory: 7771 MB
Resolved Docker Host: unix:///var/run/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: 109ffb6f0068199f8347d3954fd3f1fb31f80a5ffc016cd4ea231e7a3193510d
Test ProcessID: 5dcbcb80-8058-4b5a-a12b-37d4d47fd291
[...]
kafka_test.go:13:
Error Trace: [...]/kafka_test.go:13
Error: Received unexpected error:
failed to start container: context deadline exceeded
any update on this?
Got hit with this bug today on my Mac. Happens randomly. I'm using the latest testcontainers version.
I think that this issue can be closed now that it has been addressed in @mdelapenya's PR, #2606. What do you think @mdelapenya? I'm unable to validate if it's still occurring at this time.
This issue seems to be resolved for me after upgrading to v0.32.0. I will close this now. Thanks to everyone who was involved, this is much appreciated!
Thanks folks for your support! I'm currently at Gophercon, and I will go back to regular work next Monday.
Testcontainers version
0.31.0
Using the latest Testcontainers version?
Yes
Host OS
Mac
Host arch
ARM
Go version
1.22.3
Docker version
Docker info
What happened?
I followed the documentation on how to use the Kafka testcontainers module. At first it worked intermittently (the container got running), but now it fails consistently, ending up with the "port not found" error after waiting out the 1m timeout.
Relevant log output
Additional information
This is the complete program I am running to reproduce (started an empty project):
Here is the mod file