[Bug]: UntilPortIsAvailable wait strategy takes exactly 1 minute before it succeeds every time, while the port is available much earlier

cvetomir-todorov commented 4 months ago

Testcontainers version

3.8.0

Using the latest Testcontainers version?

Yes

Host OS

Ubuntu 22.04

Host arch

x64

.NET version

8.0.300

Docker version

Client: Docker Engine - Community
 Version:           26.1.3
 API version:       1.45
 Go version:        go1.21.10
 Git commit:        b72abbb
 Built:             Thu May 16 08:33:29 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.1.3
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.10
  Git commit:       8e96db1
  Built:            Thu May 16 08:33:29 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.32
  GitCommit:        8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Docker info

Client: Docker Engine - Community
 Version:    26.1.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 19
  Running: 0
  Paused: 0
  Stopped: 19
 Images: 27
 Server Version: 26.1.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.5.0-35-generic
 Operating System: Ubuntu 22.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 20
 Total Memory: 31.01GiB
 Name: precise
 ID: 6XJP:NK2G:WJNA:LPES:SDKQ:MO4E:Y3D3:FRCL:I2UV:ERKJ:447N:P67Z
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

What happened?

I am starting Cassandra NoSQL database using the following code in my tests:

_network = new NetworkBuilder()
    .WithName("cecochat")
    .Build();

_cassandra = new ContainerBuilder()
    .WithImage("cassandra:4.1.3")
    .WithName("cecochat-test-cassandra0")
    .WithHostname("cassandra0")
    .WithNetwork(_network)
    .WithPortBinding(hostPort: 9042, containerPort: 9042)
    .WithEnvironment(new Dictionary<string, string>
    {
        { "CASSANDRA_SEEDS", "cassandra0" },
        { "CASSANDRA_CLUSTER_NAME", "cecochat" },
        { "CASSANDRA_DC", "Europe" },
        { "CASSANDRA_RACK", "Rack0" },
        { "CASSANDRA_ENDPOINT_SNITCH", "GossipingPropertyFileSnitch" },
        { "CASSANDRA_NUM_TOKENS", "128" },
        { "HEAP_NEWSIZE", "128M" },
        { "MAX_HEAP_SIZE", "512M" }
    })
    .WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(9042))
    .Build();

await _cassandra.StartAsync();

StartAsync always takes around 58 seconds to complete. Meanwhile, the container has started and the port has been open almost immediately after the call. After I execute the above code I immediately test it using nc -vz -w 1 localhost 9042 which is part of this particular wait strategy implementation. The response is Connection to localhost (127.0.0.1) 9042 port [tcp/*] succeeded!. The Testcontainers log output below shows the same command being repeated again and again without detecting that the nc command should actually succeed. I also tried running my tests without the wait strategy, but then the first test fails because the port isn't really open yet.

Relevant log output

[testcontainers.org 00:00:00.06] Connected to Docker:
  Host: unix:///var/run/docker.sock
  Server Version: 26.1.3
  Kernel Version: 6.5.0-35-generic
  API Version: 1.45
  Operating System: Ubuntu 22.04.4 LTS
  Total Memory: 31.01 GB
[testcontainers.org 00:00:00.19] Docker network 1365c50b391f created
[testcontainers.org 00:00:00.25] Docker container a624b25be52f created
[testcontainers.org 00:00:00.27] Start Docker container a624b25be52f
[testcontainers.org 00:00:00.50] Wait for Docker container a624b25be52f to complete readiness checks
[testcontainers.org 00:00:00.50] Docker container a624b25be52f ready
[testcontainers.org 00:00:00.52] Docker container d0c42e1e9903 created
[testcontainers.org 00:00:00.52] Start Docker container d0c42e1e9903
[testcontainers.org 00:00:00.77] Wait for Docker container d0c42e1e9903 to complete readiness checks
[testcontainers.org 00:00:00.77] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:01.83] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:02.89] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:03.96] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:05.03] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:06.10] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:07.18] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:08.24] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:09.30] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:10.37] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:11.43] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:12.48] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:13.54] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:14.62] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:15.68] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:16.77] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:17.85] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:18.93] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:19.99] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:21.07] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:22.13] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:23.19] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:24.28] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:25.35] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:26.42] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:27.48] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:28.57] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:29.63] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:30.70] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:31.77] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:32.84] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:33.92] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:35.00] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:36.08] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:37.16] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:38.23] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:39.33] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:40.41] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:41.48] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:42.54] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:43.61] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:44.68] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:45.76] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:46.83] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:47.92] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:48.98] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:50.03] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:51.09] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:52.16] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:53.22] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:54.30] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:55.35] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:56.45] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:57.54] Execute "/bin/sh -c true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')" at Docker container d0c42e1e9903
[testcontainers.org 00:00:57.59] Docker container d0c42e1e9903 ready
[testcontainers.org 00:00:57.75] Delete Docker container d0c42e1e9903
[testcontainers.org 00:00:58.06] Delete Docker network 1365c50b391f

Additional information

No response

HofmeisterAn commented 4 months ago

Thank you for sharing the issue. Could you please debug into the wait strategy and check the response information (stdout, stderr) in the ExecResult response? This should give us a better understanding of why it fails or does not succeed in the first couple of tries. Please be aware that, in general, it is not a good idea to rely on the port. Usually, ports are available before the actual service is running, which results in flakiness.

cvetomir-todorov commented 4 months ago

@HofmeisterAn thanks for your quick reply. I got the following output repeatedly. The file with the output is 738 lines long, so it was repeated approximately 367 times. I added line breaks here and there so it can be read without horizontal scrollbars, but bear in mind that each occurrence was a single line.

OCI runtime exec failed:
exec failed: unable to start container process:
exec: "true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042')":
stat true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042'): 
no such file or directory: unknown

EDIT: When I docker exec -it <container-name> bash and I execute true && (grep -i ':0*2352' /proc/net/tcp* || nc -vz -w 1 localhost 9042 || /bin/bash -c '</dev/tcp/localhost/9042') the result is /proc/net/tcp: 1: 00000000:2352 00000000:0000 0A 00000000:00000000 00:00000000 00000000 999 0 7267910 1 0000000000000000 100 0 0 10 0. So I am not sure at the moment where this error is coming from.

Given the initial text of the error exec failed: unable to start container process: there is probably something incompatible with the way commands are being executed in the container, am I on the right path? If so, how could I fix that?

Or is it related to no such file or directory: unknown? I assume the name should be different though...
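
For reference, the grep part of the command looks for the port written in hexadecimal, which is how /proc/net/tcp lists local addresses: 9042 is 0x2352, and the state value 0A means LISTEN. A minimal C# sketch of the same check follows. ProcNetTcp, ToHexPort and IsListeningOn are hypothetical names used only for illustration and are not part of Testcontainers. Unlike the grep, which only matches the port text, this sketch also checks the state column.

```csharp
using System;
using System.Globalization;

public static class ProcNetTcp
{
    // Formats a port the way /proc/net/tcp prints it: four uppercase hex digits.
    public static string ToHexPort(int port) =>
        port.ToString("X4", CultureInfo.InvariantCulture);

    // Returns true if one /proc/net/tcp line describes a socket that is bound
    // to the given local port and is in the LISTEN state (0A).
    public static bool IsListeningOn(string line, int port)
    {
        // Columns: slot, local_address, rem_address, state, ...
        var fields = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (fields.Length < 4)
        {
            return false;
        }

        // local_address looks like "00000000:2352"; the port follows the last colon.
        var local = fields[1];
        var separator = local.LastIndexOf(':');
        if (separator < 0)
        {
            return false;
        }

        var hexPort = local.Substring(separator + 1);
        return string.Equals(hexPort, ToHexPort(port), StringComparison.OrdinalIgnoreCase)
            && fields[3] == "0A";
    }
}
```

Calling ProcNetTcp.ToHexPort(9042) returns "2352", which matches the pattern in the logged command and the local address in the line printed above.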

HofmeisterAn commented 4 months ago

> I immediately test it using nc -vz -w 1 localhost 9042 which is part of this particular wait strategy implementation.

Ensuring I have not misunderstood anything, could you clarify where you ran nc? Was it inside the container or from your host? I briefly checked, and it seems nc is not available in the image (container):

/bin/sh: 1: nc: not found

If you ran it from your host (which I assume you did), then there is no issue. The host port that forwards the connection to the container is likely available much earlier than the port inside the container.

It takes approximately a minute until I see the following log message from the container:

Starting listening for CQL clients on /0.0.0.0:9042

This aligns with what you are observing. I would recommend using the log message wait strategy here or reviewing Java's implementation and aligning with their wait strategy.
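
A minimal sketch of the log message wait strategy, assuming the Testcontainers NuGet package and a running Docker daemon. The class name CassandraLogWaitExample and the five-minute limit are illustrative assumptions; the message text is taken from the container log line quoted above. UntilMessageIsLogged treats its argument as a regular expression, so only the fixed prefix of the message is used here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using DotNet.Testcontainers.Builders;
using DotNet.Testcontainers.Containers;

public static class CassandraLogWaitExample
{
    // Builds and starts Cassandra, returning once the CQL listener message
    // has appeared in the container log. The caller disposes the container.
    public static async Task<IContainer> StartAsync()
    {
        IContainer cassandra = new ContainerBuilder()
            .WithImage("cassandra:4.1.3")
            .WithPortBinding(9042, 9042)
            .WithWaitStrategy(Wait.ForUnixContainer()
                .UntilMessageIsLogged("Starting listening for CQL clients"))
            .Build();

        // Assumption: five minutes is enough for image pull plus startup.
        // The token cancels StartAsync if the message never shows up.
        using var timeout = new CancellationTokenSource(TimeSpan.FromMinutes(5));
        await cassandra.StartAsync(timeout.Token);
        return cassandra;
    }
}
```

The cancellation token is one way to bound the wait so that a container that never becomes ready fails the test instead of hanging it.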

cvetomir-todorov commented 4 months ago

@HofmeisterAn yep, you are correct about how I was running nc; I also saw that it is not present in the image. I thought that if there is a port binding, then checking for the port would be delegated as well, but it seems that is not how it works 🙂

Mainly, I naively expected some timeout to play a role, since things started working after 1 min. It didn't occur to me to sync my expectations with the Cassandra logs.

I didn't know there is a Java implementation, so I am going to search for it and check it out in order to see if something can be borrowed. Thanks for your invaluable input and know-how about making stuff work with Testcontainers. From my part I think the issue should be closed for now.

cvetomir-todorov commented 4 months ago

@HofmeisterAn I investigated the Java implementation which executes a command against Cassandra. Then I applied a log message wait strategy, as advised, and a new strategy which executes a command against the database (simply checks for the existence of a Cassandra keyspace). Two strategies in succession.

This works fine locally, but when I run the code in Github Actions I get the Cassandra-specific NoHostAvailableException which is self-explanatory. I am using the container.Hostname from within the wait strategy, which based on the logs, resolves to 127.0.0.1. Having read the documentation I thought it is not advisable to use such values. Is there any Github Actions-specific issues related to running containers? Is there a way to troubleshoot this? I couldn't find anything specific in the existing issues in this repo, but if I have missed something, could you let me know?

The code is in this file here: https://github.com/cvetomir-todorov/CecoChat/blob/test-chats-service/source/CecoChat.Chats.Testing/TestContainers.cs

The Github Actions workflow is here: https://github.com/cvetomir-todorov/CecoChat/actions/runs/9269353841 (the error logs could be more easily accessible by viewing the failing step from the bottom)

HofmeisterAn commented 4 months ago

> Having read the documentation I thought it is not advisable to use such values.

It is not recommended to use a constant or a fixed value like 127.0.0.1. Depending on the container runtime and configuration, the host may differ. Testcontainers takes care of this by resolving the correct host. For GitHub, 127.0.0.1 is correct.

> Is there any Github Actions-specific issues related to running containers?

No. All our tests run on GitHub. Many of my projects run on Azure DevOps, which basically uses the same agents.

> Is there a way to troubleshoot this? I couldn't find anything specific in the existing issues in this repo, but if I have missed something, could you let me know?

Your wait strategy configuration looks incorrect. You are overriding the first one with the second. It should be the following configuration instead (chained):

.WithWaitStrategy(Wait.ForUnixContainer()
    .UntilMessageIsLogged("Starting listening for CQL clients on /0.0.0.0:9042")
    .UntilCassandraQueryExecuted(port, localDc))

Please consider a longer timeout as well, to ensure it is not just the slow agent. Starting the container on my beefy machine already takes a minute. The pipeline needs to pull the image too.

If it still fails after these adjustments, I would suggest adding a stopwatch to measure how long it takes and exporting the container logs before disposing the container (to ensure that Cassandra (the service) is really running):

var (stdout, stderr) = await _cassandra.GetLogsAsync();

Furthermore, consider using random host ports to avoid port clashes. You never know which services are running on the build agent (or other machines) and occupying ports in your range.
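
A minimal sketch of the random host port suggestion, assuming the Testcontainers NuGet package. CassandraRandomPortExample, Build and GetEndpoint are hypothetical names for illustration. Passing true as the second argument of WithPortBinding asks for a random free host port; the actual port is read back with GetMappedPublicPort after the container has started.

```csharp
using DotNet.Testcontainers.Builders;
using DotNet.Testcontainers.Containers;

public static class CassandraRandomPortExample
{
    // Port Cassandra listens on inside the container.
    public const ushort CqlPort = 9042;

    // Binds the container port to a random free host port instead of a fixed one.
    public static IContainer Build() =>
        new ContainerBuilder()
            .WithImage("cassandra:4.1.3")
            .WithPortBinding(CqlPort, true)
            .Build();

    // Returns the host and port a client on the test machine should use.
    // Only valid after StartAsync has completed.
    public static (string Host, ushort Port) GetEndpoint(IContainer container) =>
        (container.Hostname, container.GetMappedPublicPort(CqlPort));
}
```

Reading both values from the container keeps the test free of hard-coded hosts and ports, which is what lets the same code run locally and on a build agent.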

cvetomir-todorov commented 4 months ago

@HofmeisterAn thanks for sharing the advice:

- Getting the logs was really nice to know IMO, although it didn't reveal anything useful for my case 😞
- Using a random port is very sensible, but still the error that I get consistently is tried [::1]:46021: SocketException 'Connection refused' as if no one is listening
- I am building WithPortBinding(_cassandraHostPort, cassandraContainerPort)
- stdout spits out Starting listening for CQL clients on /0.0.0.0:9042
- Setting the timeout to 5 mins doesn't change things - the aforementioned message that the container is listening is displayed after about 1 min, with no errors to follow. Yet I fail to get the coveted Docker container <id> ready 😞

Is there any way to check whether the container is actually running rather than crashed or stopped? When I run the code on my machine I do not get Connection refused but rather Transport endpoint is not connected before it eventually succeeds. I'd expect stdout/stderr to contain information about why the container would crash or stop, but the output is identical to what I get when I run the code locally...

HofmeisterAn commented 4 months ago

> but still the error that I get consistently is tried [::1]:46021: SocketException 'Connection refused'

Connect via IPv4.
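
One way to read this advice, assuming the client was handed a host name such as localhost that resolved to the IPv6 loopback [::1]: resolve the name yourself and pass an IPv4 address to the client. The sketch below uses only the .NET base class library; Ipv4Resolver and ResolveIPv4 are hypothetical names for illustration.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

public static class Ipv4Resolver
{
    // Resolves a host name or IP literal and returns its first IPv4 address.
    // Throws if the host has no IPv4 address at all.
    public static IPAddress ResolveIPv4(string host)
    {
        var address = Dns.GetHostAddresses(host)
            .FirstOrDefault(a => a.AddressFamily == AddressFamily.InterNetwork);

        if (address == null)
        {
            throw new InvalidOperationException(
                $"Host '{host}' did not resolve to an IPv4 address.");
        }

        return address;
    }
}
```

The returned address can then be used as the contact point instead of the host name, so the client does not try [::1] first.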

cvetomir-todorov commented 4 months ago

@HofmeisterAn not only did that solve the issue, it also helped me spot flawed logic in the Cassandra client wrapper I had written. Now the test is finally green. Thanks!

As for the TC API allowing overwriting the wait strategy - isn't a defensive approach better? I mean telling the client code that overwriting it is not OK by throwing an exception for example?

HofmeisterAn commented 4 months ago

> I mean telling the client code that overwriting it is not OK by throwing an exception for example?

Do you mean throwing an exception when a wait strategy is already configured? TBH, I never thought about that, but I think it wouldn't work very well with our module approach. Testcontainers' modules are pre-configured following best practice configurations, which includes the wait strategy. There might be cases where developers want to override the default configuration for whatever reason. Maybe because they use a custom image that requires it - I am not sure 🤷‍♂️.

testcontainers / testcontainers-dotnet