testcontainers / testcontainers-go

Testcontainers for Go is a Go package that makes it simple to create and clean up container-based dependencies for automated integration/smoke tests. The clean, easy-to-use API enables developers to programmatically define containers that should be run as part of a test and clean up those resources when the test is done.
https://golang.testcontainers.org
MIT License
3.54k stars 489 forks source link

[Bug]: Fail to start Kafka container (KRaft) with basic setup #2543

Closed ronshemerws closed 2 months ago

ronshemerws commented 4 months ago

Testcontainers version

0.31.0

Using the latest Testcontainers version?

Yes

Host OS

Mac

Host arch

ARM

Go version

1.22.3

Docker version

Client: Docker Engine - Community
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.21.0
 Git commit:        ed223bc820
 Built:             Thu Aug 31 17:24:32 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.30.0 (149282)
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:04 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Docker info

Client: Docker Engine - Community
 Version:    24.0.6
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0-desktop.1
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0-desktop.2
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.29
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-debug
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.23
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.4
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.1.0
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.8.0
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-scout

Server:
 Containers: 6
  Running: 1
  Paused: 0
  Stopped: 5
 Images: 38
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.6.26-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 10
 Total Memory: 15.6GiB
 Name: docker-desktop
 ID: 4c6eebbc-c190-4257-96b7-65293c5d4add
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/ronshemer/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

What happened?

I followed the documentation on how to use kafka testcontainers and at first it worked randomly (got the container running), and now it does not work constantly ending up with the "port not found" error after waiting the 1m timeout.

Relevant log output

⚑ go run .
2024/05/19 16:40:21 github.com/testcontainers/testcontainers-go - Connected to docker:
  Server Version: 26.1.1
  API Version: 1.44
  Operating System: Docker Desktop
  Total Memory: 15976 MB
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: 46c175eaf04deba9d5b7bdc51177a827b389a4d7cec93f329a3bef4e700b2154
  Test ProcessID: c8a51e52-8aa0-4514-aa6d-d487aff9d1be
2024/05/19 16:40:21 🐳 Creating container for image testcontainers/ryuk:0.7.0
2024/05/19 16:40:21 βœ… Container created: cad4a0e8592a
2024/05/19 16:40:21 🐳 Starting container: cad4a0e8592a
2024/05/19 16:40:21 βœ… Container started: cad4a0e8592a
2024/05/19 16:40:21 🚧 Waiting for container id cad4a0e8592a image: testcontainers/ryuk:0.7.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/05/19 16:40:22 πŸ”” Container is ready: cad4a0e8592a
2024/05/19 16:40:22 🐳 Creating container for image confluentinc/confluent-local:7.5.0
2024/05/19 16:40:22 βœ… Container created: e1bc25805a7b
2024/05/19 16:40:22 🐳 Starting container: e1bc25805a7b
2024/05/19 16:41:22 βœ… Container started: e1bc25805a7b
2024/05/19 16:41:22 container logs (port not found
context deadline exceeded):

2024/05/19 16:41:22 failed to start container: failed to start container: port not found
context deadline exceeded
exit status 1

Additional information

This is the complete program I am running to reproduce (started an empty project):

package main

import (
    "context"
    "log"

    "github.com/testcontainers/testcontainers-go/modules/kafka"
)

func main() {
    ctx := context.Background()

    kafkaContainer, err := kafka.RunContainer(ctx,
        kafka.WithClusterID("test-cluster"),
    )
    if err != nil {
        log.Fatalf("failed to start container: %s", err)
    }

    // Clean up the container after
    defer func() {
        if err := kafkaContainer.Terminate(ctx); err != nil {
            log.Fatalf("failed to terminate container: %s", err)
        }
    }()
}

Here is the mod file

module kafka-test

go 1.22.3

require (
    dario.cat/mergo v1.0.0 // indirect
    github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 // indirect
    github.com/Microsoft/go-winio v0.6.1 // indirect
    github.com/Microsoft/hcsshim v0.11.4 // indirect
    github.com/cenkalti/backoff/v4 v4.2.1 // indirect
    github.com/containerd/containerd v1.7.15 // indirect
    github.com/containerd/log v0.1.0 // indirect
    github.com/cpuguy83/dockercfg v0.3.1 // indirect
    github.com/distribution/reference v0.5.0 // indirect
    github.com/docker/docker v25.0.5+incompatible // indirect
    github.com/docker/go-connections v0.5.0 // indirect
    github.com/docker/go-units v0.5.0 // indirect
    github.com/felixge/httpsnoop v1.0.4 // indirect
    github.com/go-logr/logr v1.4.1 // indirect
    github.com/go-logr/stdr v1.2.2 // indirect
    github.com/go-ole/go-ole v1.2.6 // indirect
    github.com/gogo/protobuf v1.3.2 // indirect
    github.com/golang/protobuf v1.5.4 // indirect
    github.com/google/uuid v1.6.0 // indirect
    github.com/klauspost/compress v1.16.7 // indirect
    github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
    github.com/magiconair/properties v1.8.7 // indirect
    github.com/moby/patternmatcher v0.6.0 // indirect
    github.com/moby/sys/sequential v0.5.0 // indirect
    github.com/moby/sys/user v0.1.0 // indirect
    github.com/moby/term v0.5.0 // indirect
    github.com/morikuni/aec v1.0.0 // indirect
    github.com/opencontainers/go-digest v1.0.0 // indirect
    github.com/opencontainers/image-spec v1.1.0 // indirect
    github.com/pkg/errors v0.9.1 // indirect
    github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
    github.com/shirou/gopsutil/v3 v3.23.12 // indirect
    github.com/shoenig/go-m1cpu v0.1.6 // indirect
    github.com/sirupsen/logrus v1.9.3 // indirect
    github.com/testcontainers/testcontainers-go v0.31.0 // indirect
    github.com/testcontainers/testcontainers-go/modules/kafka v0.31.0 // indirect
    github.com/tklauser/go-sysconf v0.3.12 // indirect
    github.com/tklauser/numcpus v0.6.1 // indirect
    github.com/yusufpapurcu/wmi v1.2.3 // indirect
    go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0 // indirect
    go.opentelemetry.io/otel v1.24.0 // indirect
    go.opentelemetry.io/otel/metric v1.24.0 // indirect
    go.opentelemetry.io/otel/trace v1.24.0 // indirect
    golang.org/x/crypto v0.22.0 // indirect
    golang.org/x/mod v0.16.0 // indirect
    golang.org/x/sys v0.19.0 // indirect
    golang.org/x/tools v0.13.0 // indirect
    google.golang.org/genproto/googleapis/rpc v0.0.0-20230731190214-cbb8c96f2d6d // indirect
    google.golang.org/grpc v1.58.3 // indirect
    google.golang.org/protobuf v1.33.0 // indirect
)
wilsouza commented 4 months ago

I am also experiencing this issue and have been able to reproduce it. It appears to be related to the PostStartLifecycleHook. The Kafka module uses the PostStart LifecycleHook to get the mapped port, which is used to create the Kafka startup script. However, when PostStart is called, the Docker container has not yet resolved the port mapping, causing the port not found error.

While I am not certain if there is a straightforward solution, I have found a less elegant solution to resolve this issue. Simply adding a sleep at the beginning of the Kafka PostStart hook works fine in my tests, and we can make it configurable as needed.

I will suggest a PR for review.

@mdelapenya, do you know of a way to wait until Docker resolves the port mapping before the PostStart hook?

mdonkers commented 4 months ago

@wilsouza , with adding a delay, did the problem go away completely for you?

My colleague @grevend is also running into this issue, on a Mac (for Linux I cannot reproduce it). And we both tried adding delays, but also retries, and neither worked consistently.

So we are now wondering if its really the same issue, or slightly different. If we look into the response from the MappedPort request it's sometimes correct, but other times not having any data in the NetworkSettings.Ports section of the Docker inspect command.

Note we're still debugging further, seeing if we can find something to make it work more consistently, but so far it either works, or not (delays etc having no positive effect).

thmshmm commented 4 months ago

I experience the same behaviour with the Redpanda module. Also tried older versions of testcontainers with the same result. Recently updated to the latest Docker Desktop.

wilsouza commented 4 months ago

Thank you for sharing the details of the problems you're encountering.

@wilsouza , with adding a delay, did the problem go away completely for you?

Yes, adding a delay has completely resolved the issue for me. I'm running an internal version with this solution and haven't experienced any problems. I’ve also tested it on my Mac with an M1 processor without issues.

My colleague @grevend is also running into this issue, on a Mac (for Linux I cannot reproduce it). And we both tried adding delays, but also retries, and neither worked consistently.

So we are now wondering if its really the same issue, or slightly different. If we look into the response from the MappedPort request it's sometimes correct, but other times not having any data in the NetworkSettings.Ports section of the Docker inspect command.

Note we're still debugging further, seeing if we can find something to make it work more consistently, but so far it either works, or not (delays etc having no positive effect).

Have you tried running version #2552 with a custom wait time? Could you try running it with a custom timeout to see if it resolves the issue on your end?

mdonkers commented 4 months ago

@wilsouza we tried with waits, unfortunately though, the wait needs to be pretty long to be stable (3-5 seconds). What makes that unacceptable from my perspective, is that the wait isn't needed on Linux based systems. Even if for a second, it would add a second to every test run, for every time the Kafka container needs to be started.

We also further looked into why some retry mechanism doesn't work and found the cause. MappedPort calls the Inspect function, which always caches the first result it got: https://github.com/testcontainers/testcontainers-go/blob/d4a21ea92ee84c058c3ad189aa328b9f5229807e/docker.go#L182 https://github.com/testcontainers/testcontainers-go/blob/d4a21ea92ee84c058c3ad189aa328b9f5229807e/docker.go#L169

Which is then the reason why a retry doesn't work (if it once gets a wrong response, it always gets the wrong response), and a initial delay somehow works as it gives Docker a chance to set up the port mapping.

I'm happy to provide assistance to any solution, but would really like to hear @mdelapenya 's perspective. Because I'm wondering if caching that Inspect response is the right thing to do. As I guess most fields would be stable, but next to the network there might be some other fields that change such as the State ones? Always caching would give a high potential of returning wrong info.

And of course the question remains why the port mapping has some delay in the first place for MacOS, but maybe that's due to the nature of how Docker works there.

Eyal-Shalev commented 3 months ago

And of course the question remains why the port mapping has some delay in the first place for MacOS, but maybe that's due to the nature of how Docker works there.

Maybe it relates to the fact MacOS uses a Virtual Machine to host docker containers πŸ€”

Maybe the port is mapped in the VM, but not in the MacOS host machine.

mgilbir commented 3 months ago

For the record, I have this behaviour also on a linux machine running ubuntu:

  Server Version: 26.1.4
  API Version: 1.44
  Operating System: Ubuntu 22.04.4 LTS
  Total Memory: 7771 MB
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: 109ffb6f0068199f8347d3954fd3f1fb31f80a5ffc016cd4ea231e7a3193510d
  Test ProcessID: 5dcbcb80-8058-4b5a-a12b-37d4d47fd291
  [...]
  kafka_test.go:13: 
            Error Trace:    [...]/kafka_test.go:13
            Error:          Received unexpected error:
                            failed to start container: context deadline exceeded
kishansairam9 commented 3 months ago

any update on this?

dramosOptiply commented 2 months ago

Got hit with this bug today on my Mac. Happens randomly. I'm using the latest testcontainers version.

wilsouza commented 2 months ago

I think that this issue can be closed now that it has been addressed in @mdelapenya PR, #2606. What do you think @mdelapenya? I’m unable to validate if it’s still occurring at this time.

ronshemerws commented 2 months ago

This issue seem to be resolved for me when upgraded to v0.32.0. I will close this now, thanks to everyone that was involved, this is much appreciated!

mdelapenya commented 2 months ago

thanks folks for your support! I'm currently at Gophercon, and I will go back to regular work next Monday πŸ™