testcontainers / testcontainers-go

Testcontainers for Go is a Go package that makes it simple to create and clean up container-based dependencies for automated integration/smoke tests. The clean, easy-to-use API enables developers to programmatically define containers that should be run as part of a test and clean up those resources when the test is done.
https://golang.testcontainers.org
MIT License
3.36k stars 460 forks source link

[Bug]: Endpoint() returns "port not found" error, even though wait.ForListeningPort is used #605

Open awagner-iq opened 1 year ago

awagner-iq commented 1 year ago

Testcontainers version

0.15.0

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host Arch

x86_64

Go Version

1.19

Docker version

mero@dibbler ~$ podman version
Version:      3.4.7
API Version:  3.4.7
Go Version:   go1.16.15
Built:        Thu Apr 21 19:38:09 2022
OS/Arch:      linux/amd64

Docker info

mero@dibbler ~$ podman info   
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.32-1.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.32, commit: '
  cpus: 12
  distribution:
    distribution: fedora
    variant: workstation
    version: "34"
  eventLogger: journald
  hostname: dibbler
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.17.12-100.fc34.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1298100224
  memTotal: 16543584256
  ociRuntime:
    name: crun
    package: crun-1.4.4-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc34.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 16906444800
  swapTotal: 16936591360
  uptime: 221h 45m 47.69s (Approximately 9.21 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/mero/.config/containers/storage.conf
  containerStore:
    number: 42
    paused: 0
    running: 0
    stopped: 42
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/mero/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 4
  runRoot: /run/user/1000/containers
  volumePath: /home/mero/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.7
  Built: 1650562689
  BuiltTime: Thu Apr 21 19:38:09 2022
  GitCommit: ""
  GoVersion: go1.16.15
  OsArch: linux/amd64
  Version: 3.4.7

What happened?

We are seeing test flakes with the error message mysql.Endpoint() = port not found. The code we use to create the container is attached below. In particular, we use both wait.ForListeningPort("3306/tcp") and wait.ForLog("port: 3306") to make sure we wait until the container is running and available. However, it seems even with that, GenericContainer will occasionally return without being ready, as evidenced by the fact that Endpoint returns an error about the port not being found.

Relevant log output

No response

Additional Information

    mysql, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: testcontainers.ContainerRequest{
            Image:        "mysql:8",
            ExposedPorts: []string{"3306/tcp"},
            Env: map[string]string{
                "MYSQL_ROOT_PASSWORD": "test",
            },
            WaitingFor: wait.ForAll(wait.ForListeningPort("3306/tcp"), wait.ForLog("port: 3306")),
            SkipReaper: true, // https://github.com/testcontainers/testcontainers-go/issues/604
        },
        Started: true,
        Logger:  testcontainers.TestLogger(t),
    })
    if err != nil {
        t.Fatalf("testcontainers.GenericContainer() = %v", err)
    }
    t.Cleanup(func() {
        t.Logf("Stopping container")
        d := new(time.Duration)
                // TODO: What happens when the context expires? Can we pass a different
                // context here, or would that prevent Stop from working?
        mysql.Stop(context.Background(), d)
    })

    ep, err := mysql.Endpoint(ctx, "")
    if err != nil {
        t.Fatalf("mysql.Endpoint() = %v", err)
    }
anuraaga commented 1 year ago

I am seeing this as well on 0.17.0 with mysql image, test sometimes fails, more often on CI. My configuration is the same as presented except I am using normal Docker and don't skip reaper. time.Sleep(5 * time.Second) didn't add stability but time.Sleep(30 * time.Second) does seem to have helped - presumably this is the time of the relatively slow MySQL container itself and the wait condition is getting missed for some reason.

For context, the build starts and terminates a database container about three times, perhaps having multiple start/stop in the same binary run (in this case go test ./...) causes a race condition?

mdelapenya commented 1 year ago

@anuraaga @awagner-iq thanks for open this issue, and sorry for the radio silence, I went on paternity leave at the beginning of December and probably skipped this ticket.

I'm investigating why it fails for your use case although, with the above snippet, I'm not able to reproduce it yet. Debugging...

michielboekhoff commented 1 year ago

I'm noticing this as well while running on Docker 20.10 on Linux - it's resulting in a fair bit of flakiness, especially in our CI pipeline. It doesn't look like this is strictly related to Podman.

priegger commented 1 year ago

It seems to be a bug in the Endpoint function. I'm using docker on linux and I can reproduce this easily on my machine and in ci. Now I create the URL myself for a specific port (not the first one, as Endpoint does it) and I did not see this issue yet.

mdelapenya commented 1 year ago

It seems to be a bug in the Endpoint function. I'm using docker on linux and I can reproduce this easily on my machine and in ci. Now I create the URL myself for a specific port (not the first one, as Endpoint does it) and I did not see this issue yet.

Could you please add a repro snippet, including which version of the library are you using? 🙏 If there are so many users seeing it, we could be in front of a real bug that we should fix 🐞

bokimilinkovic commented 1 year ago

Yes would be great if this one can go up for priority because it is still occurring , and it is breaking a lot of CI pipelines. This is code snippet:

github.com/testcontainers/testcontainers-go v0.19.0

req := testcontainers.ContainerRequest{
        Image:        "nats:latest",
        Name:         natsContainerName,
        ExposedPorts: []string{"4222/tcp"},
        Networks:     networks,
        WaitingFor:   wait.ForAll(wait.ForLog("Server is ready")),
    }

    container, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: req,
        Started:          true,
    })
    if err != nil {
        return nil, err
    }
        natsMappedPort, err := container.MappedPort(ctx, "4222/tcp")
    if err != nil {
        log.Errorf("Failed to get nats port: %v", err)
        return initFailure
    }
    natsHost, err := container.Host(ctx)
    if err != nil {
        log.Errorf("Failed to get nats host: %v", err)

        return initFailure
    }

    natsEndpoint, err := container.Endpoint(ctx, "") 
    if err != nil {
        log.Errorf("Failed to get nats endpoint: %v", err) --- error occurs in here
        return initFailure
    }
priegger commented 1 year ago

Could you please add a repro snippet, including which version of the library are you using? pray If there are so many users seeing it, we could be in front of a real bug that we should fix lady_beetle

Hi, I did not have the time yet to create a repro snippet. The failing code I have is something I cannot share.

I switched all my Endpoint(...) calls to PortEndpoint(...) and explicitly specified the port. I did not have any issues since then. So I think the reason is that the first port that gets detected somehow does not work.

What I'd do given some time:

I used a GenericContainer with mongo:4.0, and testcontainers 0.19.0, but also switched to the latest version for a quick test and that did not fix the issue.

Maybe this already helps and gives someone the opportunity to reproduce and analyze this bug.

mdelapenya commented 1 year ago

I'm able to reproduce this exact error in a VM using podman. Will take a look after my parental leave

mdelapenya commented 1 year ago

In the mean time, checking the state of the Docker types, I saw this difference when inspecting a container with podman Vs docker:

Steps to reproduce

  1. Edit docker.go, Ports method, to print out the docker types for the inspected container:
    func (c *DockerContainer) Ports(ctx context.Context) (nat.PortMap, error) {
    inspect, err := c.inspectContainer(ctx)
    if err != nil {
        return nil, err
    }
    +   fmt.Printf(">>> network.settings.ports: %+v\n", inspect.NetworkSettings.Ports)
    +   fmt.Printf(">>> config.exposed.ports: %+v\n", inspect.Config.ExposedPorts)
    return inspect.NetworkSettings.Ports, nil
    }
  2. Run the following test:
    go run gotest.tools/gotestsum --format short-verbose --packages="./..." -- -run "^TestContainerWithHostNetworkOptions_UseExposePortsFromImageConfigs" -timeout 600s  -count=1 -v

With Podman (Ubuntu VM)

>>> network.settings.ports: map[80/tcp:[]]
>>> config.exposed.ports: map[80/tcp:{}]

With Docker (on Mac)

>>> network.settings.ports: map[80/tcp:[{HostIP:0.0.0.0 HostPort:57087}]]
>>> config.exposed.ports: map[80/tcp:{}]

So for some reason, the docker types in Podman are inconsistent according Docker 🤷 . Pinging @kiview @cristianrgreco @eddumelendez @HofmeisterAn for awareness while I'm out

mdelapenya commented 1 year ago

Could be related to https://github.com/containers/podman/issues/17780 🤔

tylfin commented 3 months ago

I'm seeing an intermittent port not found error using Docker for Mac, I tried upgrading and am currently on Docker version 26.0.0, build 2ae903e. I'm exposing it as a CustomizeRequest, e.g.:

func WithPort(port int) testcontainers.CustomizeRequestOption {
    return func(req *testcontainers.GenericContainerRequest) {
        ...
        req.ExposedPorts = append(req.ExposedPorts, fmt.Sprintf("%d/tcp", port))
        }
}

and a wait strategy that passes:

wait.ForHTTP(endpoint).WithPort(containerPortWithProtocol).WithStartupTimeout(timeout),

If folks have any workarounds or tips that would be helpful.

EDIT: After investigating this issue further, it looks like docker inspect returns results excluding the exposed port from the network settings, it might be related to https://github.com/moby/moby/issues/42860. A possible workaround would be to retry MappedPort in Docker.go a few times internally 🤷‍♂️

vladcar commented 2 months ago

Retrying MappedPort didn't work for me so I ended up calling docker inspect from the code. Ugly workaround but it works:

func getMappedPorts(containerID string) (nat.PortMap, error) {
    cmd := exec.Command("docker", "inspect", "--format", "{{json .NetworkSettings.Ports}}", containerID)

    output, err := cmd.CombinedOutput()
    if err != nil {
        return nil, fmt.Errorf("failed to execute docker inspect: %s, %v", output, err)
    }

    var ports nat.PortMap
    if err = json.Unmarshal(output, &ports); err != nil {
        return nil, fmt.Errorf("failed to parse output: %v", err)
    }

    return ports, nil
}
mhornbacher commented 2 months ago

This issue started occurring for me only after installing minikube on the local docker installation. However since then it is consistently failing.

However adding a manual delay and watching that the port is open seems to resolve this. I assume there is a timing issue here...

This occurs on Docker as well as podman

rajaie7 commented 1 month ago

Also seeing this error with the postgres container

        postgresContainer, err := postgres.RunContainer(ctx, testcontainers.WithImage("postgres:14-alpine"),
            postgres.WithUsername("username"), postgres.WithPassword("password"),
            testcontainers.WithLogConsumers(logConsumer))

        mappedPort, err := postgresContainer.MappedPort(ctx, nat.Port("5432/tcp"))

adding

        time.Sleep(10 * time.Second)

helps resolve the issue.

This also seems to have helped

testcontainers.WithWaitStrategy(
                wait.ForLog("database system is ready to accept connections").
                    WithOccurrence(2).
                    WithStartupTimeout(5*time.Second))