testcontainers / testcontainers-java

Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
https://testcontainers.org
MIT License

[Bug]: cannot create network ... : conflicts with network ... : networks have overlapping IPv4 #8813

Closed FRosner closed 1 week ago

FRosner commented 1 week ago

Module

Core

Testcontainers version

1.19.1

Using the latest Testcontainers version?

No

Host OS

Linux

Host Arch

x86, ARM

Docker version

Client: Docker Engine - Community
 Version:           27.0.2
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        912c1dd
 Built:             Wed Jun 26 18:47:28 2024
 OS/Arch:           linux/amd64
 Context:           default

What happened?

All of a sudden, many of our GitHub Actions workflows started failing because our tests couldn't start the containers. The error we see in the logs is:

Caused by: com.github.dockerjava.api.exception.DockerException: Status 403: ***"message":"cannot create network bda9d3d33ed5fdb18b1a5ffc80496e325e7e454f9292f05026cf8796b9aba395 (br-bda9d3d33ed5): conflicts with network bf05f12899f2e8b3b8835115218e4fa5074476c9d53676a31b7ace186d368b2a (br-bf05f12899f2): networks have overlapping IPv4"***

Relevant log output

No response

Additional Information

This started only after Ubuntu published Docker version 27 to its repository and our runners upgraded automatically, because we hadn't pinned the version. The release notes mention changes to networking (and some networking APIs), so I'm wondering whether these introduced a race condition.

In our tests, we create many networks concurrently using `Network.newNetwork()`, and that has worked fine so far, as the Docker daemon should be assigning subnets in an incrementing fashion.

I was wondering whether this might be a regression in Docker itself, which is possible, but I wasn't able to reproduce it locally with `for n in $(seq 0 256); do docker network create "n$n"& done`.

kiview commented 1 week ago

Thanks for reporting, @FRosner. We'll check with the colleagues maintaining Docker whether something might have changed in this regard.

It is also surprising that our GHA workflows don't fail in this case (I would assume they were also updated, but @eddumelendez can confirm).

FRosner commented 1 week ago

@kiview nice to see you again :P

It's not failing for single tests. It seems to be a race condition as we create hundreds of networks concurrently in our tests across multiple JVMs.

dhoard commented 1 week ago

As a data point, I see the same issue using the latest Testcontainers version.

Module

Core

Testcontainers version

1.19.8

Using the latest Testcontainers version?

Yes

Host OS

Ubuntu 24.04

Host Arch

x86

Docker version

Client: Docker Engine - Community
 Version:           27.0.2
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        912c1dd
 Built:             Wed Jun 26 18:47:25 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.2
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       e953d76
  Built:            Wed Jun 26 18:47:25 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

robmry commented 1 week ago

Hi all - I think it's this Docker Engine regression ... https://github.com/moby/moby/issues/48069#issuecomment-2195563789

> I was wondering if this is maybe a regression in docker itself, and it's possible, but I wasn't able to reproduce locally with `for n in $(seq 0 256); do docker network create "n$n"& done`.

> It's not failing for single tests. It seems to be a race condition as we create hundreds of networks concurrently in our tests across multiple JVMs.

I don't think it'll be reproducible with allocations alone; it happens with a mix of network creation and deletion. It's not a timing race: it's caused by an ordered list getting out of order.

I'm working on a fix - but if this description doesn't fit what you're seeing, please let me know.
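To illustrate how an ordered list getting out of order could produce an overlap without any timing race, here is a toy model in Java. It is only a sketch and is not Docker's allocator code: subnets are reduced to integer slots, and the class `ToySubnetAllocator`, its `keepSorted` flag, and the append-instead-of-sorted-insert bug are all assumptions made for illustration. It shows that allocations alone keep the list sorted, while a release followed by a re-allocation can break the ordering and make the next scan pick a slot that is already in use.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Docker's allocator. Subnets are reduced to
// integer slots (0, 1, 2, ...) and every name here is hypothetical.
class ToySubnetAllocator {
    // Slots currently in use; the scan in allocate() assumes ascending order.
    private final List<Integer> allocated = new ArrayList<>();
    private final boolean keepSorted;

    ToySubnetAllocator(boolean keepSorted) {
        this.keepSorted = keepSorted;
    }

    // Picks the lowest free slot. Only correct while `allocated` is sorted.
    int allocate() {
        int candidate = 0;
        for (int used : allocated) {
            if (used > candidate) {
                break; // gap found, assuming all later entries are larger
            }
            if (used == candidate) {
                candidate++;
            }
        }
        if (allocated.contains(candidate)) {
            // Stands in for the daemon's "overlapping IPv4" conflict check.
            throw new IllegalStateException(
                "slot " + candidate + " overlaps an existing network");
        }
        if (keepSorted) {
            int pos = 0;
            while (pos < allocated.size() && allocated.get(pos) < candidate) {
                pos++;
            }
            allocated.add(pos, candidate); // sorted insert preserves the invariant
        } else {
            allocated.add(candidate); // assumed bug: appending can break the order
        }
        return candidate;
    }

    void release(int slot) {
        allocated.remove(Integer.valueOf(slot));
    }

    public static void main(String[] args) {
        for (boolean sorted : new boolean[] {true, false}) {
            ToySubnetAllocator a = new ToySubnetAllocator(sorted);
            a.allocate(); // slot 0
            a.allocate(); // slot 1
            a.allocate(); // slot 2
            a.release(0); // deletion leaves a gap at the front
            a.allocate(); // reuses slot 0; the buggy variant appends it at the end
            try {
                System.out.println("keepSorted=" + sorted + " next=" + a.allocate());
            } catch (IllegalStateException e) {
                System.out.println("keepSorted=" + sorted + " failed: " + e.getMessage());
            }
        }
    }
}
```

In this model the sorted variant hands out slot 3 on the final call, while the appending variant stops its scan at the first entry and tries to hand out slot 0 a second time. That is consistent with the observation that creating networks in a loop does not trigger the problem, but a mix of creation and deletion does.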

FRosner commented 1 week ago

Thanks! So if there's nothing testcontainers can do, let's close this issue?

eddumelendez commented 1 week ago

Closing this issue. The fix should land in the next Docker versions, and we will also need to wait until the GH runners have been updated.

thaJeztah commented 6 days ago