testcontainers / testcontainers-java

Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
https://testcontainers.org
MIT License

[Bug]: Deadlock between DockerClientFactory and RyukResourceReaper with JUnit 5 parallel tests #9120

Open pkwarren opened 4 weeks ago

pkwarren commented 4 weeks ago

Module

Core

Testcontainers version

1.20.1

Using the latest Testcontainers version?

Yes

Host OS

MacOS

Host Arch

arm64

Docker version

Client:
 Version:           27.1.1
 API version:       1.46
 Go version:        go1.21.12
 Git commit:        6312585
 Built:             Tue Jul 23 19:54:12 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.33.0 (160616)
 Engine:
  Version:          27.1.1
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.12
  Git commit:       cc13f95
  Built:            Tue Jul 23 19:57:14 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.19
  GitCommit:        2bf793ef6dc9a18e00cb12efb64355c2c9d5eb41
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

What happened?

I'm attempting to run tests in parallel with JUnit 5. One test spins up a static ComposeContainer with .withLocalCompose(true) and another spins up a static KafkaContainer. This leads to a deadlock on startup: one thread acquires the lock on RyukResourceReaper and then blocks waiting for the lock in DockerClientFactory, while the other thread holds the DockerClientFactory lock and blocks waiting for the lock on RyukResourceReaper.

Relevant log output

Kafka container thread:

"testcontainers-lifecycle-0" #35 [41731] daemon prio=5 os_prio=31 cpu=207.18ms elapsed=23.45s tid=0x000000012225ba00 nid=41731 waiting for monitor entry  [0x00000001735fa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.testcontainers.utility.RyukResourceReaper.maybeStart(RyukResourceReaper.java:74)
    - waiting to lock <0x000000060201f118> (a org.testcontainers.utility.RyukResourceReaper)
    at org.testcontainers.utility.RyukResourceReaper.init(RyukResourceReaper.java:42)
    at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:232)
    - locked <0x000000060201ee60> (a [Ljava.lang.Object;)
    at org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:106)
    at com.github.dockerjava.api.DockerClientDelegate.authConfig(DockerClientDelegate.java:109)
    at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:329)

Compose container thread:

"testcontainers-lifecycle-1" #37 [37891] daemon prio=5 os_prio=31 cpu=3.14ms elapsed=23.41s tid=0x0000000122254600 nid=37891 waiting for monitor entry  [0x0000000173a12000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:185)
    - waiting to lock <0x000000060201ee60> (a [Ljava.lang.Object;)
    at org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:106)
    at com.github.dockerjava.api.DockerClientDelegate.authConfig(DockerClientDelegate.java:109)
    at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:329)
    at org.testcontainers.utility.RyukResourceReaper.maybeStart(RyukResourceReaper.java:78)
    - locked <0x000000060201f118> (a org.testcontainers.utility.RyukResourceReaper)
    at org.testcontainers.utility.RyukResourceReaper.registerLabelsFilterForCleanup(RyukResourceReaper.java:51)
    at org.testcontainers.containers.ComposeDelegate.registerContainersForShutdown(ComposeDelegate.java:247)
    at org.testcontainers.containers.ComposeContainer.start(ComposeContainer.java:125)
    - locked <0x000000060201f2a8> (a java.lang.Object)
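
The two traces show a lock-ordering inversion: each thread holds the monitor the other one is waiting for. The following is a minimal sketch of that pattern, not Testcontainers code. The class, method, and lock names are hypothetical stand-ins, and ReentrantLock.tryLock with a timeout replaces the synchronized blocks so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionDemo {

    /** Runs two threads that take the same two locks in opposite order; returns how many got stuck. */
    static int runInverted() {
        // Hypothetical stand-ins for the two monitors seen in the thread dumps.
        ReentrantLock clientLock = new ReentrantLock(); // plays the DockerClientFactory lock
        ReentrantLock reaperLock = new ReentrantLock(); // plays the RyukResourceReaper monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // "Kafka" thread: client lock first, then reaper lock.
        Thread kafka = new Thread(() -> attempt(clientLock, reaperLock, bothHoldFirst, bothTried, blocked));
        // "Compose" thread: reaper lock first, then client lock.
        Thread compose = new Thread(() -> attempt(reaperLock, clientLock, bothHoldFirst, bothTried, blocked));
        kafka.start();
        compose.start();
        try {
            kafka.join();
            compose.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    static void attempt(ReentrantLock first, ReentrantLock second,
                        CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                        AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until both threads hold their first lock, as in the reported dumps.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With synchronized this would block forever; the timeout makes the failure observable.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep holding the first lock until the other thread has finished its attempt.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on their second lock: " + runInverted());
    }
}
```

In this sketch both threads fail to take their second lock. If both threads took the locks in the same order, the thread holding the first lock could always proceed to the second.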

Additional Information

No response

eddumelendez commented 3 weeks ago

Hi @pkwarren, can you please provide a project that reproduces the issue?

pkwarren commented 3 weeks ago

Here's an example repo showing the problem: https://github.com/pkwarren/testcontainers-issue-9120

eddumelendez commented 3 weeks ago

Thanks for sharing, @pkwarren. I made some changes because the docker-compose.yml file was not found, and I also had to set version, but I cannot reproduce the issue. Do you mind taking a look?

pkwarren commented 3 weeks ago

> the docker-compose.yml file was not found

It should be here: https://github.com/pkwarren/testcontainers-issue-9120/blob/main/docker-compose.yml

> also had to set version

I don't follow - where did a version need to be specified?

> can not reproduce the issue. Do you mind taking a look?

If you could provide more specifics on what you're doing and any errors you're seeing I'd be happy to update the example project. For me just running ./mvnw clean verify hangs - if you use jstack to look at the PID of the launched Maven surefire process you can see the deadlock.
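
jstack inspects the process from outside. As a complement, the same kind of monitor deadlock can be detected from inside a JVM with the standard ThreadMXBean API. The sketch below is illustrative only: the class and helper names are hypothetical, and it creates its own pair of deadlocked daemon threads rather than touching Testcontainers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {

    /** Polls the JVM for deadlocked threads until some are found or the timeout expires. */
    static long[] findDeadlocked(long timeoutMillis) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = bean.findDeadlockedThreads(); // null when no deadlock is present
            if (ids != null) {
                return ids;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new long[0];
    }

    /** Starts two daemon threads that deadlock on a pair of monitors taken in opposite order. */
    static void startDeadlockedPair() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        spawn("demo-deadlock-1", a, b, bothHoldFirst);
        spawn("demo-deadlock-2", b, a, bothHoldFirst);
    }

    private static void spawn(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // ensure both threads hold their first monitor
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds this monitor
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        t.start();
    }

    public static void main(String[] args) {
        startDeadlockedPair();
        long[] ids = findDeadlocked(5000);
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Print each deadlocked thread with the monitors it holds and waits for.
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            System.out.print(info);
        }
    }
}
```

The printed ThreadInfo entries carry the same information as the "waiting to lock" and "locked" lines in a jstack dump.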

eddumelendez commented 3 weeks ago

> the docker-compose.yml file was not found

I had to change from ComposeContainer("docker-compose.yml") to ComposeContainer(new File("docker-compose.yml")).

> I don't follow - where did a version need to be specified?

I was talking about the version field in the docker-compose.yml file. But after running it again, I don't need it anymore.

I just wanted to make sure we have the same code to reproduce. After that, I ran ./mvnw clean verify and everything executed successfully. I am also running on a Mac M1 Pro.

pkwarren commented 3 weeks ago

Pushed updates to fix the ComposeContainer constructor usage and switched the container in docker-compose.yml to be Kafka (in case we're running into a race condition and starting up nginx is too fast to repro the problem). Hopefully this will allow you to see the same behavior I'm seeing.

I'm on the latest version of Docker Desktop (v4.33.0), if it matters.