testcontainers / testcontainers-java

Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
https://testcontainers.org
MIT License
8k stars 1.65k forks source link

[Bug]: Race condition between DB init and test executions from JdbcDatabaseContainer based containers #8555

Open luanhui0420 opened 5 months ago

luanhui0420 commented 5 months ago

Module

Core

Testcontainers version

1.19.1

Using the latest Testcontainers version?

No

Host OS

Linux

Host Arch

ARM

Docker version

Client:
 Cloud integration: v1.0.35+desktop.11
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:26 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.28.0 (139021)
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435
  Built:            Tue Feb  6 21:14:22 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

What happened?

[TL;DR] The unit tests starts to run and fail without waiting for the completion of DB Init running on DB Containers.

Problem We use CockroachContainer to launch the container and uses init.sql copied to container to run DB schema migration and test data filling

The unit tests starts to fail when there are more SQLs in init.sql and the execution time is longer. The particular unit test failure is becoz of the uncompleted SQLs, like lack of schema or lack of test data.

Root cause investigation I tested the same tests and init.sql in generic container using LogMessage waiting strategy, and it works well.

I then looked back to the new CockroachContainer and its parent class JdbcDatabaseContainer. It was found out that the JdbcDatabaseContainer overrides the waitUntilContainerStarted and writes its own DB connection based logic and ignore the waitingStrategy passed in either via constructor or by .waitingFor. The custom logic only tests the "select 1", that leads to the early ready of the database container and caused the unit test failures as race condition.

Suggestion It would make more sense to still utilize WaitingStrategy to wrap the Query based ready check, or make it very explicitly the waitingFor is not gonna work any more. It can be confusing without notice how the waiting logic is totally override.

Also with the query based ready check, it has no way to customize the query by callers, its designated to be "select 1" for example in CockroachContainer which seems too limited.

Relevant log output

No response

Additional Information

No response

kiview commented 5 months ago

Thanks for reporting and the detailed analysis @luanhui0420.

It was found out that the JdbcDatabaseContainer overrides the waitUntilContainerStarted and writes its own DB connection based logic and ignore the waitingStrategy passed in either via constructor or by .waitingFor.

This is indeed the case for most subclasses of JdbcDatabaseContainer. And I imagine, depending on the internal image lifecycle, the same issue of wait behavior being independent of init script execution (hence introducing a race condition), could be present for other database container implementations as well.

luanhui0420 commented 5 months ago

Thanks for reporting and the detailed analysis @luanhui0420.

It was found out that the JdbcDatabaseContainer overrides the waitUntilContainerStarted and writes its own DB connection based logic and ignore the waitingStrategy passed in either via constructor or by .waitingFor.

This is indeed the case for most subclasses of JdbcDatabaseContainer. And I imagine, depending on the internal image lifecycle, the same issue of wait behavior being independent of init script execution (hence introducing a race condition), could be present for other database container implementations as well.

I agree it impacts all the subclasses. It is usually not noticeable if the init script is short as well.

tejaslodayadd commented 5 months ago

@kiview do you suggest any workaround for this? Thanks