testcontainers / testcontainers-java

Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
https://testcontainers.org
MIT License
7.91k stars 1.62k forks source link

[Bug]: Finalizing JUnit 5 test case with Kafka Streams and Spring Boot takes several minutes #8867

Closed MarvinVaillant closed 1 week ago

MarvinVaillant commented 2 weeks ago

Module

Kafka

Testcontainers version

1.19.8

Using the latest Testcontainers version?

Yes

Host OS

MacOS, Linux, Windows

Host Arch

ARM, x86

Docker version

This is my local setup with Podman. In the pipeline we use Docker and other colleagues use Docker Desktop where the issue also happens.

Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.41 (downgraded from 1.46)
 Go version:        go1.22.4
 Git commit:        7d4bcd863a
 Built:             Fri Jun 28 14:56:30 2024
 OS/Arch:           darwin/arm64
 Context:           default

Server: linux/arm64/fedora-40
 Podman Engine:
  Version:          5.0.3
  APIVersion:       5.0.3
  Arch:             arm64
  BuildTime:        2024-05-10T02:00:00+02:00
  Experimental:     false
  GitCommit:
  GoVersion:        go1.22.2
  KernelVersion:    6.8.8-300.fc40.aarch64
  MinAPIVersion:    4.0.0
  Os:               linux
 Conmon:
  Version:          conmon version 2.1.10, commit:
  Package:          conmon-2.1.10-1.fc40.aarch64
 OCI Runtime (crun):
  Version:          crun version 1.14.4
commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
rundir: /run/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  Package:          crun-1.14.4-1.fc40.aarch64
 Engine:
  Version:          5.0.3
  API version:      1.41 (minimum version 1.24)
  Go version:       go1.22.2
  Git commit:
  Built:            Fri May 10 02:00:00 2024
  OS/Arch:          linux/arm64
  Experimental:     false

What happened?

We're running a Spring Boot project with Kafka Streams. We have one integration test (using @SpringBootTest and @Testcontainers) that sometimes takes several minutes to complete after all tests successfully ran. Lots of Kafka "no connection to node" and "Node 1 disconnected" are printed in the logs repeatedly.

It seems like the Kafka container is stopped before the Spring context is shutdown. And the Spring Context is waiting for Kafka to become available before shutting down.

Relevant log output

### The following 20 lines are printed repeatedly for several minutes
[11:00:35.147] [INFO] [org.apache.kafka.clients.NetworkClient] - [Consumer clientId=<our-consumer-group>-StreamThread-1-consumer, groupId=streams.notification-report-selector.notification-repartitioner] Node 1 disconnected.
[11:00:35.147] [WARN] [org.apache.kafka.clients.NetworkClient] - [Consumer clientId=<our-consumer-group>-StreamThread-1-consumer, groupId=streams.notification-report-selector.notification-repartitioner] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:35.218] [INFO] [org.apache.kafka.clients.NetworkClient] - [AdminClient clientId=<our-consumer-group>-admin] Node 1 disconnected.
[11:00:35.218] [WARN] [org.apache.kafka.clients.NetworkClient] - [AdminClient clientId=<our-consumer-group>-admin] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:35.350] [INFO] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-3] Node 1 disconnected.
[11:00:35.350] [WARN] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-3] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:35.591] [INFO] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=<our-consumer-group>-StreamThread-1-producer] Node 1 disconnected.
[11:00:35.591] [WARN] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=<our-consumer-group>-StreamThread-1-producer] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:35.782] [INFO] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-1] Node 1 disconnected.
[11:00:35.782] [WARN] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-1] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:36.072] [INFO] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-2] Node 1 disconnected.
[11:00:36.072] [WARN] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-2] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:36.148] [INFO] [org.apache.kafka.clients.NetworkClient] - [Consumer clientId=<our-consumer-group>-StreamThread-1-consumer, groupId=[<our-consumer-group>]] Node 1 disconnected.
[11:00:36.148] [WARN] [org.apache.kafka.clients.NetworkClient] - [Consumer clientId=<our-consumer-group>-StreamThread-1-consumer, groupId=[<our-consumer-group>]] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:36.221] [INFO] [org.apache.kafka.clients.NetworkClient] - [AdminClient clientId=<our-consumer-group>-admin] Node 1 disconnected.
[11:00:36.221] [WARN] [org.apache.kafka.clients.NetworkClient] - [AdminClient clientId=<our-consumer-group>-admin] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:36.370] [INFO] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-3] Node 1 disconnected.
[11:00:36.371] [WARN] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-3] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:36.544] [INFO] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=<our-consumer-group>-StreamThread-1-producer] Node 1 disconnected.
[11:00:36.544] [WARN] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=<our-consumer-group>-StreamThread-1-producer] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:36.731] [INFO] [org.apache.kafka.clients.admin.internals.AdminMetadataManager] - [AdminClient clientId=<our-consumer-group>-admin] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: fetchMetadata
[11:00:36.804] [INFO] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-1] Node 1 disconnected.
[11:00:36.804] [WARN] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-1] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.
[11:00:36.925] [INFO] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-2] Node 1 disconnected.
[11:00:36.925] [WARN] [org.apache.kafka.clients.NetworkClient] - [Producer clientId=producer-2] Connection to node 1 (localhost/127.0.0.1:32772) could not be established. Node may not be available.

### The following log lines are only printed once at when the tests are finalized

Error: 1:00:36.961] [ERROR] [org.apache.kafka.streams.processor.internals.TaskExecutor] - stream-thread [<our-consumer-group>-StreamThread-1] Committing task(s) 0_1 failed.
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {notification.managed-1=OffsetAndMetadata{offset=5, leaderEpoch=null, metadata='AgAAAZCWstBN'}}
Error: 1:00:36.961] [ERROR] [org.apache.kafka.streams.processor.internals.TaskManager] - stream-thread [<our-consumer-group>-StreamThread-1] Exception caught while committing tasks [TaskId: 0_1
    ProcessorTopology:
        Notifications:
            topics:     [notification.managed]
            children:   [Event-Type-Filter]
        Event-Type-Filter:
            children:   [Key-Selector]
        Key-Selector:
            children:   [Notifications-By-Id]
        Notifications-By-Id:
            topic:      StaticTopicNameExtractor(notification.by-id)
Partitions [notification.managed-1]
]
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {notification.managed-1=OffsetAndMetadata{offset=5, leaderEpoch=null, metadata='AgAAAZCWstBN'}}
[11:00:36.961] [INFO] [org.apache.kafka.streams.processor.internals.StreamTask] - stream-thread [<our-consumer-group>-StreamThread-1] task [0_0] Suspended from RUNNING
[11:00:36.962] [INFO] [org.apache.kafka.streams.processor.internals.RecordCollectorImpl] - stream-thread [<our-consumer-group>-StreamThread-1] stream-task [0_0] Closing record collector dirty
[11:00:36.962] [INFO] [org.apache.kafka.streams.processor.internals.StreamTask] - stream-thread [<our-consumer-group>-StreamThread-1] task [0_0] Closed dirty
[11:00:36.962] [INFO] [org.apache.kafka.streams.processor.internals.StreamTask] - stream-thread [<our-consumer-group>-StreamThread-1] task [0_1] Suspended from RUNNING
[11:00:36.963] [INFO] [org.apache.kafka.streams.processor.internals.RecordCollectorImpl] - stream-thread [<our-consumer-group>-StreamThread-1] stream-task [0_1] Closing record collector dirty
[11:00:36.963] [INFO] [org.apache.kafka.streams.processor.internals.StreamTask] - stream-thread [<our-consumer-group>-StreamThread-1] task [0_1] Closed dirty
[11:00:36.963] [INFO] [org.apache.kafka.clients.producer.KafkaProducer] - [Producer clientId=<our-consumer-group>-StreamThread-1-producer] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
[11:00:36.966] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics scheduler closed
[11:00:36.966] [INFO] [org.apache.kafka.common.metrics.Metrics] - Closing reporter org.apache.kafka.common.metrics.JmxReporter
[11:00:36.966] [INFO] [org.apache.kafka.common.metrics.Metrics] - Closing reporter org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter
[11:00:36.966] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics reporters closed
[11:00:36.967] [INFO] [org.apache.kafka.common.utils.AppInfoParser] - App info kafka.producer for <our-consumer-group>-StreamThread-1-producer unregistered
Error: 1:00:36.967] [ERROR] [org.apache.kafka.streams.processor.internals.StreamThread] - stream-thread [<our-consumer-group>-StreamThread-1] Failed to close task manager due to the following error:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {notification.managed-1=OffsetAndMetadata{offset=5, leaderEpoch=null, metadata='AgAAAZCWstBN'}}
[11:00:36.967] [INFO] [org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer] - [Consumer clientId=<our-consumer-group>-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
[11:00:36.967] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics scheduler closed
[11:00:36.967] [INFO] [org.apache.kafka.common.metrics.Metrics] - Closing reporter org.apache.kafka.common.metrics.JmxReporter
[11:00:36.967] [INFO] [org.apache.kafka.common.metrics.Metrics] - Closing reporter org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter
[11:00:36.967] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics reporters closed
[11:00:36.970] [INFO] [org.apache.kafka.common.utils.AppInfoParser] - App info kafka.consumer for <our-consumer-group>-StreamThread-1-consumer unregistered
[11:00:36.970] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics scheduler closed
[11:00:36.970] [INFO] [org.apache.kafka.common.metrics.Metrics] - Closing reporter org.apache.kafka.common.metrics.JmxReporter
[11:00:36.970] [INFO] [org.apache.kafka.common.metrics.Metrics] - Closing reporter org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter
[11:00:36.970] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics reporters closed
[11:00:36.972] [INFO] [org.apache.kafka.common.utils.AppInfoParser] - App info kafka.consumer for <our-consumer-group>-StreamThread-1-restore-consumer unregistered
[11:00:36.973] [INFO] [org.apache.kafka.streams.processor.internals.StreamThread] - stream-thread [<our-consumer-group>-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
[11:00:36.973] [INFO] [org.apache.kafka.streams.processor.internals.StreamThread] - stream-thread [<our-consumer-group>-StreamThread-1] Shutdown complete
[11:00:36.973] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Shutdown 1 stream threads complete
[11:00:36.973] [INFO] [org.apache.kafka.common.utils.AppInfoParser] - App info kafka.admin.client for <our-consumer-group>-admin unregistered
[11:00:36.973] [INFO] [org.apache.kafka.clients.admin.internals.AdminMetadataManager] - [AdminClient clientId=<our-consumer-group>-admin] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: The AdminClient thread has exited. Call: fetchMetadata
[11:00:36.973] [INFO] [org.apache.kafka.clients.admin.KafkaAdminClient] - [AdminClient clientId=<our-consumer-group>-admin] Timed out 1 remaining operation(s) during close.
[11:00:36.984] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics scheduler closed
[11:00:36.984] [INFO] [org.apache.kafka.common.metrics.Metrics] - Closing reporter org.apache.kafka.common.metrics.JmxReporter
[11:00:36.984] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics reporters closed
[11:00:36.984] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics scheduler closed
[11:00:36.984] [INFO] [org.apache.kafka.common.metrics.Metrics] - Closing reporter org.apache.kafka.common.metrics.JmxReporter
[11:00:36.984] [INFO] [org.apache.kafka.common.metrics.Metrics] - Metrics reporters closed
[11:00:36.985] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] State transition from PENDING_SHUTDOWN to NOT_RUNNING
[11:00:36.985] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client stopped completely
[11:00:37.006] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal NOT_RUNNING state, all resources are closed and the client has stopped.
[11:00:37.007] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal NOT_RUNNING state, all resources are closed and the client has stopped.
[11:00:37.007] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal NOT_RUNNING state, all resources are closed and the client has stopped.
[11:00:37.007] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal NOT_RUNNING state, all resources are closed and the client has stopped.
[11:00:37.011] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.011] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.011] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.011] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.016] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.016] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.016] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.016] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.028] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.028] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.029] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.029] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.044] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.044] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.044] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.
[11:00:37.044] [INFO] [org.apache.kafka.streams.KafkaStreams] - stream-client [<our-consumer-group>] Streams client is already in the terminal ERROR state, all resources are closed and the client has stopped.

Additional Information

The issue seems to be mitigated to some extend (we wait < 10 seconds for the test to finalize) if we start the container manually in the @BeforeAll method, instead of using the @Testcontainers and @Container annotations.

MarvinVaillant commented 2 weeks ago

Does anyone have a clue why this can happen? Or how I can investigate further?

eddumelendez commented 2 weeks ago

Hi, do you have a project to reproduce it?

MarvinVaillant commented 2 weeks ago

I'll try to provide you with a minimal example today.

MarvinVaillant commented 1 week ago

Sorry for the slow response. Here is the example repo: https://github.com/MarvinVaillant/demo-long-kafka-streams-teardown/tree/main

MarvinVaillant commented 1 week ago

Here you'll also find a short video that shows the issue: https://github.com/user-attachments/assets/535847ac-cbbc-4f72-b6c9-a09e682bf84e

eddumelendez commented 1 week ago

@MarvinVaillant thanks for providing the example but I am not able to run the project. It fails because of Mapped port can only be obtained after the container is started. I can make some changes but I can not see the behavior described.

MarvinVaillant commented 1 week ago

i'm really sorry. I was fiddeling around with the code to get some more insights. Could you please pull the latest commits?

eddumelendez commented 1 week ago

Thanks for updating it. I'm moving this to a discussion because It is not an issue itself. Testcontainers is creating, running the container and killing it. However, in order to close the context you should call close explicitly or rather use @DirtiesContext. I do not have much experience on Kafka Streams but others can help on how to handle this.