Closed nmoroze closed 7 months ago
As an update, I've noticed that the amount of time the log consumer works for is the same as the "log production timeout" setting (e.g. by default, the log consumer stops working after five seconds).
Is this what the log production timeout setting is intended to control? It's not super clear to me from the docs.
Your repo tar is no longer present, but #2346 might fix this issue as it refactors how logging start / stop is handled eliminating a race and subsequent deadlock. If you could test @nmoroze that would be great.
Hi @stevenh, thanks for pointing me to your PR. Unfortunately I don't think it solves my particular problem, the issue persists even using your branch.
I've re-attached the container tarball for repro to this comment in case you or anyone else is interested.
I tried to repo locally but can't, here my archive: log_repo.tgz
# build
(cd container; docker build -t localhost/log_repro:latest .)
# run test
go test -v -race
Output:
=== RUN TestLogIssue
2024/03/13 18:28:51 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 25.0.3
API Version: 1.44
Operating System: Docker Desktop
Total Memory: 15836 MB
Resolved Docker Host: unix:///var/run/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: ce94f6e5446cf4dac898cbe3506dbc726ab8896b36336393c1055fccee365497
Test ProcessID: a04b6431-4468-45ed-8c50-ab58d412f0cf
2024/03/13 18:28:51 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/13 18:28:51 ✅ Container created: b7c0742de4e8
2024/03/13 18:28:51 🐳 Starting container: b7c0742de4e8
2024/03/13 18:28:52 ✅ Container started: b7c0742de4e8
2024/03/13 18:28:52 🚧 Waiting for container id b7c0742de4e8 image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/13 18:28:52 🔔 Container is ready: b7c0742de4e8
2024/03/13 18:28:52 🐳 Creating container for image localhost/log_repro:latest
2024/03/13 18:28:52 ✅ Container created: 839d3fd30990
2024/03/13 18:28:52 🐳 Starting container: 839d3fd30990
2024/03/13 18:28:53 ✅ Container started: 839d3fd30990
2024/03/13 18:28:53 🔔 Container is ready: 839d3fd30990
2024/03/13 18:28:53 tick 0
2024/03/13 18:28:54 tick 1
2024/03/13 18:28:55 tick 2
2024/03/13 18:28:56 tick 3
2024/03/13 18:28:57 tick 4
2024/03/13 18:28:58 tick 5
2024/03/13 18:28:59 tick 6
2024/03/13 18:29:00 tick 7
2024/03/13 18:29:01 tick 8
2024/03/13 18:29:02 tick 9
2024/03/13 18:29:03 tick 10
test over
2024/03/13 18:29:03 🐳 Terminating container: 839d3fd30990
2024/03/13 18:29:03 🚫 Container terminated: 839d3fd30990
--- PASS: TestLogIssue (11.80s)
PASS
ok log-test 12.844s
I tried adding ProviderType: tc.ProviderPodman,
to the request but still seems to want to use docker under WSL.
Tried getting a podman working but its just way to flaky, vm totally hangs and then just wont work for me on WSL:
API forwarding for Docker API clients is not available due to the following startup failures.
could not start api proxy since expected pipe is not available: podman-machine-default
Finally got podman working after a kernel upgrade but still can't reproduce:
=== RUN TestLogIssue
2024/03/13 20:22:37 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 4.9.3
API Version: 1.41
Operating System: fedora
Total Memory: 15836 MB
Resolved Docker Host: unix:///run/user/1000/podman/podman.sock
Resolved Docker Socket Path: /run/user/1000/podman/podman.sock
Test SessionID: c63aa4e8af35d970a47b07194bf27bbbaaac5dfb929678b458c1e515fc310db4
Test ProcessID: ab2fd2cd-f65f-4e99-9e3a-958ccd19d80b
2024/03/13 20:22:37 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/13 20:22:37 ✅ Container created: 1db0a21708f0
2024/03/13 20:22:37 🐳 Starting container: 1db0a21708f0
2024/03/13 20:22:38 ✅ Container started: 1db0a21708f0
2024/03/13 20:22:38 🚧 Waiting for container id 1db0a21708f0 image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/13 20:22:38 🔔 Container is ready: 1db0a21708f0
2024/03/13 20:22:38 🐳 Creating container for image localhost/log_repro:latest
2024/03/13 20:22:38 ✅ Container created: 4e83a135fa90
2024/03/13 20:22:38 🐳 Starting container: 4e83a135fa90
2024/03/13 20:22:38 ✅ Container started: 4e83a135fa90
2024/03/13 20:22:38 🔔 Container is ready: 4e83a135fa90
2024/03/13 20:22:38 tick 0
2024/03/13 20:22:39 tick 1
2024/03/13 20:22:40 tick 2
2024/03/13 20:22:41 tick 3
2024/03/13 20:22:42 tick 4
2024/03/13 20:22:44 tick 6
2024/03/13 20:22:45 tick 7
2024/03/13 20:22:46 tick 8
2024/03/13 20:22:47 tick 9
test over
2024/03/13 20:22:48 🐳 Terminating container: 4e83a135fa90
2024/03/13 20:22:50 🚫 Container terminated: 4e83a135fa90
--- PASS: TestLogIssue (12.54s)
PASS
ok log-test 12.546s
Thanks for the effort to reproduce this. I noticed you're using a newer version of Podman based on your latest logs, I'll try that out on my end and see if it helps (the reason for the old version I'm using is it's the default available on apt
for Ubuntu 22.04).
@stevenh Looks like my reproducible example works fine with Podman v4.9.3 both with and without your change.
Unfortunately, after updating, the codebase where I originally found this bug exhibits different buggy behavior. Using testcontainers
0.29.1, it appears to hang indefinitely trying to terminate a container. Using your branch, the Terminate()
calls to several containers seem to timeout with this error:
Get "http://%2Frun%2Fuser%2F2109978069%2Fpodman%2Fpodman.sock/v1.41/containers/cc50050f7d744f2d2ae77d0df598f087f1b74b06e2a66dfd537dd12b289b745d/logs?follow=1&since=1710436901.565258953&stderr=1&stdout=1&tail=": context deadline exceeded
I can see if I can come up with a reproducible example for this new issue.
Alright, I was able to reproduce this with a pretty simple example. I've attached a new image tar, this container runs a binary that does nothing and simply busy-loops forever.
The test code itself is also fairly minimal:
package container_test
import (
"context"
"fmt"
"testing"
tc "github.com/testcontainers/testcontainers-go"
)
type StdoutLogConsumer struct{}
func (lc *StdoutLogConsumer) Accept(l tc.Log) {
fmt.Printf("%s", l.Content)
}
func TestLogIssue(t *testing.T) {
req := tc.ContainerRequest{
Image: "localhost/log_repro:latest",
LogConsumerCfg: &tc.LogConsumerConfig{
Consumers: []tc.LogConsumer{&StdoutLogConsumer{}},
},
}
ctx := context.Background()
container, err := tc.GenericContainer(ctx, tc.GenericContainerRequest{
ContainerRequest: req,
Started: true,
})
if err != nil {
t.Fatal(err)
}
t.Cleanup(func() {
if err := container.Terminate(ctx); err != nil {
t.Fatal(err)
}
})
fmt.Println("test over")
}
The output I get:
2024/03/15 20:47:46 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 4.9.3
API Version: 1.41
Operating System: ubuntu
Total Memory: 127946 MB
Resolved Docker Host: unix:///run/user/2109978069/podman/podman.sock
Resolved Docker Socket Path: /run/user/2109978069/podman/podman.sock
Test SessionID: 61f8be8f5fb4a6f37c09cd82a69ea05022e3a674641b2add09d5c5593b17f0c2
Test ProcessID: 9ffc22d9-7015-4555-be43-c7ebd115a846
2024/03/15 20:47:46 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/15 20:47:46 ✅ Container created: 1c9f5ce3ad48
2024/03/15 20:47:46 🐳 Starting container: 1c9f5ce3ad48
2024/03/15 20:47:46 ✅ Container started: 1c9f5ce3ad48
2024/03/15 20:47:46 🚧 Waiting for container id 1c9f5ce3ad48 image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/15 20:47:46 🔔 Container is ready: 1c9f5ce3ad48
2024/03/15 20:47:46 🐳 Creating container for image localhost/log_repro:latest
2024/03/15 20:47:46 ✅ Container created: b3b72d3709fe
2024/03/15 20:47:46 🐳 Starting container: b3b72d3709fe
2024/03/15 20:47:46 ✅ Container started: b3b72d3709fe
2024/03/15 20:47:46 🔔 Container is ready: b3b72d3709fe
test over
2024/03/15 20:47:46 🐳 Terminating container: b3b72d3709fe
--- FAIL: TestLogIssue (6.78s)
container_test.go:36: Get "http://%2Frun%2Fuser%2F2109978069%2Fpodman%2Fpodman.sock/v1.41/containers/b3b72d3709fed603ed1ddc15159307171130ac77a9981ee86610cbde2e05970f/logs?follow=1&stderr=1&stdout=1&tail=": context deadline exceeded
FAIL
Can you post the build for the container so we don't have to rely on images?
@stevenh Sure - I just wrote up a build file and simple Makefile to run everything. I've attached the sources as a tarball, untar and run make
to reproduce.
Thanks @nmoroze looks like a bug in podman to me, if the container doesn't generate any log entries the request for log entries just hangs with podman where as docker returns an empty log.
I would suggest raising an issue with the podman team.
Thanks for taking a look! That makes sense, I'll close this issue for now and follow up on that side.
Testcontainers version
0.29.1
Using the latest Testcontainers version?
Yes
Host OS
Linux
Host arch
x86
Go version
1.22
Docker version
Docker info
What happened?
I've found that
LogConsumer
s sometimes stop receiving logs prematurely and dump an error.I've built a minimal reproducible example of the specific issue. It consists of a very small container image based on
distroless/base
. The container executes the following program:I've attached a prebuilt container to the issue as a tarball.
In order to reproduce the issue, first load the container tarball like so:
And then compile and run the following test:
I'd expect this test to print container logs showing 9-10 ticks and exit cleanly. Instead, it consistently stops displaying output after 6 ticks, and always outputs an error message:
I did notice some interesting things about this while trying to debug, but I'm not sure what to make of them. In the main loop of
docker.startLogProduction()
, it seems like the"use of closed network connection"
error block consistently triggers before the EOF error is triggered. In addition, it seems like attempting to restart the connection when EOF is received allows the log consumer to get a few more logs, e.g.:Relevant log output
No response
Additional information
container.tar.gz