testcontainers / testcontainers-dotnet

A library to support tests with throwaway instances of Docker containers for all compatible .NET Standard versions.
https://dotnet.testcontainers.org
MIT License
3.67k stars 254 forks source link

[Bug]: Container hangs waiting for startup due to logs #1035

Closed mausch closed 8 months ago

mausch commented 8 months ago

Testcontainers version

3.5.0

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host arch

x86

.NET version

6.0.414

Docker version

Client:
 Version:           20.10.25
 API version:       1.41
 Go version:        go1.20.8
 Git commit:        v20.10.25
 Built:             Thu Jan  1 00:00:00 1970
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.25
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.20.8
  Git commit:       v20.10.25
  Built:            Tue Jan  1 00:00:00 1980
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.1
  GitCommit:        v1.7.1
 runc:
  Version:          1.1.5
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:

Docker info

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.10.4)
  compose: Docker Compose (Docker Inc., 2.18.1)

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 26
 Server Version: 20.10.25
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: v1.7.1
 runc version: 
 init version: 
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.199
 Operating System: NixOS 23.05 (Stoat)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.03GiB
 Name: RYOGA
 ID: 3B5S:ZIAN:TKIC:7JOU:XYLM:FTRC:3IUW:TDVE:7TDQ:2UBI:BRNQ:HKJQ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: mausch
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

What happened?

Sample code from https://dotnet.testcontainers.org/modules/postgres/ hangs. I've stepped through it and it turns out that somehow the Testcontainers lib can't get the container logs, they all come back as an empty string even though I can view them with docker logs.

Here's a trivial example. If you override the default wait strategy that checks for logs it doesn't hang (I'm aware that checking only for the port is incorrect).

using Testcontainers.PostgreSql;

var testContainer = new PostgreSqlBuilder()
    .WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(5432)) // comment this out and it hangs
    .Build();
await testContainer.StartAsync();

Seems to be some breaking change in docker?

I tried starting the container directly with Docker.DotNet and I was able to get the logs just fine though...

Relevant log output

No response

Additional information

No response

HofmeisterAn commented 8 months ago

I cannot reproduce it with the latest version, nor can I find any release notes (changes) that may be causing an issue here. It appears that you are using an older Docker Engine version (20.10). Could you please try the latest version, 24.0?

I tried starting the container directly with Docker.DotNet and I was able to get the logs just fine though...

Collecting the logs is implemented in a straightforward manner. There isn't anything particularly special about what we do here. We set the log filter to min-max (by default) and then get the logs from the container.

mausch commented 8 months ago

I think I've found the cause of the issue. When getting the logs the timestamps are undefined at the callsite. This leads to https://github.com/testcontainers/testcontainers-dotnet/blob/082eea4f33ad7bd88e435ed2572b2393e8a0bc34/src/Testcontainers/Clients/TestcontainersClient.cs#L99-L118 setting Since = DateTime.MinValue - epoch which is -62135596740 . For some reason any negative value in this parameter leads to getting no logs at all. With Since = 0 all logs are fetched just fine.

Here's a code snippet that replaces the default WaitUntil and shows what works and what doesn't:

public sealed class PostgreSqlContainerTest : IAsyncLifetime
{
    private sealed class WaitUntil : IWaitUntil
    {
        private static readonly string[] LineEndings = { "\r\n", "\n" };

        /// <inheritdoc />
        public async Task<bool> UntilAsync(IContainer container)
        {
             // this works
             var client = new DockerClientConfiguration().CreateClient();
             var logsParameters = new ContainerLogsParameters
             {
                 ShowStdout = true,
                 ShowStderr = true,
                 Since = "0",
                 Until = "253402300800",
             };
            using var logs = await client.Containers.GetContainerLogsAsync(container.Id, false, logsParameters);
            var (stdout, stderr) = await logs.ReadOutputToEndAsync(default);

            // this can't get logs
            // var client = new DockerClientConfiguration().CreateClient();
            // var logsParameters = new ContainerLogsParameters
            // {
            //     ShowStdout = true,
            //     ShowStderr = true,
            //     Since = "-1",
            //     Until = "253402300800",
            // };
            // using var logs = await client.Containers.GetContainerLogsAsync(container.Id, false, logsParameters);
            // var (stdout, stderr) = await logs.ReadOutputToEndAsync(default);

            //this hangs because it always gets empty logs (default WaitUntil used in PostgreSqlBuilder)
            //var (stdout, stderr) = await container.GetLogsAsync(timestampsEnabled: false);

            // this hangs, it uses the same arguments as the default WaitUntil
            // var (stdout, stderr) = await container.GetLogsAsync(
            //     timestampsEnabled: false, 
            //     since: DateTime.MinValue, 
            //     until: DateTime.MaxValue
            // );

            // works
            // var (stdout, stderr) = await container.GetLogsAsync(
            //     timestampsEnabled: false, 
            //     since: DateTime.UnixEpoch, 
            //     until: DateTime.MaxValue
            // );

            // works
            // var (stdout, stderr) = await container.GetLogsAsync(
            //     timestampsEnabled: false, 
            //     since: DateTime.UtcNow.Subtract(TimeSpan.FromHours(1)), 
            //     until: DateTime.UtcNow.Add(TimeSpan.FromHours(1))
            // );

            return 2.Equals(Array.Empty<string>()
                .Concat(stdout.Split(LineEndings, StringSplitOptions.RemoveEmptyEntries))
                .Concat(stderr.Split(LineEndings, StringSplitOptions.RemoveEmptyEntries))
                .Count(line => line.Contains("database system is ready to accept connections")));
        }
    }

    private readonly PostgreSqlContainer _postgreSqlContainer = new PostgreSqlBuilder()
        .WithWaitStrategy(Wait.ForUnixContainer().AddCustomWaitStrategy(new WaitUntil()))
        .Build();

    public Task InitializeAsync()
    {
        return _postgreSqlContainer.StartAsync();
    }

    public Task DisposeAsync()
    {
        return _postgreSqlContainer.DisposeAsync().AsTask();
    }

    [Fact]
    public void ExecuteCommand()
    {
        using DbConnection connection = new NpgsqlConnection(_postgreSqlContainer.GetConnectionString());
        using DbCommand command = new NpgsqlCommand();
        connection.Open();
        command.Connection = connection;
        command.CommandText = "SELECT 1";
    }
}
mausch commented 8 months ago

Could you please try the latest version, 24.0

Docker 24 didn't change this behaviour:

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /nix/store/c9dxg39wxk8g77bzanmxxyag6gvml24r-docker-plugins/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.21.0
    Path:     /nix/store/c9dxg39wxk8g77bzanmxxyag6gvml24r-docker-plugins/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 26
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: v1.7.5
 runc version: 
 init version: 
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.10.199
 Operating System: NixOS 23.05 (Stoat)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.03GiB
 Name: RYOGA
 ID: 41ad67c0-e890-4f4a-b020-06f2b6238ff2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: mausch
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true
mausch commented 8 months ago

For some reason any negative value in this parameter leads to getting no logs at all.

BTW I'm not 100% sure if this is mapped as-is but the docker CLI also complains about negative values:

$ docker logs my-container --since "-62135596740"
invalid value for "since": parsing time "-62135596740" as "2006-01-02": cannot parse "-62135596740" as "2006"

$ docker logs my-container --since "-1"
invalid value for "since": parsing time "-1" as "2006-01-02": cannot parse "-1" as "2006"

$ docker logs my-container --since "0"
<all_logs>
HofmeisterAn commented 8 months ago

Indeed 🤔 a negative value looks wrong. I do not remember why I choose DateTime.MinValue and DateTime.MaxValue anymore. Without taking a closer look, I would expect unixEpoch to be the default value. Then since and until will correspond to 0 seconds (which is the default value according to the docs). Good catch 👀 thanks. Would you mind creating a PR?