Bug: Random failures on Redis Container when DinD + parallel

jmspereira commented 3 months ago

Describe the bug

Hey everyone,

I have a monorepo with dozens of small projects, and I am running tests that use a Redis Container in each project. To reduce the time to run the tests (since each project's tests are fairly CPU-light), I am using GNU parallel to run the tests of all projects in parallel. As a result, several containers start and terminate simultaneously. I do not have any problems running this setup locally.

However, my CI pipeline runs on a k8s cluster that uses dind to run the containers inside the CI runner's container, and there the tests are incredibly flaky, even with multiple retries. The error is always something like this:

    def read_response(self, disable_decoding=False):
        if not self._reader:
            raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)

        # _next_response might be cached from a can_read() call
        if self._next_response is not False:
            response = self._next_response
            self._next_response = False
            return response

        if disable_decoding:
            response = self._reader.gets(False)
        else:
            response = self._reader.gets()

        while response is False:
            self.read_from_socket()
            if disable_decoding:
                response = self._reader.gets(False)
            else:
>               response = self._reader.gets()
E               redis.exceptions.InvalidResponse: Protocol error, got "A" as reply type byte

I am not sure if this is an issue with testcontainers, but I have already spent several hours trying to find the problem. Do you guys know what might be the source of this issue?
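For context on the error message itself: every RESP2 reply from a Redis server starts with one of five type bytes (`+`, `-`, `:`, `$`, `*`). A reply starting with "A" therefore means the client read bytes that were not the start of a Redis reply. That could happen, for example, if the bytes came from a different service or if the stream got out of sync, but the traceback alone does not say which. The sketch below only illustrates that check. `classify_reply` is a hypothetical helper written for this example, not part of redis-py or testcontainers.

```python
# RESP2 reply type markers: the first byte of every server reply.
RESP2_TYPES = {
    b"+": "simple string",
    b"-": "error",
    b":": "integer",
    b"$": "bulk string",
    b"*": "array",
}


def classify_reply(data: bytes) -> str:
    """Hypothetical helper: name the RESP2 type of a raw reply, or raise."""
    if not data:
        raise ValueError("empty reply")
    marker = data[:1]
    if marker not in RESP2_TYPES:
        # Mirrors the wording of the error in the traceback above.
        raise ValueError(
            f'Protocol error, got "{marker.decode("latin-1")}" as reply type byte'
        )
    return RESP2_TYPES[marker]
```

Calling `classify_reply(b"+PONG\r\n")` returns the simple string type, while any payload that begins with a letter such as "A" is rejected.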

alexanderankin commented 3 months ago

Is running in dind a sufficient condition to cause this error, or does it go away when you stop running in parallel and run single-threaded?

jmspereira commented 3 months ago

If I run all the tests sequentially, the problem does not exist. But again, I do not have this problem when running the tests with parallel in a local environment (without dind) :/
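One way to narrow this down in the CI environment is to look at the raw bytes that come back from the host and port each test connects to, before handing the connection to the Redis client. The following is a minimal diagnostic sketch using only the standard library. `probe_port` and `looks_like_redis` are hypothetical names, and the host and port are assumed to be whatever the test obtained for its container.

```python
import socket

# RESP-encoded PING command; a Redis server without auth replies "+PONG\r\n".
PING = b"*1\r\n$4\r\nPING\r\n"


def probe_port(host: str, port: int, timeout: float = 2.0) -> bytes:
    """Send a PING and return the first chunk of raw bytes received."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(PING)
        return sock.recv(64)


def looks_like_redis(reply: bytes) -> bool:
    """True if the raw reply is the expected answer to PING."""
    return reply.startswith(b"+PONG")
```

Logging the result of `probe_port(host, port)` in a failing run would show whether the endpoint answers with `+PONG` or with something that is not a Redis reply at all.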

testcontainers / testcontainers-python

Bug: Random failures on Redis Container when DinD + parallel #511