populse / capsul

Collaborative Analysis Platform : Simple, Unifying, Lean
Other
7 stars 14 forks source link

unstabilities with redis connections #291

Closed denisri closed 10 months ago

denisri commented 10 months ago

Tests in capsul v3 often fail randomly with the following error:

/usr/local/lib/python3.10/dist-packages/redis/connection.py:260: in <lambda>
    lambda: self._connect(), lambda error: self.disconnect(error)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = UnixDomainSocketConnection<path=/tmp/capsul_engine_database.rdb.socket,db=0>

    def _connect(self):
        "Create a Unix domain socket connection"
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.socket_connect_timeout)
>       sock.connect(self.path)
E       FileNotFoundError: [Errno 2] No such file or directory

/usr/local/lib/python3.10/dist-packages/redis/connection.py:803: FileNotFoundError

and:

        if self.redis.get('capsul:shutting_down'):
>           raise RuntimeError('Cannot connect to database because it is shutting down')
E           RuntimeError: Cannot connect to database because it is shutting down

database/redis.py:90: RuntimeError

Probably because the same file /tmp/capsul_engine_database.rdb.socket is reused. We should use real generated and unique temp filenames.

sapetnioc commented 10 months ago

This is not so trivial. More generally, it is a bit complex to know when to automatically create and destroy a database to cover all use cases. Info in config (including /tmp/capsul_engine_database.rdb in default config) is a meeting point for all database users. If a new path is used each time, there will be no more connection between various users (i.e. test script and corresponding workers). I suppose that the problem arise when a test create the database server and start workers. One of the worker will have the responsability to ask the server to stop, just before that, it put the server in the "shutting down" state. Since these two steps are not atomic, there is a possibility that the next test ask for database creation between them and fail.

There are several possibilities to solve this depending on what we expect from default database and how do we want to test database (using the default path or an explicit one).

denisri commented 10 months ago

I hope this is fixed by the last commit.