python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io

Provide Read-Write lock? #528

Open touilleMan opened 6 years ago

touilleMan commented 6 years ago

Related to #19

I've implemented a reader-writer lock (following https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock#Implementation) for my own needs.

import attr
import trio


@attr.s(slots=True)
class ReadLockManager:

    rwlock = attr.ib()

    async def __aenter__(self):
        async with self.rwlock.r:
            self.rwlock.b += 1
            if self.rwlock.b == 1:
                # First reader takes the global lock, blocking writers
                await self.rwlock.g.acquire()
                # Don't worry, I know what I'm doing...
                self.rwlock.g._owner = 42

    async def __aexit__(self, *args):
        async with self.rwlock.r:
            self.rwlock.b -= 1
            if self.rwlock.b == 0:
                # Last reader releases the global lock so a writer can take it
                # Don't worry, I know what I'm doing...
                self.rwlock.g._owner = trio._core.current_task()
                self.rwlock.g.release()

@attr.s(slots=True)
class WriteLockManager:

    rwlock = attr.ib()

    async def __aenter__(self):
        await self.rwlock.g.acquire()

    async def __aexit__(self, *args):
        self.rwlock.g.release()

@attr.s(slots=True)
class RWLock:
    r = attr.ib(default=attr.Factory(trio.Lock))  # protects the reader counter
    g = attr.ib(default=attr.Factory(trio.Lock))  # global lock, held by the writer or the group of readers
    b = attr.ib(default=0)                        # number of active readers

    def acquire_read(self):
        return ReadLockManager(self)

    def acquire_write(self):
        return WriteLockManager(self)

The code is really straightforward, except for the fact that the g lock is released from a different coroutine than the one that acquired it in the first place. This forced me to hack the g._owner field...

This makes me wonder whether it wouldn't be better to have this kind of lock as part of trio directly (given that no normal user should be expected to use this kind of hack on a 3rd-party dependency). Besides, it's a fairly common kind of lock, so DRY should apply here.
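For illustration, here is a minimal usage sketch of the RWLock above (the reader/writer helpers and sleep durations are made up for the example, not part of the original code):

import trio

async def reader(rwlock, i):
    async with rwlock.acquire_read():
        # Several readers can hold the lock at the same time
        print(f"reader {i} in")
        await trio.sleep(0.1)
    print(f"reader {i} out")

async def writer(rwlock):
    async with rwlock.acquire_write():
        # The writer gets exclusive access
        print("writer in")
        await trio.sleep(0.1)
    print("writer out")

async def main():
    rwlock = RWLock()
    async with trio.open_nursery() as nursery:
        for i in range(3):
            nursery.start_soon(reader, rwlock, i)
        nursery.start_soon(writer, rwlock)

trio.run(main)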

sorcio commented 6 years ago

It sounds tricky to guarantee some kind of fairness here. Would you mind me asking about your use case? In many cases atomicity of writes (e.g. on some complex data structure) can be enforced by not having any checkpoint within the write operation (so no task can start reading while we are writing). The point of a read-write lock is to increase concurrency of read operations, but in a cooperative concurrent model this could be translated in different ways depending on the use case.
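To illustrate that last point with a minimal sketch (the names here are made up, not from any real code): since Trio tasks only switch at checkpoints, a write that contains no await is already atomic with respect to other tasks.

def update_entry(table, key, old_key, new_value):
    # No checkpoint in here: no other task can observe a half-applied update
    table.pop(old_key, None)
    table[key] = new_value

async def update_and_sync(table, key, old_key, new_value, upload):
    update_entry(table, key, old_key, new_value)  # atomic w.r.t. other tasks
    await upload(table[key])                      # checkpoint: other tasks may run here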

I tried to implement a variant of the lock without hacks. It uses the trio.hazmat API which, despite the name, is fully documented and has the same deprecation guarantees as the rest of Trio's API. Note that it could be slower than the one you implemented, and it definitely has a different priority policy. This is mostly just a POC showing that an RW lock can be implemented without hacks :)

(You can also play with using two different parking lots for the readers and the writers, and wake them according to some specific policy, or use unpark_all() which, if I understand correctly, would lose the FIFO guarantee. A rough sketch of such a two-lot variant follows the POC below.)

# asynccontextmanager lives in contextlib on Python 3.7+;
# on older versions the async_generator package provides it
from contextlib import asynccontextmanager

import attr
import trio


@attr.s()
class RWLock2:
    _lot = attr.ib(default=attr.Factory(trio.hazmat.ParkingLot))
    _read_lock_taken = attr.ib(default=0)
    _global_lock_taken = attr.ib(default=False)

    @asynccontextmanager
    async def acquire_read(self):
        while self._global_lock_taken:
            await self._lot.park()
        self._read_lock_taken += 1
        try:
            yield
        finally:
            # Release even if the body raises, so writers are not blocked forever
            self._read_lock_taken -= 1
            if not self._read_lock_taken:
                self._lot.unpark()
            await trio.sleep(0)

    @asynccontextmanager
    async def acquire_write(self):
        while self._global_lock_taken or self._read_lock_taken:
            await self._lot.park()
        self._global_lock_taken = True
        try:
            yield
        finally:
            self._global_lock_taken = False
            self._lot.unpark()
            await trio.sleep(0)
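And here is a rough sketch of the two-parking-lot idea mentioned above, with a writer-preferring wake-up policy. The class and attribute names are made up for illustration, and cancellation corner cases are not handled; it is not production-hardened code:

from contextlib import asynccontextmanager

import attr
import trio


@attr.s()
class RWLockTwoLots:
    # Separate lots let us pick who to wake: waiting writers first, then readers
    _reader_lot = attr.ib(default=attr.Factory(trio.hazmat.ParkingLot))
    _writer_lot = attr.ib(default=attr.Factory(trio.hazmat.ParkingLot))
    _readers = attr.ib(default=0)
    _writer = attr.ib(default=False)

    @asynccontextmanager
    async def acquire_read(self):
        # New readers also wait while a writer is queued, so a steady stream
        # of readers cannot starve writers
        while self._writer or len(self._writer_lot):
            await self._reader_lot.park()
        self._readers += 1
        try:
            yield
        finally:
            self._readers -= 1
            if self._readers == 0 and len(self._writer_lot):
                self._writer_lot.unpark()  # hand over to one waiting writer

    @asynccontextmanager
    async def acquire_write(self):
        while self._writer or self._readers:
            await self._writer_lot.park()
        self._writer = True
        try:
            yield
        finally:
            self._writer = False
            if len(self._writer_lot):
                self._writer_lot.unpark()       # prefer the next writer
            else:
                self._reader_lot.unpark_all()   # no writers left: wake all readers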
touilleMan commented 6 years ago

@sorcio thanks a lot for your non-hacky rwlock implementation ;-)

Would you mind me asking about your use case?

Have a look here: https://github.com/Scille/parsec-cloud/blob/54c34a3cdf69e4735e681a4ecf6c767eef57a0df/parsec/core/fs/folder.py#L193. I'm building a fuse-based filesystem with cloud synchronization. The idea is to have a local version of your files (which can be only a part of your entire file tree and can be accessed/modified offline) and a remote one.

I use the rw-lock here to lock a single file/folder, given that working on it can mean accessing remote data. For instance, reading a file means we must fetch the metadata (which may or may not be available locally), then use that info to fetch the data, which is split into multiple chunks. In the meantime a synchronization process can occur, which aims at uploading a new version of the file metadata and data chunks.
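Roughly, the read path looks like this (the function names are simplified for illustration, not the actual parsec-cloud code):

async def read_file(rwlock, fetch_metadata, fetch_chunk, file_id):
    async with rwlock.acquire_read():
        # Both fetches may hit the network; holding the read lock keeps the
        # synchronizer from replacing metadata/chunks underneath us
        manifest = await fetch_metadata(file_id)
        chunks = [await fetch_chunk(chunk_id) for chunk_id in manifest["chunks"]]
    return b"".join(chunks)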

In many cases atomicity of writes (e.g. on some complex data structure) can be enforced by not having any checkpoint within the write operation

Since posting this issue, I've rewritten my code following this idea to get rid of the rwlocks. The idea was to make a copy of the data before doing any await and rely on a merge algorithm (see https://github.com/Scille/parsec-cloud/blob/8ec6d260852971e8379d230f4614a3b5f2b08bc6/parsec/core/fs2/__init__.py#L449).
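Roughly, the pattern looks like this (sync_entry and merge_manifests are made-up names for illustration, not the actual parsec-cloud code):

import copy


def merge_manifests(base, current, remote):
    # Placeholder three-way merge: keep fields changed locally during the
    # upload, take everything else from the version we just pushed
    merged = dict(remote)
    for key, value in current.items():
        if base.get(key) != value:
            merged[key] = value
    return merged


async def sync_entry(local_store, entry_id, upload):
    # Snapshot before the first await so the upload works on a consistent state
    base = copy.deepcopy(local_store[entry_id])

    # Checkpoint: other tasks may modify local_store[entry_id] in the meantime
    remote = await upload(base)

    # Reconcile local modifications made during the upload with what we pushed
    current = local_store[entry_id]
    local_store[entry_id] = merge_manifests(base, current, remote)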

I tried to implement a variant of the lock without hacks

In the end I'm a bit puzzled by this rwlock thing. On one hand it's a very common feature and implementing it is not trivial (cf. this issue and its hacky implementation, and @sorcio's much better implementation, which however uses the hazmat module). On the other hand, like you said, fairness is complex to achieve, so a one-size-fits-all solution may be hard to come by here...