harlowja / fasteners

A python package that provides useful locks.
Apache License 2.0
243 stars 45 forks source link

Potential deadlock in ReaderWriterLock #86

Closed gthiemonge closed 2 years ago

gthiemonge commented 2 years ago

This behavior has been observed in the OpenStack TaskFlow project (https://opendev.org/openstack/taskflow)

After updating from fasteners 0.14.1 to 0.16.3, we have observed some issues with the TaskFlow CI, some tests are blocked with a deadlock.

I wrote a simple reproducer in the fasteners test: https://github.com/gthiemonge/fasteners/commit/6d40296baa2c870a1ea8746bc7ad0f581c6ec4c5

Basically, a first thread requests a writer lock, then a reentrant reader lock, while a second thread requests only the writer lock. The test deadlocks after a few iterations:

$ pytest -k test_deadlock_reproducer -v -s tests/test_lock.py
============================================================================================================= test session starts =============================================================================================================
platform linux -- Python 3.10.1, pytest-6.2.4, py-1.11.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/user/sources/fasteners
collected 18 items / 17 deselected / 1 selected                                                                                                                                                                                               

tests/test_lock.py::test_deadlock_reproducer thread1 iteration: 0
thread1 waiting for write lock
thread1 got write lock
thread1 waiting for read lock
thread1 got read lock
thread1 released read lock
thread1 released write lock
thread1 iteration: 1
thread1 waiting for write lock
thread1 got write lock
thread1 waiting for read lock
thread1 got read lock
thread1 released read lock
thread1 released write lock
thread1 iteration: 2
thread1 waiting for write lock
thread1 got write lock
thread1 waiting for read lock
thread2 iteration: 0
thread2 waiting for write lock
thread1 got read lock
thread1 released read lock
thread1 released write lock
thread2 got write lock
thread1 iteration: 3
thread2 released write lock
thread2 iteration: 1
thread1 waiting for write lock
thread2 waiting for write lock
thread1 got write lock
thread1 waiting for read lock
^C

The following commit https://github.com/harlowja/fasteners/commit/75cf59a7246383d44f617215fbdf818d2e88a399 triggers this behavior, reverting the commit seems to fix the issue.

Any ideas on how to fix it?

psarka commented 2 years ago

Thanks for a great bug report! I'm unfamiliar with this part of the codebase, so it will take some time. @obondarev, maybe you would like to take a look?

obondarev commented 2 years ago

Sure, I think read lock should be allowed if current thread already has a write lock. This means line #184 in [1] should be "elif (self._writer == me) or not self.has_pending_writers:" Can you please try that @gthiemonge ?

[1] https://github.com/harlowja/fasteners/commit/75cf59a7246383d44f617215fbdf818d2e88a399

psarka commented 2 years ago

I gave it a go, all tests pass, including the reproducer by @gthiemonge. Thanks :+1:

I will take another carefull look at that part of the code and will release the fix soonish.

obondarev commented 2 years ago

Thanks, @psarka !

gthiemonge commented 2 years ago

Sure, I think read lock should be allowed if current thread already has a write lock. This means line #184 in [1] should be "elif (self._writer == me) or not self.has_pending_writers:" Can you please try that @gthiemonge ?

[1] 75cf59a

Yes, it fixes my reproducer and our tests in TaskFlow. Thanks Folks!

psarka commented 2 years ago

Made a new release (0.17.3) with the fix.

jakirkham commented 2 years ago

We are still seeing deadlocks with 0.17.3. Will try to come up with a reproducer.