python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0
1.3k stars 386 forks source link

Deadlock with read/write lock recipe #649

Closed westphahl closed 2 years ago

westphahl commented 3 years ago

Expected Behavior

Read/write lock recipe never deadlocks.

Actual Behavior

Read/write lock recipe deadlocks under certain conditions. The deadlock seems to happen as follows:

  1. write lock is acquired
  2. another component tries to acquire the read lock and blocks (watching the write lock as predecessor)
  3. yet another component tries to acquire the write lock and also blocks (watching the read lock as predecessor)
  4. the first write lock is released which wakes up the read lock
  5. the read lock starts watching the second write lock as predecessor :boom:

We now have a deadlock with the read lock and the second write lock waiting for each other to be "released".

There can be a variation of this when the second write lock's znode is created before the read lock calls: https://github.com/python-zk/kazoo/blob/6337fd6f72b59fb20886f980f2e0d6d41525dc35/kazoo/recipe/lock.py#L263

This will lead to the same deadlock condition with the read and the second write lock watching each other independent of the currently acquired write lock.

Snippet to Reproduce the Problem

import logging
logging.basicConfig(format="%(asctime)s %(levelname)s:%(message)s", level=logging.DEBUG)

import threading
import time

from kazoo.client import KazooClient

k = KazooClient()
k.start()

lock_path = "/lock"
writer_running = threading.Event()
reader_running = threading.Event()

def writer():
    logging.info("Writer running")
    writer_running.set()
    with k.WriteLock(lock_path):
        logging.info("Got write lock")

def reader():
    logging.info("Reader running")
    reader_running.set()
    with k.ReadLock(lock_path):
        logging.info("Got read lock")

writer_thread = threading.Thread(target=writer)
reader_thread = threading.Thread(target=reader)

with k.WriteLock(lock_path):
    logging.info("Starting reader")
    reader_thread.start()
    reader_running.wait()
    logging.info("Starting writer")
    writer_thread.start()
    writer_running.wait()
    logging.info("Waiting for locks to block")
    time.sleep(5)

logging.info("Outer lock released")
logging.info("Joining reader/writer threads ...")
reader_thread.join()
writer_thread.join()
logging.error("This will never show up")

Logs with logging in DEBUG mode

2021-07-30 12:06:08,586 INFO:Connecting to localhost(127.0.0.1):2281, use_ssl: True
2021-07-30 12:06:08,599 DEBUG:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=10000, session_id=0, passwd=b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
2021-07-30 12:06:08,600 INFO:Zookeeper connection established, state: CONNECTED
2021-07-30 12:06:08,600 DEBUG:Sending request(xid=1): Exists(path='/lock', watcher=None)
2021-07-30 12:06:08,601 DEBUG:Received response(xid=1): ZnodeStat(czxid=555413, mzxid=555413, ctime=1627635134011, mtime=1627635134011, version=0, cversion=96, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=0, pzxid=555531)
2021-07-30 12:06:08,601 DEBUG:Sending request(xid=2): Create(path='/lock/2f779b4cbe444c31abb4c551f9f5e9c0__lock__', data=b'', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=3)
2021-07-30 12:06:08,601 DEBUG:Received response(xid=2): '/lock/2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048'
2021-07-30 12:06:08,602 DEBUG:Sending request(xid=3): GetChildren(path='/lock', watcher=None)
2021-07-30 12:06:08,602 DEBUG:Received response(xid=3): ['2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048']
2021-07-30 12:06:08,602 INFO:Starting reader
2021-07-30 12:06:08,602 INFO:Reader running
2021-07-30 12:06:08,603 INFO:Starting writer
2021-07-30 12:06:08,603 DEBUG:Sending request(xid=4): Exists(path='/lock', watcher=None)
2021-07-30 12:06:08,603 INFO:Writer running
2021-07-30 12:06:08,603 INFO:Waiting for locks to block
2021-07-30 12:06:08,603 DEBUG:Sending request(xid=5): Exists(path='/lock', watcher=None)
2021-07-30 12:06:08,603 DEBUG:Received response(xid=4): ZnodeStat(czxid=555413, mzxid=555413, ctime=1627635134011, mtime=1627635134011, version=0, cversion=97, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=1, pzxid=555533)
2021-07-30 12:06:08,604 DEBUG:Received response(xid=5): ZnodeStat(czxid=555413, mzxid=555413, ctime=1627635134011, mtime=1627635134011, version=0, cversion=97, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=1, pzxid=555533)
2021-07-30 12:06:08,604 DEBUG:Sending request(xid=6): Create(path='/lock/1e0d745a23fb459f918b4d247151525a__rlock__', data=b'', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=3)
2021-07-30 12:06:08,604 DEBUG:Sending request(xid=7): Create(path='/lock/ed26a149ee2947d9bc18df4df51359f4__lock__', data=b'', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=3)
2021-07-30 12:06:08,604 DEBUG:Received response(xid=6): '/lock/1e0d745a23fb459f918b4d247151525a__rlock__0000000049'
2021-07-30 12:06:08,604 DEBUG:Received response(xid=7): '/lock/ed26a149ee2947d9bc18df4df51359f4__lock__0000000050'
2021-07-30 12:06:08,604 DEBUG:Sending request(xid=8): GetChildren(path='/lock', watcher=None)
2021-07-30 12:06:08,604 DEBUG:Sending request(xid=9): GetChildren(path='/lock', watcher=None)
2021-07-30 12:06:08,605 DEBUG:Received response(xid=8): ['1e0d745a23fb459f918b4d247151525a__rlock__0000000049', 'ed26a149ee2947d9bc18df4df51359f4__lock__0000000050', '2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048']
2021-07-30 12:06:08,605 DEBUG:Received response(xid=9): ['1e0d745a23fb459f918b4d247151525a__rlock__0000000049', 'ed26a149ee2947d9bc18df4df51359f4__lock__0000000050', '2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048']
2021-07-30 12:06:08,605 DEBUG:Sending request(xid=10): GetData(path='/lock/ed26a149ee2947d9bc18df4df51359f4__lock__0000000050', watcher=<bound method Lock._watch_predecessor of <kazoo.recipe.lock.ReadLock object at 0x7f4f04a00c10>>)
2021-07-30 12:06:08,605 DEBUG:Sending request(xid=11): GetData(path='/lock/1e0d745a23fb459f918b4d247151525a__rlock__0000000049', watcher=<bound method Lock._watch_predecessor of <kazoo.recipe.lock.WriteLock object at 0x7f4f0405c250>>)
2021-07-30 12:06:08,605 DEBUG:Received response(xid=10): (b'', ZnodeStat(czxid=555535, mzxid=555535, ctime=1627639568604, mtime=1627639568604, version=0, cversion=0, aversion=0, ephemeralOwner=72060594463974197, dataLength=0, numChildren=0, pzxid=555535))
2021-07-30 12:06:08,605 DEBUG:Received response(xid=11): (b'', ZnodeStat(czxid=555534, mzxid=555534, ctime=1627639568604, mtime=1627639568604, version=0, cversion=0, aversion=0, ephemeralOwner=72060594463974197, dataLength=0, numChildren=0, pzxid=555534))
2021-07-30 12:06:13,607 DEBUG:Sending request(xid=12): Delete(path='/lock/2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048', version=-1)
2021-07-30 12:06:13,609 DEBUG:Received response(xid=12): True
2021-07-30 12:06:13,610 INFO:Outer lock released
2021-07-30 12:06:13,610 INFO:Joining reader/writer threads ...

Specifications