Open n-a-sz opened 4 years ago
Hello,
This is due to the retry. It seems to me this should be solved by https://github.com/python-zk/kazoo/pull/578, which identified the problem of sleeping only whole seconds while retrying.
Can you share a snippet?
Thanks.
Hi StephenSorriaux, thanks for the fast reply.
I debugged it further and can reproduce it by omitting the leading slash from the path. In that case kazoo prepends a slash to the path, but the first lookup does not find the node. It then retries and finds the node on the second attempt, after waiting a bit.
Here is the code to reproduce:
import time

from kazoo.client import KazooClient

# Assumed placeholder; replace with your ZooKeeper connection string.
ZOO_KEEPER_HOSTS = "127.0.0.1:2181"

zk = KazooClient(hosts=ZOO_KEEPER_HOSTS)
zk.start()
times = []
for i in range(5):
    # read_lock = zk.ReadLock('/lock-test-xyz/test-resource', 'xzy')  # works without problem
    # The next line throws an exception inside kazoo, but works after kazoo retries.
    read_lock = zk.ReadLock('lock-test-xyz/test-resource', 'xzy')
    start = time.time()
    read_lock.acquire()
    times.append(time.time() - start)  # seconds spent acquiring the lock
    read_lock.release()
zk.stop()
zk.close()
print(times)
After annotating the line that throws the exception with prints, I got this:
With starting slash:
getting index for 8c148f894f394fb3abe2f69f79a3a1e3__rlock__0000000076
children: ['8c148f894f394fb3abe2f69f79a3a1e3__rlock__0000000076']
Without starting slash:
getting index for /d6895f8e89214abfb98c180e60473af8__rlock__0000000081
children: ['d6895f8e89214abfb98c180e60473af8__rlock__0000000081']
... waiting 0 or 1 sec
getting index for d6895f8e89214abfb98c180e60473af8__rlock__0000000081
children: ['d6895f8e89214abfb98c180e60473af8__rlock__0000000081']
So it's probably my mistake for not giving a leading slash. But does a path without one make any sense? If not, kazoo could add it for me.
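To illustrate the output above, here is a minimal sketch of how a stray leading slash could end up in the node name. It assumes the name is derived by slicing the lock path off the created path by length; I have not confirmed that this is what kazoo does, and the helper names are hypothetical. The created path is assumed to be the absolute path ZooKeeper would return.

```python
def node_name_by_slicing(lock_path: str, created_path: str) -> str:
    """Strip the lock path prefix by length (assumed approach, for illustration only)."""
    return created_path[len(lock_path) + 1:]


def node_name_by_split(created_path: str) -> str:
    """Take the last path component, regardless of how the lock path was written."""
    return created_path.rsplit("/", 1)[-1]


# Absolute path of the created lock node (assumed), and the children listing.
created = "/lock-test-xyz/test-resource/d6895f8e89214abfb98c180e60473af8__rlock__0000000081"
children = ["d6895f8e89214abfb98c180e60473af8__rlock__0000000081"]

with_slash = node_name_by_slicing("/lock-test-xyz/test-resource", created)
without_slash = node_name_by_slicing("lock-test-xyz/test-resource", created)

print(with_slash in children)               # name matches the child entry
print(without_slash in children)            # off by one: name keeps the "/"
print(node_name_by_split(created) in children)
```

If that assumption holds, the lookup in the children list fails only when the user-supplied path is one character shorter than the absolute path, which matches what I see. Splitting on the last slash would not depend on how the path was written.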
I'm using the read-write lock API of kazoo 2.6.1 and noticed that it randomly takes one second longer to acquire a free lock, even though it sends and receives the very same messages to and from ZooKeeper.
I found that it always hits this line, where it decides to sleep 0 or 1 seconds, and I cannot modify this behavior.
Sleeping 0 seconds:
Sleeping 1 second:
As you can see, there is no difference in the communication, only in the sleep time. It looks like a bug to me. Also, I cannot understand why it sends the very same GetChildren request three times. Is it because I have a three-node cluster?
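To make the "0 or 1 seconds" observation concrete, here is a minimal sketch of whole-second jitter next to fractional jitter. It is not kazoo's retry code; the function names, the zero base delay and the 1-second jitter cap are assumptions chosen for illustration.

```python
import random


def integer_jitter_sleep(delay: float, max_jitter: float, rng: random.Random) -> float:
    """Whole-second jitter: the sleep time can only move in 1 s steps."""
    return delay + rng.randint(0, int(max_jitter))


def float_jitter_sleep(delay: float, max_jitter: float, rng: random.Random) -> float:
    """Fractional jitter: any value between delay and delay + max_jitter."""
    return delay + rng.uniform(0, max_jitter)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
integer_samples = {integer_jitter_sleep(0.0, 1.0, rng) for _ in range(100)}
float_samples = [float_jitter_sleep(0.0, 1.0, rng) for _ in range(100)]

print(sorted(integer_samples))  # only whole-second values can appear
print(min(float_samples), max(float_samples))
```

With whole-second jitter and a cap of 1, a retry either returns immediately or costs a full second, which would explain the timings I measured. Fractional jitter would spread the wait over the whole interval instead.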