shayhatsor / zookeeper

Apache ZooKeeper .NET async Client
https://nuget.org/packages/ZooKeeperNetEx/
Apache License 2.0

WriteLock may never be granted #26

Open bitchkat opened 6 years ago

bitchkat commented 6 years ago

This is a bug in the Java code, but I'm also posting it here in case you want to fix it in the ZooKeeperNetEx recipes before it's fixed in core.

I'm actually using the WriteLock from the ZooKeeperNetEx C# code, but I've verified that the same issue exists in the Java recipe. On a busy system, I fairly frequently see a WriteLock that is never granted to the client and gets stuck.

What I believe is happening is that the lock sets a watch on the request immediately ahead of it via this code:

    Stat stat = await writeLock.zookeeper.existsAsync(writeLock.lastChildId, new LockWatcher(writeLock)).ConfigureAwait(false);
    if (stat != null)
    {
        return false;
    }

    LOG.warn("Could not find the" + " stats for less than me: " + lastChildName.Name);

The problem (as I see it, and I'm still fairly new to ZooKeeper) is that if the znode represented by lastChildId has been deleted before the call to exists is made, stat will be null and the watch will only ever fire when that znode is created. And of course that will never happen.
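To make the ordering problem concrete, here is a minimal sketch of the race using an in-memory stand-in. `FakeStore`, `predecessorWatchFires` and the `/lock/x-0000000001` path are hypothetical names invented for illustration; this is not the ZooKeeper client API. It models only the behavior described above: a watch registered through an exists call fires on the next create or delete of that path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the exists-watch behavior; NOT the real client.
public class WatchRaceSketch {

    static class FakeStore {
        private final Set<String> nodes = new HashSet<>();
        private final Map<String, List<Runnable>> watches = new HashMap<>();

        void create(String path) {
            nodes.add(path);
            fire(path);
        }

        void delete(String path) {
            if (nodes.remove(path)) {
                fire(path);
            }
        }

        // The watch is registered whether or not the node currently exists,
        // and fires on the next create or delete of that path.
        boolean exists(String path, Runnable watcher) {
            watches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
            return nodes.contains(path);
        }

        private void fire(String path) {
            List<Runnable> pending = watches.remove(path);
            if (pending != null) {
                pending.forEach(Runnable::run);
            }
        }
    }

    // Returns true if the waiter's watch on its predecessor ever fires.
    static boolean predecessorWatchFires(boolean deleteBeforeExists) {
        FakeStore store = new FakeStore();
        String predecessor = "/lock/x-0000000001"; // illustrative path
        store.create(predecessor);

        boolean[] fired = {false};
        if (deleteBeforeExists) {
            store.delete(predecessor);                        // holder releases first
            store.exists(predecessor, () -> fired[0] = true); // node is gone, watch waits for a create
        } else {
            store.exists(predecessor, () -> fired[0] = true); // node is present
            store.delete(predecessor);                        // deletion fires the watch
        }
        return fired[0];
    }

    public static void main(String[] args) {
        System.out.println("watch set, then delete: fired=" + predecessorWatchFires(false));
        System.out.println("delete, then watch set: fired=" + predecessorWatchFires(true));
    }
}
```

In the second ordering the watch is left waiting for a create event on a path that nothing will create again, which matches the stuck lock I'm seeing.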

The message is appearing in my log and my watcher for the lock is never invoked.

[2018-02-13 16:49:17.905 GMT WARNING WriteLock Could not find the stats for less than me: /token/SegmentProfileQueueToken/x-72057953399865370-0000000724]

I'm not entirely sure of the proper way to fix this, but I think setting Id = null when stat is null should work.
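For discussion, here is a rough sketch of the general idea of retrying when the predecessor has vanished. It is not the recipe's code and not the Id = null change itself, just one possible shape of a retry. `tryAcquire`, `listChildren` and `watchIfExists` are hypothetical names, and it assumes node names share a prefix so that plain string order matches sequence order (the real recipe compares the sequence suffix).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical retry loop; the two callbacks stand in for real client calls.
public class PredecessorRetrySketch {

    enum Outcome { ACQUIRED, WAITING }

    // listChildren: returns the current lock node names (assumption).
    // watchIfExists: sets a watch on the given node and returns true only if
    //                the node still existed when the watch was set (assumption).
    static Outcome tryAcquire(String myId,
                              Supplier<List<String>> listChildren,
                              Predicate<String> watchIfExists) {
        while (true) {
            SortedSet<String> lessThanMe =
                    new TreeSet<>(listChildren.get()).headSet(myId);
            if (lessThanMe.isEmpty()) {
                return Outcome.ACQUIRED; // nobody ahead of us
            }
            if (watchIfExists.test(lessThanMe.last())) {
                return Outcome.WAITING;  // watch is on a live node and will fire on delete
            }
            // The predecessor vanished between listing and the exists call:
            // list the children again instead of waiting on a dead watch.
        }
    }

    public static void main(String[] args) {
        // First listing is stale (still shows node 1); second listing is fresh.
        Iterator<List<String>> listings = List.of(
                List.of("n-0000000001", "n-0000000002"),
                List.of("n-0000000002")).iterator();
        List<String> live = new ArrayList<>(List.of("n-0000000002"));

        Outcome outcome = tryAcquire("n-0000000002", listings::next, live::contains);
        System.out.println(outcome);
    }
}
```

The loop only returns once it either holds a watch on a node that really exists or finds nothing ahead of it, so it cannot end up parked on a watch that only a create would trigger.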