Recently we had a problem with a three-node Cassandra cluster running 3.11.4, where multiple threads were blocked trying to lock the ReentrantReadWriteLock in TaskQueueAsync (via the submitAsynchronous() method). The jstack output showed 63 threads waiting to acquire the lock, but no thread holding it. The lucene-indexer-1 threads (the executors behind the lock) in the same thread dump were all idle.
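To cross-check a thread dump, it can help to ask the lock itself what state it is in. As far as I know, `jstack -l` lists only exclusive (write) owners under "Locked ownable synchronizers", so a thread holding just the read lock would not appear as a holder. The sketch below assumes you can reach the lock instance (e.g. from a debugger or test harness); `LockProbe` and `describe` are hypothetical names, while `isWriteLocked()`, `getReadLockCount()`, `getQueueLength()` and `hasQueuedThreads()` are standard `ReentrantReadWriteLock` methods. The `main` method simulates one reader holding the read lock while a writer waits.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical diagnostic helper (not part of Cassandra or the Lucene plugin):
// reports what the lock itself knows, independently of the thread dump.
public class LockProbe {

    static String describe(ReentrantReadWriteLock lock) {
        return "writeLocked=" + lock.isWriteLocked()
                + " readHolds=" + lock.getReadLockCount()
                + " queued=" + lock.getQueueLength();
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair mode

        // Simulate a submitter that took the read lock and has not released it.
        lock.readLock().lock();

        // A writer now has to wait for the read hold to be released.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });
        writer.start();
        while (!lock.hasQueuedThreads()) {
            Thread.sleep(10); // wait until the writer is actually queued
        }
        // The read hold is visible here even though no thread "owns" the lock
        // in the exclusive sense.
        System.out.println(describe(lock));

        lock.readLock().unlock();
        writer.join();
        System.out.println(describe(lock));
    }
}
```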
Looking at the code in question (which seems unchanged since 2016), it shouldn't be possible for the lock to go unreleased unless the thread holding it was interrupted. I also suspect the original author was a bit overzealous: everything inside submitAsynchronous already appears to be thread-safe, except the passed-in variable "id". Is there some reason I'm not seeing for protecting its contents this carefully? Could we simply not take the lock at all in that method?
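One common reason for a read lock around an otherwise thread-safe submission is to pair it with an exclusive (write-locked) operation that must not race with new submissions. The sketch below is a hypothetical illustration of that pattern, not the actual TaskQueueAsync source: `ShardedTaskQueue` and its `submitSynchronous` method are assumed names, and whether the real class has a write-locked counterpart like this is something to check in its code. Async submitters share the read lock; the synchronous path takes the write lock, drains every worker with a no-op barrier task, then runs its own task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch of a read/write-locked task queue, NOT the plugin's code.
public class ShardedTaskQueue implements AutoCloseable {
    private final List<ExecutorService> workers = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    public ShardedTaskQueue(int numThreads) {
        for (int i = 0; i < numThreads; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    // Many callers may submit concurrently because they share the read lock.
    public Future<?> submitAsynchronous(Object id, Runnable task) {
        lock.readLock().lock();
        try {
            // Same id -> same single-threaded worker, so tasks for one id stay ordered.
            int shard = Math.floorMod(id.hashCode(), workers.size());
            return workers.get(shard).submit(task);
        } finally {
            lock.readLock().unlock(); // released even if submit() throws
        }
    }

    // Exclusive: no async submission can slip in between the drain and the task.
    public <T> T submitSynchronous(Supplier<T> task)
            throws InterruptedException, ExecutionException {
        lock.writeLock().lock();
        try {
            List<Future<?>> barriers = new ArrayList<>();
            for (ExecutorService w : workers) {
                barriers.add(w.submit(() -> { })); // no-op queued behind existing work
            }
            for (Future<?> f : barriers) {
                f.get(); // this worker has finished everything queued before the barrier
            }
            return task.get();
        } finally {
            lock.writeLock().unlock();
        }
    }

    @Override
    public void close() {
        workers.forEach(ExecutorService::shutdown);
    }
}
```

If the real class works this way, dropping the read lock would let a submission land after the barrier, so the synchronous task could miss it. A trade-off of the design: with a fair lock, a single waiting writer makes every new reader queue behind it, so one submitter stuck inside the read-locked section (for example, blocked on a full executor queue) can stall all the others.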