This PR attempts to resolve deadlocks seen during rapid, small publishes detailed in #175.
The root fix for the issue seen is to move code which accesses addLock within BlockingDictionary out from underneath dLock (09f6caa). On its own, this appears to resolve the deadlocks seen. Additionally, because pubAckMap never has its instance changed, the requirement for StanConnection.mu to be held goes away (89150f2). However, removing this lock requires that snapshot semantics be used when accessing pubAckMap.Keys (aaa05f8).
During testing, other deadlocks became apparent. If I increased the number of threads publishing and introduced random delays in some of the threads, if the STAN sever died unexpectedly there were deadlocks waiting for pubAckMap to become non-empty after ping timeouts (f52eeec). By introducing a timeout waiting for space in pubAckMap equal to the ping interval (38091ce), the STAN client became much more responsive to unexpected connection failures and did not deadlock under heavy load with inappropriate options (MaxPubAcksInFlight == 3).
This PR attempts to resolve deadlocks seen during rapid, small publishes detailed in #175.
The root fix for the issue seen is to move code which accesses
addLock
withinBlockingDictionary
out from underneathdLock
(09f6caa). On its own, this appears to resolve the deadlocks seen. Additionally, becausepubAckMap
never has its instance changed, the requirement forStanConnection.mu
to be held goes away (89150f2). However, removing this lock requires that snapshot semantics be used when accessingpubAckMap.Keys
(aaa05f8).During testing, other deadlocks became apparent. If I increased the number of threads publishing and introduced random delays in some of the threads, if the STAN sever died unexpectedly there were deadlocks waiting for
pubAckMap
to become non-empty after ping timeouts (f52eeec). By introducing a timeout waiting for space inpubAckMap
equal to the ping interval (38091ce), the STAN client became much more responsive to unexpected connection failures and did not deadlock under heavy load with inappropriate options (MaxPubAcksInFlight == 3
).