nats-io / stan.net

The official NATS .NET C# Streaming Client
Apache License 2.0
137 stars 41 forks source link

Resolve deadlocks seen during rapid publishes #175 #176

Closed watfordgnf closed 4 years ago

watfordgnf commented 4 years ago

This PR attempts to resolve deadlocks seen during rapid, small publishes detailed in #175.

The root fix for the issue seen is to move code which accesses addLock within BlockingDictionary out from underneath dLock (09f6caa). On its own, this appears to resolve the deadlocks seen. Additionally, because pubAckMap never has its instance changed, the requirement for StanConnection.mu to be held goes away (89150f2). However, removing this lock requires that snapshot semantics be used when accessing pubAckMap.Keys (aaa05f8).

During testing, other deadlocks became apparent. If I increased the number of threads publishing and introduced random delays in some of the threads, if the STAN sever died unexpectedly there were deadlocks waiting for pubAckMap to become non-empty after ping timeouts (f52eeec). By introducing a timeout waiting for space in pubAckMap equal to the ping interval (38091ce), the STAN client became much more responsive to unexpected connection failures and did not deadlock under heavy load with inappropriate options (MaxPubAcksInFlight == 3).

watfordgnf commented 4 years ago

Closing to rebase against master.