quickfixgo / quickfix

The Go FIX Protocol Library :rocket:
https://www.quickfixgo.org/
Other
735 stars 288 forks source link

Fix stuck call to Dial when calling Stop on the Initiator #654

Closed abronan closed 1 month ago

abronan commented 2 months ago

This commit fixes an issue when calling Start() and then Stop() on the initiator while the connection is likely to fail and timeout. Sending a SIGTERM and calling Stop() will block since Dial will attempt to connect until it times out and returns on the waitForReconnectInterval call.

We mitigate this problem by using a proxy.ContextDialer and allowing to pass a context with cancellation method to the dialer.DialContext method on handleConnection.

We need to start a routine listening for the stopChan in order to call cancel() explicitly and thus exit the DialContext method.

Note: there are scenarios where cancel() will be called twice, this choice was made in order to avoid a larger refactor of the reconnect logic, but since the call to cancel() is idempotent, this doesn't lead to any adverse effect.

fixes #653

abronan commented 2 months ago

I have tested this locally using a program sending a SIGTERM and calling initiator.Stop() on a connection that fails, but would love to hear about an approach to test this properly within the repository. There didn't seem to be existing tests for initiator.go (unless I'm mistaken). As of now, it doesn't break any of the tests in dialer_test.go from what I could see.