nats-io / nats.java

Java client for NATS
Apache License 2.0
569 stars 154 forks source link

Revisit Write Timeout Handling #1128

Closed scottf closed 5 months ago

scottf commented 6 months ago

1. Tuned implementation of forceReconnect.

2. Change the calling of the forceReconnect from the write timeout watch to use an executor thread instead of calling directly to so it doesn't run on the thread that will end up closing the watch's parent dataPort

3. Guards around connecting, meaning using a simple atomic flag, don't go into connect logic if already in connect logic. Probably not that necessary, but cost impact is minimal and will help any time multiple threads detect the same failure.

4. Smarter lock acquisition inside pushing a message to the outgoing queue, to avoid many threads needed losing parallelization because they had to wait for the lock.

5. Socket Write Timeout must be 100ms greater than the connection timeout. This may have been contributing to the cycle of failing to connect, but either way... The logic being that if the developer thinks it could take 10 seconds to connect and have set their connection timeout to account for that then it does not make sense to have a shorter write timeout. This is mostly applicable during the connection behavior and probably not as much of an issue when doing things when already connected, but I can't really know which situation we are in. Considering the connection timeout is on the order of seconds and the default socket write timeout is 2 minutes, ensuring the socket write timeout is at least 100ms more than the connection timeout seems like a reasonable thing to do.

6. The ping logic that runs on a timer in a separate thread could throw an exception. This is minor and will already be raised to the error listener, so I handled it instead of just letting it pass through to the timer thread, which is just noise.