Closed jamis closed 9 months ago
Hi Jamis, is there a succinct snippet from the Ruby driver that implements this behavior that you can share for this PR? Would a test for this behavior be feasible? I'm much more inclined to having testable behavior in a specification.
Hi Jamis, is there a succinct snippet from the Ruby driver that implements this behavior that you can share for this PR? Would a test for this behavior be feasible? I'm much more inclined to having testable behavior in a specification.
Hey Kyle, thanks for taking a look at the PR. The following example from the Ruby driver is taken from code that works with the underlying network socket, and replaces network errors with a driver-specific exception class, and annotates them with (among other things) the server address:
rescue Errno::ETIMEDOUT => e
raise Error::SocketTimeoutError, "#{e.class}: #{e} (for #{human_address})"
rescue IOError, SystemCallError => e
raise Error::SocketError, "#{e.class}: #{e} (for #{human_address})"
rescue OpenSSL::SSL::SSLError => e
raise Error::SocketError, "#{e.class}: #{e} (for #{human_address})"
I share your reluctance to include a spec update without tests, but:
Ultimately, I'm not sure that adding mandated tests to the spec for this behavior can or should be done in a way that excludes the drivers for which this does not apply, and manages to cover the potential errors that this relates to. I'm open to discussing it, though, if you have thoughts on how this ought to be done. (This is my first rodeo with writing a spec change, so I'm sure there's plenty I could be doing better, here!)
Some drivers (e.g. Ruby) attach server information to certain errors for diagnostic purposes. This PR updates the language in the retryable reads and writes specifications to require that these drivers ensure that the server information that is associated with an exception reflects the server that actually originated the exception, even in the presence of one more more retries.
Please complete the following before merging:
NOTE: Tests for this situation (server information associated with exceptions during a retry) have proven challenging to write robustly. TL;DR -- I don't have a good solution for it, and currently cannot claim to have reliably tested this, though the code in the Ruby driver has been present since mid-2020.
I've found the most success with subscribing to command events, waiting for (e.g.) a 'find' event, and then forcing the primary to step down (per https://www.mongodb.com/docs/manual/tutorial/force-member-to-be-primary/) so that the retry hits a different server. I also capture logging output in the test, so I can then compare a message logged for the first exception (with the first server) with the message of the exception raised when the retry fails, and confirm that the servers differ. This mostly works, but (1) it only works for replica sets, and (2) it seems to fail non-deterministically, apparently due to some race condition associated with the setting of a failpoint.