Closed bugg2844 closed 3 years ago
Interesting, definitely seems like a bug at first glace. From your log, the timeout error is 20003, which corresponds to SYBETIME, which has special consideration here: https://github.com/rails-sqlserver/tiny_tds/blob/master/ext/tiny_tds/client.c#L83-L91
Random idea: the error is getting handled by tinytds, and stored in the nonblocking_error
struct in userdata. However, since the connection is effectively dead, there's no subsequent step that calls nogvl_cleanup
which would cause the error to throw an exception in ruby.
You could test this idea by adding some logging to tinytds_err_handler
to confirm that tiny tds is receiving the error in the first place and then from there, not raising it as an exception. Then maybe we can figure out a way to have it safely do that.
Thanks. I did as you suggested, and it looks like what you described is occurring. Here are three iterations of logs I added to tinytds_err_handler.
main.rb:15: warning: Entering tinytds_err_handler main.rb:15: warning: dberr is SYBETIME main.rb:15: warning: userdata->nonblocking is true main.rb:15: warning: db is not dead, userdata is not closed; calling dbcancel and setting dbcandel_sent main.rb:15: warning: nonblocking_error is not set
main.rb:15: warning: Entering tinytds_err_handler main.rb:15: warning: dberr is SYBETIME main.rb:15: warning: userdata->nonblocking is true main.rb:15: warning: db is not dead, userdata is not closed; calling dbcancel and setting dbcandel_sent main.rb:15: warning: Nonblocking error is already set
main.rb:15: warning: Entering tinytds_err_handler main.rb:15: warning: dberr is SYBETIME main.rb:15: warning: userdata->nonblocking is true main.rb:15: warning: db is not dead, userdata is not closed; calling dbcancel and setting dbcandel_sent main.rb:15: warning: Nonblocking error is already set
Scenario: Switching from a primary to a backup network, the firewall invalidates all existing tcp sessions and starts dropping packets.
Expected results: Configured timeout is hit and tiny_tds reports an error Actual results: No response from tiny_tds until underlying tcp connection's retries expire (> 15 minutes on Ubuntu, for example: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html)
Simple sample code attached in timeout.zip file. You'll need to simulate dropped packets in order to see the issue, which is shown about halfway down the attached freetds.log. Freetds reports a timeout in the log, over and over and over, but tiny_tds does not return any error to the caller. I am not sure if the issue is actually with tiny_tds or freetds.
Steps to reproduce:
At this point you should see the repeatedly ignored timeouts in freetds.log. On Linux this will continue for over 15 minutes before the connection finally times out. On Mac it takes around 30 seconds.
You can restore your connection by replacing DROP with ACCEPT for iptables, or by removing the extra line from pf.conf and running pfctl again.
freetds.log timeout.zip