rails-sqlserver / tiny_tds

TinyTDS - Simple and fast FreeTDS bindings for Ruby using DB-Library.
Other
607 stars 189 forks source link

Timeout does not work when tcp packets are dropped #445

Closed bugg2844 closed 3 years ago

bugg2844 commented 4 years ago

Scenario: Switching from a primary to a backup network, the firewall invalidates all existing tcp sessions and starts dropping packets.

Expected results: Configured timeout is hit and tiny_tds reports an error Actual results: No response from tiny_tds until underlying tcp connection's retries expire (> 15 minutes on Ubuntu, for example: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html)

Simple sample code attached in timeout.zip file. You'll need to simulate dropped packets in order to see the issue, which is shown about halfway down the attached freetds.log. Freetds reports a timeout in the log, over and over and over, but tiny_tds does not return any error to the caller. I am not sure if the issue is actually with tiny_tds or freetds.

Steps to reproduce:

  1. Prereqs
    1. A firewall is required that can be configured to drop packets, in order to simulate the condition. iptables is sufficient (e.g. apt-get install iptables), or pfctl on Mac.
    2. Configure freetds to debug: dump file = /tmp/freetds.log debug flags = 0xffff
  2. Extract timeout.zip
  3. Modify main.rb for your database connection
  4. bundle install
  5. bundle exec ruby main.rb
    1. You'll need to complete step 6 while main.rb is running. Modify the loop as required.
  6. Interrupt the database connection:
    1. For linux: iptables -I INPUT -s 1.2.3.4 -j DROP (where 1.2.3.4 is your db's ip address)
    2. For mac:
      1. Append to /etc/pf.conf (or create a custom pf.conf): block drop from 1.2.3.4 to any (where 1.2.3.4 is your db's ip address)
      2. sudo pfctl /etc/pf.conf

At this point you should see the repeatedly ignored timeouts in freetds.log. On Linux this will continue for over 15 minutes before the connection finally times out. On Mac it takes around 30 seconds.

You can restore your connection by replacing DROP with ACCEPT for iptables, or by removing the extra line from pf.conf and running pfctl again.

freetds.log timeout.zip

aharpervc commented 4 years ago

Interesting, definitely seems like a bug at first glace. From your log, the timeout error is 20003, which corresponds to SYBETIME, which has special consideration here: https://github.com/rails-sqlserver/tiny_tds/blob/master/ext/tiny_tds/client.c#L83-L91

Random idea: the error is getting handled by tinytds, and stored in the nonblocking_error struct in userdata. However, since the connection is effectively dead, there's no subsequent step that calls nogvl_cleanup which would cause the error to throw an exception in ruby.

You could test this idea by adding some logging to tinytds_err_handler to confirm that tiny tds is receiving the error in the first place and then from there, not raising it as an exception. Then maybe we can figure out a way to have it safely do that.

bugg2844 commented 4 years ago

Thanks. I did as you suggested, and it looks like what you described is occurring. Here are three iterations of logs I added to tinytds_err_handler.

main.rb:15: warning: Entering tinytds_err_handler main.rb:15: warning: dberr is SYBETIME main.rb:15: warning: userdata->nonblocking is true main.rb:15: warning: db is not dead, userdata is not closed; calling dbcancel and setting dbcandel_sent main.rb:15: warning: nonblocking_error is not set

main.rb:15: warning: Entering tinytds_err_handler main.rb:15: warning: dberr is SYBETIME main.rb:15: warning: userdata->nonblocking is true main.rb:15: warning: db is not dead, userdata is not closed; calling dbcancel and setting dbcandel_sent main.rb:15: warning: Nonblocking error is already set

main.rb:15: warning: Entering tinytds_err_handler main.rb:15: warning: dberr is SYBETIME main.rb:15: warning: userdata->nonblocking is true main.rb:15: warning: db is not dead, userdata is not closed; calling dbcancel and setting dbcandel_sent main.rb:15: warning: Nonblocking error is already set