Closed augustj closed 10 years ago
Have you tried the latest master branch?
Yes - I have tried against master. As well as the fork w/ pull request on https://github.com/veracross/tiny_tdsfrom wbond that fixes some timeout issues.
On Wed, Oct 16, 2013 at 4:59 PM, Ken Collins notifications@github.comwrote:
Have you tried the latest master branch?
— Reply to this email directly or view it on GitHubhttps://github.com/rails-sqlserver/tiny_tds/issues/138#issuecomment-26469625 .
Yes - I have tried against master. As well as the fork w/ pull request on...
Did you mean the work under #136 which I have not yet merged? Just wanted to be sure. If so, that was my next recommendation. I have heard about this on the FreeTDS list before too, could this be an issue with them? I have also heard about it on standard SQL Server over a general network or VPN. Those discussions went towards keep alive settings. Maybe give this issue a read over and let me know what you think?
Yes, I have tried that (#136) as well. Unfortunately, the connection is also closed during active use - not just inactivity. I will look into the FreeTDS list and see if I find anything. ... I didn't see anything in the FreeTDS archives, so I posted to them. I'll let you know what I hear.
@augustj Do you have a small test case that could easily reproduce this? I'm seeing this issue in prod as well.
As far as a test case, I was never able to get a reproducible case - we ended up forking tiny_tds and putting in some hardcoded behavior to not keep retrying on timeouts. We then forked activerecord-retry to look for that specific error and retry. It was a terrible solution - one that left a bad taste in my mouth. That project was put on hold for other reasons, so I never pursued it. My conclusion was that it wasn't necessarily a tiny_tds issue, but wackyness in Azure SQL. Sorry I don't have better news.
I know people are using this gem in Azure - I wonder if they are using 'real' SQLServer instances instead of Azure SQL.
@augustj Sorry to hear about that. If it is something in TinyTDS, would love to get it fixed.
I am not sure it is a tiny_tds issue. If I remember correctly, Azure SQL reports a timeout - but it is effectively a dead connection.
I've dug into this a little today, I don't believe its any problem with Azure SQL, but for some reason, a dead connection from any SQL Server is reported as a timeout. You need to actually check DEADPROC in the err_handler and then return INT_EXIT instead
That makes sense - although I confess that my knowledge of SQL Server connections is scant. With that extra piece of info about DEADPROC, it seems like there could be a concrete test case - and a solution. From my reading on MS docs about Azure SQL, they say that it is basically flaky and could cut out at any time - and provide code/solutions in .Net to make catching/retrying easy.
On Mon, Dec 9, 2013 at 1:55 PM, John Anderson notifications@github.comwrote:
I've dug into this a little today, I don't believe its any problem with Azure SQL, but for some reason, a dead connection from any SQL Server is reported as a timeout. You need to actually check DEADPROC in the err_handler and then return INT_EXIT instead
— Reply to this email directly or view it on GitHubhttps://github.com/rails-sqlserver/tiny_tds/issues/138#issuecomment-30177688 .
Our other issue was that a connection from the pool would only appear dead once we tried to use it. That was why we ended up using the activerecord-retry gem. If there is was to check the connection on checkout from the pool, that would help. I know that is often frowned upon, but Azure SQL connection die so often, they should be checked every time.
On Mon, Dec 9, 2013 at 1:59 PM, August Jaenicke august@carbonfive.comwrote:
That makes sense - although I confess that my knowledge of SQL Server connections is scant. With that extra piece of info about DEADPROC, it seems like there could be a concrete test case - and a solution. From my reading on MS docs about Azure SQL, they say that it is basically flaky and could cut out at any time - and provide code/solutions in .Net to make catching/retrying easy.
On Mon, Dec 9, 2013 at 1:55 PM, John Anderson notifications@github.comwrote:
I've dug into this a little today, I don't believe its any problem with Azure SQL, but for some reason, a dead connection from any SQL Server is reported as a timeout. You need to actually check DEADPROC in the err_handler and then return INT_EXIT instead
— Reply to this email directly or view it on GitHubhttps://github.com/rails-sqlserver/tiny_tds/issues/138#issuecomment-30177688 .
We also seem to be suffering from this. Any updates?
@augustj, any chance you can post what you did? We are still suffering from this and struggling to find a solution.
@Altonymous - my two hacks still did not result in a useable system. The project was shelved for other reasons. However, if the project was to continue, we would have needed to find a real solution. My hacks only improved the experience from unusable to merely excruciating. I'd recommend pursuing @sontek's hypothesis.
I checked out a copy of the activerecord-sqlserver-adapter.. and it looks like it's getting to SQLServerAdapter->active? and when it calls @connection.active? that always returns true. Even when the connection is not active.
I have to run my kid to soccer practice.. otherwise I'd try and dig into the tiny_tds gem to see why that is. Perhaps someone more experienced in the tiny_tds would be able to take it from here. If not I'll be back at it after soccer practice.
For anyone else having this problem, that just needs a work-around for the time being, I just created an initializer to keep the connection alive...
# db_keep_alive.rb
thr = Thread.new {
loop do
ActiveRecord::Base.connection_pool.with_connection do |conn|
User.first
end
sleep 30
end
}
at_exit do
thr.kill
end
It's ugly, but it works.
I am seeing this issue with Azure SQL + FreeTDS 7.1 + tiny_tds 0.6.2 (built using master on April 23 2014) which includes #136 running on an Azure hosted Suse instance. To reproduce just leave an Azure Sql connection idle for a few minutes using ActiveRecord then make a call.
I see the same messages @augustj reported; the debug messages freetds.log show 20003: "Adaptive Server connection timed out"
@sontek suggestion about returning INT_EXIT on SYBETIME (20003) is probably a good one. At least it might be the only option for Azure Sql. Could SYBETIME be recovered from with a normal non-Azure connection?
#define SYBEFCON 20002 /* SQL Server connection failed. */
#define SYBETIME 20003 /* SQL Server connection timed out. */
Taking out the logic for handling SYBETIME in the query timeout case and forcing INT_EXIT in client.c improves the behavior for this Azure Sql case.
case SYBETIME: {
// Existing userdata->continue logic taken out
return_value = INT_EXIT;
break;
This changes the behavior to the following in the freetds.log
dblib.c:4669:dbsqlok() not done, calling tds_process_tokens()
token.c:540:tds_process_tokens(0x5a39ee0, 0x7fc8086ba144, 0x7fc8086ba140, 0x6914)
util.c:156:Changed query state from PENDING to READING
net.c:1199:in tds_pull_func
util.c:331:tdserror(0x4b67d50, 0x5a39ee0, 20003, 0)
dblib.c:7929:dbperror(0x45806c0, 20003, 0)
dblib.c:7981:20003: "Adaptive Server connection timed out"
dblib.c:5780:dbgetuserdata(0x45806c0)
dblib.c:8002:"Adaptive Server connection timed out", client returns 0 (INT_EXIT)
dblib.c:8050:FreeTDS: db-lib: exiting because client error handler returned INT_EXIT for msgno 20003
But there is still a timeout delay before ActiveRecord connects. It is a much shorter delay than 5~10 minutes but still not great. If we knew the connection is always going to do this on Azure Sql then it might be a valid change for the Azure Sql case.
Created pull request #158 with a change in behavior for SYBETIME(20003) if the connection uses Azure SQL. I tested this using an existing rails Azure SQL application - but not tested with rake yet.
Created pull request #159 which is less disruptive to the process and allows the rails server to reconnect and continue. I have no access to a Sql Server test environment at the moment so the change is not tested with rake yet.
I just made a release at https://github.com/veracross/tiny_tds/releases/tag/0.6.2-beta1 which should include a fix for this. Can you guys try it out and let me know how it works?
Thanks Will. I will hopefully have an environment to test it out today. Stay tuned...
This should be resolved by the 0.6.2 release that I just pushed up to rubygems.
i'm not able actually to install the pre beta gem i have this message LoadError cannot load such file -- tiny_tds
i've install the pre beta version but i'm still facing timeout issues
@lethunder The pre-beta is no longer necessary. Version 0.6.2 was pushed to rubygems around May 19th.
i've tested both i still have timeout issues :(
@lethunder Sounds like you should open a new issue and include info such as:
With that info, someone may be able to help you. It may, however, come down to diving in a figuring it out yourself.
I'm still having time out issues after 0.6.2 here as well. Here's my info:
freetds (freetds-dev package on ubuntu 12.04.4 LTS): 0.91-1 tiny_tds: 0.6.2 ubuntu: 12.04.4 LTS ruby: 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
windows: Microsoft Windows Server 2012 R2 sql server: SQL Server Standard 2012 Service Pack 1
@D1plo1d This is a closed issue. I'd probably recommend opening a new one with a FreeTDS dump.
Then you'll just have to wait for someone to spend the time to dig through it all.
Hi,
Sorry to post on a closed issue (but this seems like relevant to what we see).
A few days ago, we saw a series of 15 TimeOuts during a 2 minute period.
During that period,
A typical log entry (production was INFO log level, so we do not have the DEBUG log level data) is:
All SELECT calls (not shown in the production INFO log do work), the first INSERT fails:
I, [2014-10-09T09:50:45.233755 #27243] INFO -- : Started PUT "/api/aanvragen/2052" for x.x.x.x at 2014-10-09 09:50:45 +0200
I, [2014-10-09T09:50:45.244516 #27243] INFO -- : Processing by Api::V1::AanvragenController#update as JSON
I, [2014-10-09T09:50:45.244811 #27243] INFO -- : Parameters: {"payload"=> ...
...
(many GET calls here, not shown in production log)
...
E, [2014-10-09T09:51:00.284821 #27243] ERROR -- : TinyTds::Error: Adaptive Server connection timed out: EXEC sp_executesql N'INSERT INTO [api_calls] ([centrum_id], [created_at], [entity], [method], [payload], [updated_at], [uri]) VALUES (@0, @1, @2, @3, @4, @5, @6); SELECT CAST(SCOPE_IDENTITY() AS bigint) AS Ident', N'@0 int, @1 datetime, @2 nvarchar(255), @3 nvarchar(255), @4 nvarchar(max), @5 datetime, @6 nvarchar(255)', @0 = 73, @1 = '2014-10-09T07:50:45.261',
I, [2014-10-09T09:51:00.298191 #27243] INFO -- : CONNECTION RETRY: ActiveRecord::ConnectionAdapters::SQLServerAdapter retry #0.
I, [2014-10-09T09:51:00.304308 #27243] INFO -- : Completed 500 Internal Server Error in 15059ms
F, [2014-10-09T09:51:00.328699 #27243] FATAL -- :
What I have on info is:
$ dpkg -l '*freetds*' | grep ^ii
ii freetds-bin 0.82-7 FreeTDS command-line utilities
ii freetds-common 0.82-7 configuration files for FreeTDS SQL client libraries
ii freetds-dev 0.82-7 MS SQL and Sybase client library (static libs and headers)
GIT
remote: https://github.com/veracross/tiny_tds.git
revision: 937c352d7a760c4b65e5cc581b299e9f63132da3
specs:
tiny_tds (0.6.2)
SQL Server 2012 (I have no further details at this time).
So, there is a first call that waits for 15 seconds for a TimeOut. After that, a CONNECTION RETRY is tried and then 50 ms later, the system dies.
We recently upgraded from an old Rails 3.2 set-up to Rails 4. But it is also only recent that we started to run serious production volumes and the API and watch the production log in detail for these issues.
Any hints would be greatly appreciated.
Are there specific signs that we should search for in the SQL Server logs to determine if this is the issue we are facing ?
Running FreeTDS 0.91 with latest tiny_tds on Azure SQL. Normal operation is perfect. However, when the app has been idle (~5-10 minutes), the database connection goes bad. The db service sends back => 20003 "Adaptive Server connection timed out". Then FreeTDS/Tiny TDS go in a loop with the server, responding with INT_CONTINUE. The server responds again with the 20003 error. They back and forth until the unicorn master is fed up and reaps the process.
I have no timeouts set in the freetds.conf file. It is empty except to activate the logging. I am running on Ubuntu 12.04, Rails 3.2.
Here is the FreeTDS log for a failed attempt:
1:17:27.063192 48918 (dblib.c:4880):dbdead(0x2e5ecb0) [alive] 21:17:27.072366 48918 (dblib.c:1312):dbcmd(0x2e5ecb0, EXEC sp_executesql N'SELECT TOP (1) [users].* FROM [users] WHERE [users].[id] = @0', N'@0 int', @0 = 1) 21:17:27.072434 48918 (dblib.c:1319):dbcmd() bufsz = 148 21:17:27.072464 48918 (dblib.c:5882):dbfreebuf(0x2e5ecb0) 21:17:27.072512 48918 (dblib.c:6862):dbsqlsend(0x2e5ecb0) 21:17:27.072540 48918 (mem.c:615):tds_free_all_results() 21:17:27.072567 48918 (util.c:156):Changed query state from IDLE to QUERYING 21:17:27.072597 48918 (write.c:140):tds_put_string converting 102 bytes of "EXEC sp_executesql N'SELECT TOP (1) [users].* FROM [users] WHERE [users].[id] = @0', N'@0 int', @0 = 1" 21:17:27.072882 48918 (write.c:168):tds_put_string wrote 204 bytes 21:17:27.072911 48918 (util.c:156):Changed query state from QUERYING to PENDING 21:17:27.072938 48918 (net.c:741):Sending packet 21:17:27.073066 48918 (net.c:1240):in tds_push_func 21:17:27.073208 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:27.073251 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:27.073278 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:27.073304 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:27.073331 48918 (dblib.c:4639):dbsqlok(0x2e5ecb0) 21:17:27.073357 48918 (dblib.c:4669):dbsqlok() not done, calling tds_process_tokens() 21:17:27.073382 48918 (token.c:540):tds_process_tokens(0x5794180, 0x7fff43cd8bc8, 0x7fff43cd8bcc, 0x6914) 21:17:27.073446 48918 (util.c:156):Changed query state from PENDING to READING 21:17:27.073476 48918 (net.c:1199):in tds_pull_func 21:17:37.129602 48918 (util.c:331):tdserror(0x46232c0, 0x5794180, 20003, 0) 21:17:37.129670 48918 (dblib.c:7929):dbperror(0x2e5ecb0, 20003, 0) 21:17:37.129689 48918 (dblib.c:7981):20003: "Adaptive Server connection timed out" 21:17:37.129697 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:37.129718 48918 (dblib.c:4880):dbdead(0x2e5ecb0) [alive] 21:17:37.129726 48918 (dblib.c:3196):dbcancel(0x2e5ecb0) 21:17:37.129733 48918 (query.c:2155):tds_send_cancel: not in_cancel and not idle 21:17:37.129740 48918 (query.c:2164):tds_send_cancel: sending cancel packet 21:17:37.129747 48918 (net.c:741):Sending packet 21:17:37.129790 48918 (net.c:1240):in tds_push_func 21:17:37.129813 48918 (dblib.c:8002):"Adaptive Server connection timed out", client returns 1 (INT_CONTINUE) 21:17:37.129834 48918 (util.c:361):tdserror: client library returned TDS_INT_CONTINUE(1) 21:17:37.129842 48918 (util.c:384):tdserror: returning TDS_INT_CONTINUE(1) 21:17:47.076371 48918 (util.c:331):tdserror(0x46232c0, 0x5794180, 20003, 4) 21:17:47.076413 48918 (dblib.c:7929):dbperror(0x2e5ecb0, 20003, 4) 21:17:47.076427 48918 (dblib.c:7981):20003: "Adaptive Server connection timed out" 21:17:47.076437 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:47.076446 48918 (dblib.c:8002):"Adaptive Server connection timed out", client returns 1 (INT_CONTINUE) 21:17:47.076455 48918 (util.c:361):tdserror: client library returned TDS_INT_CONTINUE(1) 21:17:47.076463 48918 (util.c:384):tdserror: returning TDS_INT_CONTINUE(1) 21:17:47.076489 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:47.076501 48918 (dblib.c:3196):dbcancel(0x2e5ecb0) 21:17:47.076510 48918 (query.c:2155):tds_send_cancel: in_cancel and not idle 21:17:47.180191 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:47.180249 48918 (dblib.c:3196):dbcancel(0x2e5ecb0) 21:17:47.180263 48918 (query.c:2155):tds_send_cancel: in_cancel and not idle 21:17:57.126580 48918 (util.c:331):tdserror(0x46232c0, 0x5794180, 20003, 4) 21:17:57.126689 48918 (dblib.c:7929):dbperror(0x2e5ecb0, 20003, 4) 21:17:57.126726 48918 (dblib.c:7981):20003: "Adaptive Server connection timed out" 21:17:57.126760 48918 (dblib.c:5780):dbgetuserdata(0x2e5ecb0) 21:17:57.126793 48918 (dblib.c:8002):"Adaptive Server connection timed out", client returns 1 (INT_CONTINUE) 21:17:57.126826 48918 (util.c:361):tdserror: client library returned TDS_INT_CONTINUE(1) 21:17:57.126985 48918 (util.c:384):tdserror: returning TDS_INT_CONTINUE(1)