rails-sqlserver / activerecord-sqlserver-adapter

SQL Server Adapter For Rails
MIT License
972 stars 558 forks source link

TinyTds::Error: DBPROCESS is dead or not enabled #402

Closed thomasfedb closed 9 years ago

thomasfedb commented 9 years ago

I'm receiving this error from a long-running worker process that periodically accesses a SQL Server database. Curious that some ActiveRecord machinery isn't attempting a reconnection, if that's the issue.

Full trace:

vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.3/lib/active_record/connection_adapters/sqlserver/database_statements.rb:308:in `execute': TinyTds::Error: DBPROCESS is dead or not enabled: EXEC sp_executesql N'SELECT [uvStudyLabTutors_CCGS].* FROM [uvStudyLabTutors_CCGS]' (ActiveRecord::StatementInvalid)
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.3/lib/active_record/connection_adapters/sqlserver/database_statements.rb:308:in `raw_connection_run'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.3/lib/active_record/connection_adapters/sqlserver/database_statements.rb:299:in `_raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.3/lib/active_record/connection_adapters/sqlserver/database_statements.rb:295:in `block in raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract_adapter.rb:466:in `block in log'
    from vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.0/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract_adapter.rb:460:in `log'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.3/lib/active_record/connection_adapters/sqlserver/database_statements.rb:295:in `raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.3/lib/active_record/connection_adapters/sqlserver/database_statements.rb:244:in `sp_executesql'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.3/lib/active_record/connection_adapters/sqlserver/database_statements.rb:19:in `exec_query'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.3/lib/active_record/connection_adapters/sqlserver/database_statements.rb:199:in `select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract/query_cache.rb:70:in `select_all'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/querying.rb:39:in `find_by_sql'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/relation.rb:638:in `exec_queries'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/relation.rb:514:in `load'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/relation.rb:243:in `to_a'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/relation/delegation.rb:46:in `each'
    ...
`
metaskills commented 9 years ago

Hey @thomasfedb, that is interesting. The adapter should be doing some rudimentary connection management, see here. Have you tried Rails 4.2.1 or investigated this further?

krzcho commented 9 years ago

same here with rails 4.1.10 and 4.1.0 adapter i have reconnect: true option in db.yml but i am not sure if it is being used

krzcho commented 9 years ago

it looks like it is not recommended to use this option (http://stackoverflow.com/questions/21594429/database-yml-in-rails-reconnect-set-to-true-or-false) in case of a connection being lost during a transaction mine sql has a standard connection timeout of 10 mins so it probably would be simplest to do an each 5 mins dummy query to keep it opened

metaskills commented 9 years ago

There is no more reconnect option in 4.2 and upward. We simply conform to ActiveRecord's connection pool management protocol and it should do the rest. Also, I just did a pre-release of TinyTDS 0.6.3.rc1 too. Maybe try that as well. Note, this pre-release is POSIX only. No windows binaries.

thomasfedb commented 9 years ago

Hey @metaskills.

I last saw this error when using Rails 4.2.0. I've now upgraded to 4.2.1, and I'm waiting to see if the same error occurs. Is there anything in particular in 4.2.1 that should fix this error?

If I see the error again, I will try with the TinyTDS pre-release.

metaskills commented 9 years ago

Nothing specific. I did some general 4.2.1 cleanup around connection management while I was working on transaction support changes.

thomasfedb commented 9 years ago

Hey @metaskills. I have updated to 4.2.1, and so far no further issues. I'm going to give it a few more days, as it was previously erroring out only after a few days.

metaskills commented 9 years ago

Great, thanks for the update. Will keep this open. When you report back, let me know if you updated to the 0.6.3.rc1 version of TinyTDS too.

thomasfedb commented 9 years ago

I haven't upgraded TinyTDS to 0.6.3.rc1 yet, as my issues don't seem to stemming from that, though I'll try it if Rails 4.2.1 doesn't fix the issue.

thomasfedb commented 9 years ago

The issues has recurred, here's the trace with Rails 4.2.1.

vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:308:in `execute': TinyTds::Error: DBPROCESS is dead or not enabled: EXEC sp_executesql N'SELECT [uvStudyLabTutors_CCGS].* FROM [uvStudyLabTutors_CCGS]' (ActiveRecord::StatementInvalid)
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:308:in `raw_connection_run'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:299:in `_raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:295:in `block in raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/connection_adapters/abstract_adapter.rb:473:in `block in log'
    from vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.1/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/connection_adapters/abstract_adapter.rb:467:in `log'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:295:in `raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:244:in `sp_executesql'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:19:in `exec_query'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:199:in `select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/connection_adapters/abstract/query_cache.rb:70:in `select_all'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/querying.rb:39:in `find_by_sql'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/relation.rb:638:in `exec_queries'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/relation.rb:514:in `load'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/relation.rb:243:in `to_a'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/relation/delegation.rb:46:in `each'
    ...

I'll try upgrading TidyTDS to 0.6.3.rc1.

thomasfedb commented 9 years ago

As it happens, silly me for not checking earlier, I'm already on master for TinyTDS:

gem "tiny_tds", git: "https://github.com/rails-sqlserver/tiny_tds.git"

Given this, what's the next avenue of enquiry?

metaskills commented 9 years ago

As it happens, silly me for not checking earlier, I'm already on master for TinyTDS

Well, just because you are bundled to master does not mean you are on the 0.6.3 version. Do you know what SHA is in your Gemfile.lock?

Given this, what's the next avenue of enquiry?

I just always like to check. Nothing in the change log strikes me as affecting you.

thomasfedb commented 9 years ago

I'm on 0.6.3, specifically ed4e43ec2598fc7879359f53b6d941b9e92234bc.

Looking at the comparison ed4e43e...master I don't see anything that should affect this.

thomasfedb commented 9 years ago

Hey @metaskills, this continues to be an issue.

Here's the latest trace from an occurrence earlier this evening:

vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:308:in `execute': TinyTds::Error: DBPROCESS is dead or not enabled: EXEC sp_executesql N'SELECT [uvStudyLabHouses_CCGS].* FROM [uvStudyLabHouses_CCGS]' (ActiveRecord::StatementInvalid)
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:308:in `raw_connection_run'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:299:in `_raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:295:in `block in raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/connection_adapters/abstract_adapter.rb:473:in `block in log'
    from vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.1/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/connection_adapters/abstract_adapter.rb:467:in `log'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:295:in `raw_select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:244:in `sp_executesql'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:19:in `exec_query'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-sqlserver-adapter-4.2.4/lib/active_record/connection_adapters/sqlserver/database_statements.rb:199:in `select'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/connection_adapters/abstract/query_cache.rb:70:in `select_all'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/querying.rb:39:in `find_by_sql'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/relation.rb:638:in `exec_queries'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/relation.rb:514:in `load'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/relation.rb:243:in `to_a'
    from vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.1/lib/active_record/relation/delegation.rb:46:in `each'

Relevant bundler output:

  * activerecord-sqlserver-adapter (4.2.4)
  * tiny_tds (0.6.3 ed4e43e)
metaskills commented 9 years ago

Hmmm... do you know if the error happens during the long task or directly after the period of time it sat idle? If the later, I wonder if there is a server timeout that is happening? Similar to this issue? https://github.com/rails-sqlserver/tiny_tds/pull/162

metaskills commented 9 years ago

@thomasfedb As a side note, this is why I asked about the TinyTDS version. This change was in 0.6.3 too and not noted in the change log. Please read: https://github.com/rails-sqlserver/tiny_tds/pull/179

thomasfedb commented 9 years ago

@metaskills The issue appears to occur after a long period of inactivity, rather than during a long-running task. Once the error has occurred the application doesn't seem to attempt to reconnect, I have to manually restart.

I've talked to the DBA in charge of the SQL Server in question, who says that it is not configured to automatically close connections, and that backup procedures shouldn't be interrupting connections.

I've been having a closer look into the particulars of the situation, and I note that this only appears to occur in our job queue workers, which uses Que. I'm wondering if this could be implicated and will need to look closer.

krzcho commented 9 years ago

I suggest catching the session with wireshark to know better: which end drops the connection, when and (possibly) why; it should be then easier to find a fix for it

metaskills commented 9 years ago

The issue appears to occur after a long period of inactivity, rather than during a long-running task.

Interesting. I am going to do some tests too and leave a connection open and idle for a long time. Maybe you can do the same. If this is the case, then we should really dig into this (https://github.com/rails-sqlserver/tiny_tds/pull/179) issue and see if there is a sybase message that helps us know what type of timeout is happening.

thomasfedb commented 9 years ago

I've had a look into how Que interacts with ActiveRecord, and I'm fairly convinced that it shouldn't be adding to the issues seen here. Looks like ActiveRecord has fairly sensible multi-connection handling.

If there are any particular diagnostics I should be running on the dead connections, let me know @metaskills.

metaskills commented 9 years ago

OK, I did a big round up on this. Tons of info below!

Test TinyTDS Idle For 24 Hours

No issue whatsoever. Connection worked great!

Does SQL Server Reap Idle Connections?

Not that I could find. I know Azure does this and that is why /tiny_tds/162 is open. Here are some articles I read on how others might choose to do this in SQL Server. Something to check for.

Does ActiveRecord Reap Dead Connections?

Not anymore! Tho it still has this reaper_frequency for database.yml which is pretty much deprecated. In the Rails core issue link further below, it talks about this old setting ab it. This article below and embedded links warn against it too. Knowing this, I am now going to ask people if they have this accidentally left in and recommend they remove it. Have you?

How Does ActiveRecord PostgreSQL Adapter Handle This?

First, I did a QA of ActiveRecord's connection pool's checkout and verify code. The connection pool only verifies a newly checkout connection which only happens when it assigns one to the current thread. So assuming a simple IRB session like so, this happens in my personal Rails 4.2 application using PostgreSQL.

> ActiveRecord::Base.connection.execute 'SELECT 1'
=> #<PG::Result ... >

# Kill Or Restart PostgreSQL

> ActiveRecord::Base.connection.active?
=> true
> ActiveRecord::Base.connection.execute 'SELECT 1'
PG::UnableToSend: : SELECT 1

It is worth pointing out that if I restart the database between reloading web pages that it recovers just fine! The reason being is that ActiveRecord has a ConnectionManagement middleware which calls clear_active_connections! which releases & checks in each connection. This sets up the next request to check out a connection which is also an implicit verify as well. So to be clear, there are two different behaviors here and I think we both assumed that connections in things like workers are checked out (and verified) every time for the current thread. This is not the case! It is only true for requests/responses.

I suggest reading the following Rails core issues in full. The first is long and the solutions are the following two merged issues.

Why Not Make Our Own Reconnect Strategy

I dropped this for a reason. It is hard to do! Reading thru those core issues proves my point on that. I want to solve this in way that allows Rails to do the heavy lifting.

So Where Does This Leave Us?

I am not sure. It seems to me we are doing exactly like PG does. We use raw_connection_do 'SELECT 1' in our active check and follow all the best practices. That said, here is one thought:

1) Use ActiveRecord::Base.clear_active_connections! - Perhaps if you have a periodic job that uses the same thread / connection, you should clear active connections before/after your work to simulate what Rails does around each controller request for a long lived app.

thomasfedb commented 9 years ago

Wow, awesomely comprehensive look at this issue.

Given that this is a fairly new application, and I've never set any reaper configuration, I was fairly sure this wasn't the issue. Just to be sure I searched my codebase for reaper_frequency and reaper, both turning up no results.

I had also noticed that AR only seems to verify connections when they are assigned to theads, interesting to hear about the middlewear to release them between requests.

I agree that it seems that ActiveRecord::Base.clear_active_connections! looks to be a promising solution in job queue workers. I'll try it out on my worker, and then see if I can suggest an update to the worker gem.

I'll keep you posted @metaskills.

theangryangel commented 9 years ago

@thomasfedb out of interest, what platform is your SQL database running on? Is it on a cluster where VM mobility may be coming into play?

In certain circumstances I've been able to perform a live migration of a SQL server on a Hyper-V cluster, and if I catch it at the right time I'm able to get the same symptoms as this issue. Not entirely unreasonable, all things considered.

thomasfedb commented 9 years ago

@theangryangel I believe that it is hosted on a VM, though I don't think it has been migrated recently. Certainly not multiple times a week, nor at 11pm.

theangryangel commented 9 years ago

@thomasfedb Last question and I'll butt out - would that be around the time a backup is taken? i.e. if it's a Hyper-V box and it's not using VSS it could be putting the VM to sleep for a very short amount of time.

thomasfedb commented 9 years ago

@theangryangel I'm told the backups are run at 3am, so I don't think this is the issue either.

thomasfedb commented 9 years ago

Hi @metaskills. I've been running now for a month with ActiveRecord::Base.clear_active_connections! and have had no issues since. Thanks for your help!

metaskills commented 9 years ago

OMG that makes me so happy. Thanks a ton for following up!!!

OmriSama commented 3 years ago

@metaskills and @wpolicarpo hey guys :)

I ran into this recently when we had an unexpected restart of our SQL Server. What I'm assuming is that if the SQL Server is restarting/down, and you try to query it, the query fails and FreeTDS/dblib sets the connection as dead. At that point, it doesn't even try to reconnect or re-establish -- it's simply completely dead.

Now, what I thought would happen is that the adapter/ActiveRecord::ConnectionPool would rescue from this and try to do some combination of reaping the connection and checking out a new one, particularly when trying to select a generic connection from the pool using the #connection method. This does not happen unfortunately, and I was wondering why; is the assumption that the DB is dead and we don't want to keep forcing new connections to a dead DB?

When the SQL Server was up, calling clear_reloadable_connections! fixed both connections. In my libraries/applications, I can rescue manually and call this, although I would want to make sure that this doesn't interfere with the query cache and any currently running queries.

As an aside: I'm wondering what would happen in a scenario where a connection is checked out, SQL Server is restarted and no queries are made (so the client still believes the connection is active?), and then a query is run once the restart had been completed; would it still work then?

Thanks in advance guys!

pd2871 commented 1 year ago

The first solution can be to use pool_pre_ping=True if you are creating the engine using sqlalchemy like this:

engine = create_engine(sql_url, pool_pre_ping=True)

Second solution can be to use local variables instead of global variables:

This error was happening when I was using SQL Server with python, the reason was that I created a global connection like this:

conn = sql_engine.connect()  #wrong way to create connection globally

def db_process():
    res = conn.execute("select * from table1")

This was the main reason for the error. Instead of creating a global variable for connection, I needed to create a connection inside each function like this:

def db_process():
    with sql_engine.connect() as conn:
        res = conn.execute("select * from table1")

So, try avoiding to create a global connection which will be dead after not using db for a long time.