thiagopradi / octopus

Database Sharding for ActiveRecord

PG::ConnectionBad: PQsocket() can't get socket descriptor: SELECT ... #457

Open TimBest opened 6 years ago

TimBest commented 6 years ago

This error started appearing after upgrading to Rails 5.1 and octopus 0.9.1: a small number of queries fail, raising PG::ConnectionBad: PQsocket() can't get socket descriptor: SELECT ...

It looks like Rails verifies that connections are valid when checking them out, but after going through octopus the connection has been closed and PQsocket() returns -1.

StackTrace

rails/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb" line 73 in async_exec
rails/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb" line 73 in block (2 levels) in execute
rails/activesupport/lib/active_support/dependencies/interlock.rb" line 46 in block in permit_concurrent_loads
rails/activesupport/lib/active_support/concurrency/share_lock.rb" line 185 in yield_shares
rails/activesupport/lib/active_support/dependencies/interlock.rb" line 45 in permit_concurrent_loads
rails/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb" line 72 in block in execute
rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb" line 612 in block (2 levels) in log
.rbenv/versions/2.4.1/lib/ruby/2.4.0/monitor.rb" line 214 in mon_synchronize
rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb" line 611 in block in log
rails/activesupport/lib/active_support/notifications/instrumenter.rb" line 21 in instrument
octopus/lib/octopus/abstract_adapter.rb" line 13 in instrument
rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb" line 603 in log
rails/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb" line 71 in execute
rails/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb" line 131 in begin_db_transaction
rails/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb" line 130 in initialize
rails/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb" line 156 in new
rails/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb" line 156 in block in begin_transaction
.rbenv/versions/2.4.1/lib/ruby/2.4.0/monitor.rb" line 214 in mon_synchronize
rails/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb" line 152 in begin_transaction
rails/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb" line 193 in block in within_new_transaction
.rbenv/versions/2.4.1/lib/ruby/2.4.0/monitor.rb" line 214 in mon_synchronize
rails/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb" line 191 in within_new_transaction
rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb" line 235 in transaction
octopus/lib/octopus/proxy.rb" line 122 in transaction
rails/activerecord/lib/active_record/transactions.rb" line 210 in transaction
rails/activerecord/lib/active_record/transactions.rb" line 381 in with_transaction_returning_status
rails/activerecord/lib/active_record/transactions.rb" line 308 in block in save
rails/activerecord/lib/active_record/transactions.rb" line 323 in rollback_active_record_state!
rails/activerecord/lib/active_record/transactions.rb" line 307 in save
rails/activerecord/lib/active_record/suppressor.rb" line 42 in save
(Controller)

System configuration

Rails version: 5.1.3
Ruby version: 2.4.1
octopus: 0.9.1

varentsov commented 6 years ago

I'm also seeing this.

pboling commented 6 years ago

I have started seeing this as well, on Rails 4.2.10 and Ruby 2.3.4.

theirishpenguin commented 6 years ago

I'm getting this error too and am using the octopus gem. Can't be sure it is at fault but I've ruled out a few other potential culprits.

meysammeisam commented 6 years ago

We are using octopus (0.9.1) on Rails (5.1) with PostgreSQL as the database. We have two databases in master/slave replication mode (master: read/write; slave: read-only). When we start the web server or console, everything works, but when the slave database restarts, octopus fails. It seems octopus can't handle broken connections!

# start rails console
Octopus.using(:slave1){ ActiveRecord::Base.connection.query('select 1;') }
#[Shard: slave1]   (0.3ms)  select 1;
#=> [[1]]

############### RESTART slave1 postgresql server ###############
Octopus.using(:slave1){ ActiveRecord::Base.connection.query('select 1;') }
#[Shard: slave1]   (0.3ms)  select 1;
#ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
#        This probably means the server terminated abnormally
#        before or while processing the request.
#: select 1;

Octopus.using(:slave1){ ActiveRecord::Base.connection.query('select 1;') }
# [Shard: slave1]   (0.3ms)  select 1;
# ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: select 1;

The same happens on the Rails server, and it is fixed only when we restart the web servers. I tested the same scenario for the master shard (which Rails handles itself) and everything is fine there: Rails detects that the connection has been terminated and creates a new one. I can reproduce it in both production (unicorn) and development (thin).

shards.yml:

octopus:
  replicated: true
  fully_replicated: false
  environments:
    - development
    - production
  default_slave1: &default_slave1
    adapter: "postgis"
    prepared_statements: false
    reconnect: true
    encoding: unicode
    pool: "5"
    username: "username"
    password: "password"
    host: "localhost"
    port: "5433"
    database: "db_name"
  development:
    slave1:
      <<: *default_slave1
  production:
    slave1:
      <<: *default_slave1

database.yml:

default: &default
  adapter: "postgis"
  encoding: unicode
  prepared_statements: false
  pool: "5"
  username: "username"
  password: "password"
  host: "localhost"
  port: "5432"
  database: "db_name"

development:
  <<: *default
test:
  <<: *default
production:
  <<: *default

meysammeisam commented 6 years ago

connection.verify! fixes it:

Octopus.using(:slave1){ ActiveRecord::Base.connection.query('select 1;') }
#ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: select 1;

Octopus.using(:slave1){ ActiveRecord::Base.connection.verify! }
# => []

Octopus.using(:slave1){ ActiveRecord::Base.connection.query('select 1;') }
#[Shard: slave1]   (0.4ms)  select 1;
#=> [[1]]
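
A note on why verify! helps here: as far as I understand the Rails 5.x adapters, verify! amounts to "reconnect unless the connection is still active". Below is a minimal sketch of that behavior using a stand-in class. ToyConnection and server_restarted! are hypothetical names for illustration only; they are not part of ActiveRecord, the pg gem, or octopus.

```ruby
# Toy stand-in for a database connection; NOT the real ActiveRecord adapter.
class ToyConnection
  class ConnectionBad < StandardError; end

  attr_reader :reconnects

  def initialize
    @alive = true
    @reconnects = 0
  end

  # Hypothetical hook: simulates the server restarting and dropping the socket.
  def server_restarted!
    @alive = false
  end

  def active?
    @alive
  end

  def reconnect!
    @reconnects += 1
    @alive = true
  end

  # The idea behind verify!: reconnect only when the connection is dead.
  def verify!
    reconnect! unless active?
  end

  def query(sql)
    raise ConnectionBad, "PQsocket() can't get socket descriptor: #{sql}" unless active?
    [[1]]
  end
end

conn = ToyConnection.new
conn.server_restarted!   # every query now raises until someone reconnects
conn.verify!             # notices the dead socket and reconnects
conn.query('select 1;')  # succeeds again
```

In this model nothing calls verify! on the stale connection automatically, which matches the console session above: the query keeps failing until verify! is invoked by hand.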

meysammeisam commented 6 years ago

This comment would be the answer.

gkkirsch commented 6 years ago

This is the comment I think @meysammeisam is talking about:

My guess would be that the gem whose job is to swap connections out from under AR, is swapping connections out from under AR -- presumably after we verify, and without doing any verification for itself.

Basically Octopus isn't verifying the connections are good before using them...?

@thiagopradi Does this sound right? Where would a check like this go?

gkkirsch commented 6 years ago

We could rescue the ActiveRecord::StatementInvalid error in proxy.rb, call connection.verify!, and then rerun the query. I am guessing there is a better way, though, since I don't know this repo very well.

    def select_all(*args, &block)
      legacy_method_missing_logic('select_all', *args, &block)
    rescue ActiveRecord::StatementInvalid
      # The connection may be stale: re-verify it (reconnecting if needed),
      # then retry the query once.
      select_connection.verify!
      legacy_method_missing_logic('select_all', *args, &block)
    end
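
The same retry-once idea can be written as a standalone helper, which makes the control flow easier to test in isolation. This is only a sketch under stated assumptions: with_one_reconnect and FlakyConnection are hypothetical names, the stand-in connection replaces a real adapter, and in octopus the error class would presumably be ActiveRecord::StatementInvalid. Retrying is also only safe outside an open transaction, since a reconnect discards transaction state.

```ruby
# Stand-in connection used only to exercise the helper below.
class FlakyConnection
  class Dead < StandardError; end

  attr_reader :verified

  def initialize
    @alive = false # start out with a stale socket
    @verified = 0
  end

  def verify!
    @verified += 1
    @alive = true
  end

  def select_all(sql)
    raise Dead, "PQsocket() can't get socket descriptor: #{sql}" unless @alive
    [[1]]
  end
end

# Hypothetical helper: yield the connection; on the first failure of the given
# error class, call verify! and retry exactly once. A second failure propagates.
def with_one_reconnect(connection, error_class)
  retried = false
  begin
    yield connection
  rescue error_class
    raise if retried
    retried = true
    connection.verify!
    retry
  end
end

flaky = FlakyConnection.new
rows = with_one_reconnect(flaky, FlakyConnection::Dead) { |c| c.select_all('select 1;') }
# rows is [[1]]; verify! was called once along the way
```

Capping the retry at one attempt keeps a genuinely unreachable database from causing an endless loop: the second failure is raised to the caller as before.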

kzvonov commented 6 years ago

I had a very similar problem but with rspec.

Failure/Error: DatabaseCleaner.clean_with(:truncation)
  ActiveRecord::StatementInvalid:  
    PG::ConnectionBad: PQsocket() can't get socket descriptor

Solved it by adding ActiveRecord::Base.clear_active_connections! to spec/rails_helper.rb

brianbroderick commented 6 years ago

FYI, we've been running my fork for a week or so now at my work and haven't seen this error since then. I haven't submitted a PR because I basically borrowed code from other people's PRs. But for reference, these 2 commits appear to have solved it:

https://github.com/brianbroderick/octopus/commit/835555bd2a6f3a11da145a51097f993a2643499d

https://github.com/brianbroderick/octopus/commit/d5faba5fe5c2e6a1255aba4596876fbe86d20f77

tibbon commented 5 years ago

@brianbroderick How's it going with those now? Would you mind making a PR with them if the fork has been stable so far and is reconnecting properly?

mnj93 commented 5 years ago

We're experiencing the same issue with our production setup. Has anyone found a solution, or should we just use @brianbroderick's fork?

gvn182 commented 5 years ago

Guys, did this merge fix the issue? I'm having the same problem: ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor: SELECT "users".* FROM "users"

ar-octopus (0.10.2) rails 5.0.6

When I restart the app it starts to work again

ricardovj commented 4 years ago

Any updates on this PR? We seem to be having the exact same problem. When Heroku performs DB maintenance on our Postgres database and the new servers are ready, our Rails app fails to connect to the new server and throws PG::ConnectionBad: PQsocket() errors, as others have described here. A quick restart of the Rails app fixes the issue right away.

gvn182 commented 4 years ago

@ricardovj I fixed that in my fork, and I've been using it for more than a month with no problems. Just add gem "ar-octopus", :git => 'https://github.com/gvn182/octopus.git' to your Gemfile.

ricardovj commented 4 years ago

Thanks @gvn182, but at this point I think I'd rather just push to upgrade to Rails 6 and stop using Octopus! It's been a great library to have, but things like this and the lack of maintenance worry me!

pbrumm commented 4 years ago

@gvn182 @ricardovj @brianbroderick @mnj93 @tibbon if you get a chance, I think this PR may fix the issue.
https://github.com/thiagopradi/octopus/pull/544

We are just getting started with testing, but I can no longer reproduce it locally with this change.

satyanash commented 4 years ago

Can confirm, the above patches seem to work for me too. Ensuring that verify! is called after a PG::ConnectionBad, or after a PSQLException with the message "This connection has been closed.", allows Rails, octopus, and the activerecord-jdbcpostgresql adapter to reconnect to the database correctly.
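
Deciding which errors should trigger verify! could look roughly like the sketch below. It is an illustration, not the code from the patches: closed_connection_error? is a hypothetical name, the PG module defined here is only a stand-in so the example runs without the pg gem, and the JDBC case is matched purely on the message text quoted above. The check walks Ruby's exception cause chain, because the driver error typically arrives wrapped in ActiveRecord::StatementInvalid.

```ruby
# Stand-in so the sketch runs without the pg gem loaded (assumption: the real
# class has the same name, PG::ConnectionBad).
module PG
  class ConnectionBad < StandardError; end
end

CLOSED_CONNECTION_CLASSES = ['PG::ConnectionBad'].freeze
CLOSED_CONNECTION_MESSAGE = 'This connection has been closed.'.freeze

# Hypothetical predicate: true if the error, or any error it wraps, looks like
# a dead database connection.
def closed_connection_error?(error)
  while error
    return true if CLOSED_CONNECTION_CLASSES.include?(error.class.name)
    return true if error.message.include?(CLOSED_CONNECTION_MESSAGE)
    error = error.cause
  end
  false
end

# Build a wrapped error the way an adapter would: raising inside a rescue
# sets #cause on the new exception automatically.
wrapped = begin
  begin
    raise PG::ConnectionBad, "PQsocket() can't get socket descriptor"
  rescue PG::ConnectionBad
    raise StandardError, 'stand-in for ActiveRecord::StatementInvalid'
  end
rescue StandardError => e
  e
end

closed_connection_error?(wrapped)                    # true, via the cause chain
closed_connection_error?(ArgumentError.new('oops'))  # false
```

Matching narrowly like this avoids reconnecting and retrying on ordinary SQL errors, which also surface as ActiveRecord::StatementInvalid.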

tibbon commented 3 years ago

Any updates on this?