phusion / passenger

A fast and robust web server and application server for Ruby, Python and Node.js
https://www.phusionpassenger.com/
MIT License

Rails + Mongoid running on Passenger reports #2316

Closed by schuylr 4 years ago

schuylr commented 4 years ago

Issue report

When our MongoDB Atlas cluster performs a failover (the primary node restarts and a secondary is elected as the new primary), the running web application fails to re-establish its Mongo client connections, and reports exceptions like this at the application level:

A Mongo::Error::NoServerAvailable occurred in carts#show:

  No primary server is available in cluster: #<Cluster topology=ReplicaSetNoPrimary[redacted] servers=[#<Server address=redacted:27017 SECONDARY replica_set=atlas-redacted-shard-0 pool=#<ConnectionPool size=0 (0-5) used=0 avail=0 pending=0>>,#<Server address=redacted:27017 UNKNOWN pool=#<ConnectionPool size=0 (0-5) used=0 avail=0 pending=0>>,#<Server address=redacted:27017 SECONDARY replica_set=atlas-redacted-shard-0 pool=#<ConnectionPool size=0 (0-5) used=0 avail=0 pending=0>>]> with timeout=5, LT=0.015. The following servers have dead monitor threads: #<Server address=redacted:27017 SECONDARY replica_set=atlas-redacted-shard-0 pool=#<ConnectionPool size=0 (0-5) used=0 avail=0 pending=0>>, #<Server address=redacted:27017 UNKNOWN pool=#<ConnectionPool size=0 (0-5) used=0 avail=0 pending=0>>, #<Server address=redacted:27017 SECONDARY replica_set=atlas-redacted-shard-0 pool=#<ConnectionPool size=0 (0-5) used=0 avail=0 pending=0>>. The cluster is disconnected (client may have been closed)
  app/models/cart/methods.rb:107:in `create2'

Are you sure this is a bug in Passenger?

We've been working with MongoDB Support for over a month triaging the Mongo Ruby driver, Mongoid, and other various possibilities related to the Ruby gems. What we know so far (each test case was performed with a manual fail-over, which is reproducible with Rails running in Passenger):

  1. This issue does not occur with the native Mongo Ruby driver in a reproduction script
  2. This issue does not occur with our full Rails environment loaded in the foreground through bin/rails c and Mongoid clients loaded
  3. This issue seems to only occur when Rails is running on Passenger
  4. There is no issue with the Mongo cluster, as it is managed by MongoDB themselves through Atlas
  5. I noticed this forum post where someone else experienced the same sort of issue specifically with Passenger + Rails

Please try with the newest version of Passenger to avoid issues that have already been fixed

I checked the changelogs since 6.0.4 and don't see anything that would indicate a fix for Mongo connections.

Question 1: What is the problem?

When a MongoDB Atlas cluster performs a failover, Mongoid should reconnect to the cluster automatically.

The reconnection does not occur, and the only fix is to restart the application running on Passenger.

I'll work on creating a Docker image + Rails that should reproduce it, in case these reproduction steps don't work for you:

  1. Run Rails 5.2.1 / Ruby 2.6.6 on Phusion Passenger (the integration mode does not seem to matter) with Mongoid 7.0.8
  2. Set up a Mongo Atlas cluster (free tier will probably work fine)
  3. Set up a controller/endpoint that creates a Mongoid document in the Atlas Cluster
  4. Hit the endpoint until client connections are established in all Passenger worker processes
  5. Perform a failover on the Atlas cluster
  6. Hit the endpoint again - observe that the exception now occurs
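
Steps 4 and 6 can be scripted rather than done by hand. This is only a minimal sketch: the helper name `hit_endpoint` and the URL in the usage note are hypothetical, and it assumes the endpoint answers plain HTTP GET requests. It returns the status code of each request (or the exception class name if the request itself fails), so a run before the failover can be compared with a run after it.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: send `times` GET requests to `url`, one connection
# per request, and collect the HTTP status code of each response.
# If the request itself fails (connection refused, timeout, ...), the
# exception class name is recorded instead of a status code.
def hit_endpoint(url, times)
  uri = URI(url)
  Array.new(times) do
    begin
      Net::HTTP.get_response(uri).code
    rescue StandardError => e
      e.class.name
    end
  end
end
```

For example, `hit_endpoint("http://localhost:3000/carts", 50).tally` (placeholder URL) would summarize how many requests returned each status; an application-level exception like the one above would be expected to surface as error responses rather than successes.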

Question 2: Passenger version and integration mode:

Question 3: OS or Linux distro, platform (including version):

Question 4: Passenger installation method:

Your answer:

  [ ] RubyGems + Gemfile
  [ ] RubyGems, no Gemfile
  [ ] Phusion APT repo
  [ ] Phusion YUM repo
  [ ] OS X Homebrew
  [ ] source tarball
  [x] Other, please specify: RubyGems + RVM + passenger-install-nginx-module

Question 5: Your app's programming language (including any version managers) and framework (including versions):

Question 6: Are you using a PaaS and/or containerization? If so which one?

Question 7: Anything else about your setup that we should know?

We have followed the Mongoid documentation, which says to reconnect the clients in each worker process when smart spawning is used. Here is our config.ru file:

# This file is used by Rack-based servers to start the application.

require ::File.expand_path('../config/environment', __FILE__)

if defined?(PhusionPassenger)
  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    if forked
      # We're in smart spawning mode
      # Re-cycle the Mongo client connections
      Mongoid::Clients.clients.each do |_name, client|
        client.close
        client.reconnect
      end
    end
  end
end

run Rails.application
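
For context on why a hook like this exists at all: in MRI Ruby, only the thread that calls fork survives in the child process, so any background thread started before the fork is dead in the forked worker. The sketch below demonstrates that general Ruby behavior with a stand-in thread. The name `monitor_alive_after_fork` is made up for illustration, it needs a platform that supports fork, and it is not a diagnosis of this particular issue.

```ruby
# Start a background thread (a stand-in for a driver monitor thread),
# fork, and report whether that thread is alive in the parent and in
# the child. The child sends its answer back over a pipe.
def monitor_alive_after_fork
  monitor = Thread.new { sleep } # sleeps forever until killed

  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(monitor.alive? ? "alive" : "dead")
    writer.close
    exit!(0) # skip at_exit handlers in the child
  end

  writer.close
  child_state = reader.read
  reader.close
  Process.wait(pid)

  parent_state = monitor.alive? ? "alive" : "dead"
  monitor.kill
  { parent: parent_state, child: child_state }
end
```

The thread is reported alive in the parent and dead in the child, which is why clients created before a smart-spawn fork are closed and reconnected in the hook above.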

Happy to provide anything else that may be relevant.

schuylr commented 4 years ago

I spun up a reproduction environment and tried to reproduce this issue at will, but couldn't find anything that exposes the problem. I'll close this for now until I'm more certain there's an issue with Passenger.

jforkan commented 2 years ago

Were you ever able to resolve this issue? I've encountered the exact same situation.

schuylr commented 2 years ago

Hi @jforkan - I'm no longer with the business that ran this environment, and I never found a clear smoking gun for this issue. I would suggest opening a new ticket.

jforkan commented 2 years ago

FYI, it looks like this has been resolved in Mongo Ruby driver 2.16.1, and testing confirms it is no longer an issue in my environment. See https://jira.mongodb.org/browse/RUBY-2806
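
To check whether a given environment is already on a driver release at or above that version, a small comparison with RubyGems' version class is enough. This is a sketch only: `driver_has_fix?` and `FIXED_IN` are hypothetical names, and the version string is passed in by the caller (in an application it would presumably come from the driver's own version constant, which is an assumption here).

```ruby
require "rubygems"

# The driver release mentioned in the comment above.
FIXED_IN = Gem::Version.new("2.16.1")

# Hypothetical helper: true when `version_string` is 2.16.1 or newer.
# Gem::Version compares segment by segment, so "2.9.0" sorts below "2.16.1".
def driver_has_fix?(version_string)
  Gem::Version.new(version_string) >= FIXED_IN
end
```

For example, `driver_has_fix?("2.13.0")` is false and `driver_has_fix?("2.17.0")` is true.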