Average length of time for PostgreSQL queries significantly longer after switching tenants

bradeaton commented 4 years ago

We're attempting to implement apartment in our existing Rails 4.2 application. We've run into a performance issue that we can't seem to get around. The crux of the issue seems to be that database queries that we execute after doing an Apartment::Tenant.switch take significantly longer than if we are not switching tenants. It seems as if optimizations that we were getting from PostgreSQL without switching tenants are lost once we switch. We see this issue even when there aren't any separate private databases configured. In the steps to reproduce below config.tenant_names is set to an empty hash so that everything is still in the primary database. The difference in DB performance illustrated in the steps to reproduce are seen across all the queries we've looked at so far. The results are just slower response times, it brings the application to a crawl and then a halt with average page load times above 15 seconds or more.

Steps to reproduce

# Without switching to primary database each time
# This is just a class that stores benchmark times with a tag and reports the min, max, and averages
bm = BenchmarkReport.new(description: "Apartment Tests")
bm.start(stat: "Overall time")
# To simulate the number of accounts we have in production environment on local or staging, we loop 20 times.
(1..20).each do
  # Account is an excluded model
  Account.all.each do |account|
      bm.start(stat: "account.users.where...")
      # The users table is not excluded
      main_user = account.users.where('main = ?', true).take
      bm.stop(stat: "account.users.where...")
  end
end
bm.stop(stat: "Overall time")
puts bm.report.to_json

# With switching to primary database each time
# This is just a class that stores benchmark times with a tag and reports the min, max, and averages
bm = BenchmarkReport.new(description: "Apartment Tests")
bm.start(stat: "Overall time")
# To simulate the number of accounts we have in production environment on local or staging, we loop 20 times.
(1..20).each do
  # Account is an excluded model
  Account.all.each do |account|
    bm.start(stat: "switch")
    Apartment::Tenant.switch(account.db_name) do
      bm.start(stat: "account.users.where...")
      # The users table is not excluded
      main_user = account.users.where('main = ?', true).take
      bm.stop(stat: "account.users.where...")
    end
    bm.stop(stat: "switch")
  end
end
bm.stop(stat: "Overall time")
puts bm.report.to_json

Expected behavior

I would expect the average time to execute the account.users query with tenant switching and without to be at least close. I would expect some overhead in the switch.

Actual behavior

The query times where we are switching tenants in between are x times longer than when we are not switching tenants. Here are the benchmark results:

# Without switching tenants
{
  "description": "Apartment Tests",
  "stats": [
    {
      "stat": "account.users.where...",
      "count": 1300,
      "min": 0.0018922750023193657,
      "max": 0.13093664599000476,
      "avg": 0.00294993647022728
    },
    {
      "stat": "Overall time",
      "count": 1,
      "min": 4.397003222984495,
      "max": 4.397003222984495,
      "avg": 4.397003222984495
    }
  ]
}

# With switching tenants
{
  "description": "Apartment Tests",
  "stats": [
    {
      "stat": "account.users.where...",
      "count": 1300,
      "min": 0.033399779989849776,
      "max": 2.0633101440034807,
      "avg": 0.055884263064766015
    },
    {
      "stat": "switch",
      "count": 1300,
      "min": 0.06311358700622804,
      "max": 2.130101154005388,
      "avg": 0.10326035546714243
    },
    {
      "stat": "Overall time",
      "count": 1,
      "min": 134.57561817902024,
      "max": 134.57561817902024,
      "avg": 134.57561817902024
    }
  ]
}

System configuration

Heroku

Database: (Tell us what database and its version you use.)

PostgreSQL 10.11

Apartment version:

2.2.1

Apartment config (in config/initializers/apartment.rb or so):

#
# Apartment Configuration
#
Apartment.configure do |config|

config.excluded_models = %w{
AccountActivity
AccountConfiguration
AccountFeature
Account
ApartmentMigration
ClickTrackingClientUrl
ClickTrackingUtm
Delayed::Backend::ActiveRecord::Job
Document
Feature
InboundMailLog
MobileCarrier
Notification
OmniLog
Permission
PhoneNumber
QueuedWorkflowEmail
ReportDatum
SetupAccount
SetupLocation
SetupUser
UserPasswordReset
WorkflowNode
WorkflowProcessNode
WorkflowProcessRunnerInstance
WorkflowProcess
WorkflowVersion
Workflow
ZipCode
}

# We handle providing the list of tenant names by retrieving Account records from the DB where we have defined a separate database. This allows us
# to have our multi-tenant clients in the primary database and our single-tenant('ish') clients in their own separate database
tenant_names = {}

Account.where('db_name IS NOT NULL').each do |account|
tenant = {
    adapter: account.db_adapter,
    port: account.db_port,
    host: account.db_host,
    username: account.db_username,
    password: account.db_password,
    database: ENV['PRIMARY_DB_NAME']
}

tenant_names[account.db_name] = tenant if account.db_name.present?
end

config.tenant_names = tenant_names
config.use_schemas = false
end

use_schemas: (true or false) false

Rails (or ActiveRecord) version: 4.2
Ruby version: 2.4.5

lcjury commented 4 years ago

Each database system implements multi-tenancy on his own way. Mysql has the use statement to change between databases, postgres has schemas, etc.

But, as you have use_schemas set to false, you're not using the schemas mecanism, instead, you're changing your connection on runtime each time you call to switch!

I don't know the inner workings of active record/apartment, but Apartment calls establish_connection who creates a new pool. Creating new connections is a really expensive operation, that's why we create connection pools and re-use them when possible, but, each you switch tenants, apartment is creating a new pool. This is a really expensive operation.

From the Readme: One can optionally use the full database creation instead if they want, though this is not recommended

I don't think this is an issue, for me it would me an expected outcome when using full database instead of schemas.

kitsunde commented 4 years ago

I don't know if it's the entire cause but when you change the search path, postgresql will replan prepared statements.

See notes: https://www.postgresql.org/docs/current/sql-prepare.html

I have plans of basically adding .where('? = ?', search_path, search_path) . to force some heavier prepared statements to not get replanned.

influitive / apartment