influitive / apartment

Database multi-tenancy for Rack (and Rails) applications

JDBC connection pooling #336

Open · ari opened 8 years ago

ari commented 8 years ago

Apologies for using your task tracker to ask questions, but I couldn't find a more suitable mailing list. Let me know where to go if that's your preference.

This gem looks really interesting, but I'm unclear on how it handles connection pooling with JDBC. Let's say I had 500 tenants in a single application; would that result in 500 separate JDBC connection pools if I were running JRuby with Warbler/Tomcat?

Obviously the database would not cope happily with so many connection pools (especially if each had several open connections).

Or do you manage a single pool, using a common user/pass that can access all databases/schemas, and then just switch the database before each SQL query? Doesn't that schema switching add a lot of SQL overhead, or is it reasonably minor in production?

bradrobertson commented 8 years ago

We recommend using a "schema'd" approach rather than a connection-based approach to your tenants. Either way, though, you'll only ever open the number of connections you've specified as your max in the pool; we just use ActiveRecord for connecting, so there's nothing surreptitious going on re: connecting.
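
To illustrate the pool cap (a minimal sketch, not Apartment-specific; the adapter, credentials, and pool size below are illustrative placeholders):

```ruby
# A minimal sketch (not Apartment-specific): ActiveRecord opens at most
# `pool:` connections per process, however many tenants you switch between.
# Adapter, credentials, and pool size here are illustrative placeholders.
require "active_record"

ActiveRecord::Base.establish_connection(
  adapter:  "mysql2",        # or "postgresql"
  host:     "localhost",
  username: "app",
  password: "secret",
  database: "tenant_default",
  pool:     5                # hard cap on open connections for this process
)
```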

If you're using something like Postgres schemas, the call on each request is just `SET search_path TO <schemaname>`. This is a very lightweight call, adding essentially zero overhead to the resulting request.
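
In application code that per-request switch goes through Apartment's switching API; a minimal sketch, where "acme" and the `User` model are illustrative:

```ruby
# Sketch of Apartment's tenant switching; "acme" and User are illustrative.
# On Postgres this sets search_path; on MySQL it issues `USE <database>`.
Apartment::Tenant.switch("acme") do
  User.count   # runs against the "acme" schema
end

# or switch the current thread's tenant for the rest of the request:
Apartment::Tenant.switch!("acme")
```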

ari commented 8 years ago

Unfortunately we are using MariaDB, but from my reading in the last day I think that `USE <database>` is still quite fast.

I'm guessing then that because we have only one pool, the user/pass will need to be sufficient for all databases, and we can't use per-tenant access rights. That's not quite as nice as it could be, but it's no worse than our current setup, where all tenants are merged into one database with a tenant field per table.

In your experience, how has your approach scaled with hundreds of tenants? Our data set is small (< 1 GB), but I'd like to be able to scale to lots of tenants (over 500). I'll look at how MariaDB handles so many open tables, but is that the main place I need to think about scaling?

mikecmpbll commented 8 years ago

we use it with mysql and >3000 tenants with the 'schema' approach (the schema approach for mysql == `USE <database>`, as opposed to reconnecting on each tenant switch) and have no problems.
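
roughly, the config for that looks like this (a sketch; the tenant source and excluded model names are illustrative):

```ruby
# config/initializers/apartment.rb — sketch; Account and :subdomain are illustrative
Apartment.configure do |config|
  config.use_schemas = true                        # mysql: switch via `USE <database>`, no reconnect
  config.excluded_models = ["Account"]             # models that stay in the default database
  config.tenant_names = -> { Account.pluck(:subdomain) }
end
```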

richpeck commented 8 years ago

Mike, 3000 tenants? Are you storing these in a single MySQL db, or do you have to init different DBs for each tenant? If the latter, how do you manage the schema for each DB?

mikecmpbll commented 8 years ago

@richpeck different databases on the same db server, same schema.

richpeck commented 8 years ago

How do you handle Rails migrations etc, or don't you change the schema at all?

mikecmpbll commented 8 years ago

rails migrations get applied sequentially to all databases on deploy; Apartment handles this. i'll be investigating parallel migrations soon because of the obvious time issues with so many databases. there's a branch, but i haven't had a chance to look at it yet.
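
under the hood the sequential loop is just `Apartment::Migrator.migrate` per tenant, so a parallel version could look something like this rough sketch (the thread count and bare-bones error handling are illustrative, not what the gem ships with):

```ruby
# rough sketch of parallelising tenant migrations; 4 threads and the minimal
# error handling are illustrative. the AR pool size must cover the thread count.
require "apartment/migrator"

queue = Queue.new
Apartment.tenant_names.each { |name| queue << name }

threads = Array.new(4) do
  Thread.new do
    loop do
      tenant = begin
        queue.pop(true)          # non-blocking; raises ThreadError when empty
      rescue ThreadError
        break
      end
      Apartment::Migrator.migrate(tenant)  # each thread checks out its own connection
    end
  end
end
threads.each(&:join)
```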

bradrobertson commented 8 years ago

Ya, I'd say our biggest issue with lots of tenants is migration time. You also have to make sure, if you have lots of web servers, that none of them restart until all migrations are finished, which can be a bit of a coordination pain. We have roughly 1000 tenants on Postgres; I can't really speak to MySQL. The one thing I would say, though, is that you shouldn't store high-volume, event-like data in individual schemas. We made that mistake, and backing up our db has been a pain due to its size. Look to more distributed, scalable systems for anything beyond typical relational, config-like data IMO.

mikecmpbll commented 8 years ago

we use capistrano, which coordinates deploys across multiple production servers really elegantly. our databases are pretty huge (i think?) but we don't have a big issue with backups; it takes ~1 hour to dump and gzip all ~3000 databases (~5.5 GB).

ari commented 8 years ago

@mikecmpbll Thanks for your advice. My advice to you is to look at xtrabackup if you want a non-blocking backup mechanism that works better than mysqldump.

Do you not have problems with `table_open_cache` in MySQL with 3000 databases times however many tables?

mikecmpbll commented 8 years ago

i'm a jack of all trades, master of some of it but definitely not db administration 😁.

i don't know a great deal about db optimisation, and it's highly likely we've done very little of it beyond basic caching. i just took a quick look at `table_open_cache` and we're using the default of 400. the number of tables opened since sunday is some 19 million :D so that could probably benefit from some tuning, but we don't have any specific performance issues.
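
for reference, those two numbers can be pulled straight from a rails console like so (a sketch):

```ruby
# sketch: inspecting mysql's table cache from a rails console
conn = ActiveRecord::Base.connection
conn.select_all("SHOW VARIABLES LIKE 'table_open_cache'").to_a
# => [{"Variable_name" => "table_open_cache", "Value" => "400"}]
conn.select_all("SHOW GLOBAL STATUS LIKE 'Opened_tables'").to_a
# => cumulative number of tables opened since server start
```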

one of the reasons i've got involved with some of the work on threading and sharding support in apartment is so that, if we hit issues with having this number of databases on one server, we can split them across multiple servers.

(also thx for the recommendation!)

mikecmpbll commented 8 years ago

fwiw, seeing as this is turning into a bit of a performance discussion, i've run this mysqltuner script and it outputs some interesting-ish stuff:

```
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 3d 10h 26m 51s (77M q [260.044 qps], 64K conn, TX: 329G, RX: 13G)
[--] Reads / Writes: 96% / 4%
[--] Binary logging is disabled
[--] Physical Memory     : 29.4G
[--] Max MySQL memory    : 22.9G
[--] Other process memory: 188.2M
[--] Total buffers: 22.5G global + 2.7M per thread (151 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 22.8G (77.59% of installed RAM)
[OK] Maximum possible memory usage: 22.9G (78.09% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (2K/77M)
[OK] Highest usage of available connections: 63% (96/151)
[OK] Aborted connections: 0.00% (0/64011)
[!!] Query cache may be disabled by default due to mutex contention.
[OK] Sorts requiring temporary tables: 0% (25 temp sorts / 6M sorts)
[!!] Joins performed without indexes: 2100047
[OK] Temporary tables created on disk: 8% (1M on disk / 16M total)
[!!] Table cache hit rate: 0% (400 open / 11M opened)
[OK] Open file limit used: 0% (1/1K)
[OK] Table locks acquired immediately: 99% (76M immediate / 76M locks)

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[!!] InnoDB buffer pool / data size: 22.5G/75.2G
[!!] InnoDB buffer pool instances: 8
[OK] InnoDB Used buffer: 100.00% (1471998 used/ 1472000 total)
[OK] InnoDB Read buffer efficiency: 99.98% (37635121893 hits/ 37640805184 total)
[!!] InnoDB Write Log efficiency: 65.24% (2290557 hits/ 3510752 total)
[OK] InnoDB log waits: 0.00% (0 waits / 1220195 writes)
```

nothing particularly apartment related, just looks like we should add more RAM, optimise the table_open_cache, and fix some foreign key indexes!