tattle-made / DAU

MCA Tipline for Deepfakes
GNU General Public License v3.0

Identify Optimum Postgres settings for launch #59

Closed dennyabrain closed 4 months ago

dennyabrain commented 4 months ago

One of the bottlenecks in our infra is the number of concurrent writes our database can support. One scenario we want to be prepared for is receiving ten lakh (1,000,000) messages over an hour. We have strategies in place to scale our web server, vertically or horizontally. Each of those web servers would open connections to our Postgres instance. Ecto, the client library the web server uses to talk to the database, has support for managing connection pools. The scope of this issue is to find out:

  1. For a given amount of RAM, how many simultaneous connections a Postgres server can handle
  2. Best practices for configuring connection pooling in Ecto (the database library used in the web server)
  3. Ideal combination of Postgres, web server count and connection pool size to handle our usage scenario
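
As a rough back-of-the-envelope sketch of the target load: ten lakh messages spread evenly over an hour comes to about 278 writes per second. The 5 ms insert latency used below is a purely hypothetical figure, not a measurement from our setup; it only illustrates how Little's law (in-flight requests = arrival rate × latency) relates throughput to the number of connections busy at once. Real traffic will be bursty, so this is a lower bound rather than a sizing recommendation.

```python
import math

MESSAGES_PER_HOUR = 10 * 100_000  # ten lakh = 1,000,000 messages
SECONDS_PER_HOUR = 3600


def required_writes_per_second(messages: int, seconds: int) -> float:
    """Average write rate if the messages arrive evenly over the window."""
    return messages / seconds


def busy_connections(writes_per_second: float, insert_latency_s: float) -> int:
    """Little's law: average in-flight writes = arrival rate * latency."""
    return math.ceil(writes_per_second * insert_latency_s)


rate = required_writes_per_second(MESSAGES_PER_HOUR, SECONDS_PER_HOUR)
print(round(rate, 1))  # 277.8

# ASSUMPTION: 5 ms per insert is a made-up latency; measure with pgbench instead.
print(busy_connections(rate, insert_latency_s=0.005))  # 2
```
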
dennyabrain commented 4 months ago

source

With Postgres, each new connection can take up to 1.3 MB of memory.
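
To relate that figure to question 1, here is a minimal sketch that turns a memory budget into a connection ceiling. The 1.3 MB value is the upper bound quoted above; the server size (1 GB) and the share of RAM set aside for connection overhead (25%) are assumptions chosen only for illustration, since the rest of the memory is needed for shared buffers, query working memory and the OS.

```python
MB_PER_CONNECTION = 1.3  # upper-bound per-connection figure quoted above


def max_connections_for_budget(
    connection_budget_mb: float,
    mb_per_connection: float = MB_PER_CONNECTION,
) -> int:
    """How many connections fit in the memory reserved for connection overhead."""
    return int(connection_budget_mb // mb_per_connection)


# ASSUMPTION: a 1 GB instance with 25% of RAM reserved for connections.
ram_mb = 1024
budget_mb = ram_mb * 0.25
print(max_connections_for_budget(budget_mb))  # 196
```
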

dennyabrain commented 4 months ago

One thing that works in our favour is that we need this high throughput for only one specific write operation: inserting incoming WhatsApp messages into the database. Since these are plain inserts, we don't have to worry about update-collision issues.

duggalsu commented 4 months ago

Tradeoffs for client-side pooling (e.g. Ecto) vs. middleware (external) pooling (e.g. pgBouncer, pgPool)

One example that I personally came across was when I was using Elixir together with Kubernetes. During some of our operations, we would spawn many Kubernetes pods at the same time, and all of them would try to get database connections together. At those times, it was easy to go over the number of database connections that our Postgres instance actually supported.


- https://stackoverflow.com/questions/71352508/what-are-the-pros-and-cons-of-client-side-connect-pools-vs-external-connection-p
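
The failure mode in the quote above comes down to simple arithmetic: with client-side pooling, every app instance holds its own pool, so the total grows with the instance count. A minimal sketch of that check follows. The pod counts and pool size are illustrative values, not our configuration; 100 and 3 are the stock Postgres defaults for `max_connections` and `superuser_reserved_connections`.

```python
def total_connections(pod_count: int, pool_size: int) -> int:
    """Connections opened when every pod fills its own client-side pool."""
    return pod_count * pool_size


def fits_in_postgres(
    pod_count: int,
    pool_size: int,
    max_connections: int = 100,  # Postgres default
    reserved: int = 3,           # default superuser_reserved_connections
) -> bool:
    """True if all pools together stay within what ordinary clients may use."""
    return total_connections(pod_count, pool_size) <= max_connections - reserved


# ASSUMPTION: pool size of 10 per pod, chosen only for illustration.
print(fits_in_postgres(pod_count=8, pool_size=10))   # True  (80 <= 97)
print(fits_in_postgres(pod_count=12, pool_size=10))  # False (120 > 97)
```

An external pooler such as pgBouncer sidesteps this by capping server-side connections regardless of how many pods start at once.
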

From my experience and understanding, the disadvantages of an external pool are:

Usually, a connection pool on the application side is a good thing for the reasons you detail. An external connection pool only makes sense if

Other relevant info

duggalsu commented 4 months ago

Maximum number of connections Postgres can support

(See section) How to Find the Optimal Database Connection Pool Size



- https://stackoverflow.com/questions/30778015/how-to-increase-the-max-connections-in-postgres
duggalsu commented 4 months ago

Load testing Postgres on EC2

dennyabrain commented 4 months ago

Instructions on how to run pgbench for profiling insert operations - https://github.com/tattle-made/feluda/wiki/Optimization#testing-insert-performance