medusajs / medusa

Building blocks for digital commerce
https://medusajs.com

perf: Endpoint `/cart/:id/line-items` #5693

Open vholik opened 9 months ago

vholik commented 9 months ago

Bug report

Backend Crashes on Posting Too Many Requests to store/cart/:id/line-items

Describe the bug

Our application experiences intermittent crashes and unresponsiveness when a high volume of requests is sent to the store/cart/:id/line-items endpoint. This needs investigation and resolution to ensure the stability and reliability of our backend. We also load-tested our custom routes and everything was fine except for store/cart/:id/line-items.

System information

We are running our server in Docker on DigitalOcean (Server: 2X 4 GB RAM | 1 Dedicated vCPU 2; Postgres: 8 GB RAM / 2 vCPU / 30 GB disk / primary only / FRA1 / PostgreSQL 12).

Medusa version (including plugins): 1.18.0 (we also tried 1.8.1 and 1.12.0; the problem is the same)
Node.js version: 17.1.0
Database: Postgres, 8 GB RAM / 2 vCPU / 30 GB disk / primary only / FRA1 / PostgreSQL 12
Operating system: running in Docker
Browser (if relevant):

Steps to reproduce the behavior

  1. Access the store/cart/:id/line-items endpoint.
  2. Send 10 requests within one second.
  3. Repeat this for 20 seconds (roughly 200 requests total; see the sketch below).
  4. The server stops responding to requests.
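
Something along these lines reproduces the load pattern (a minimal sketch: `BASE_URL`, `CART_ID`, and `VARIANT_ID` are placeholders, Node 18+ is assumed for global `fetch`, and the payload follows the store API's `variant_id`/`quantity` shape):

```js
// Hypothetical reproduction sketch: ~10 POSTs per second for 20 seconds.
const BASE_URL = "http://localhost:9000"; // placeholder server URL
const CART_ID = "cart_...";               // an existing cart id
const VARIANT_ID = "variant_...";         // a purchasable variant id

async function addLineItem() {
  const res = await fetch(`${BASE_URL}/store/carts/${CART_ID}/line-items`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ variant_id: VARIANT_ID, quantity: 1 }),
  });
  console.log(new Date().toISOString(), res.status);
}

let seconds = 0;
const timer = setInterval(() => {
  // Fire a burst of 10 requests; errors are logged rather than thrown.
  for (let i = 0; i < 10; i++) {
    addLineItem().catch((err) => console.error(err.message));
  }
  if (++seconds >= 20) clearInterval(timer);
}, 1000);
```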

Expected behavior

The server should keep responding to requests.

Screenshots

(screenshots attached)

Code snippets

Additional context

olivermrbl commented 9 months ago

You might be experiencing an exhausted database connection pool.

Each request to add a line item to the cart starts a transaction and occupies a database connection. Naturally, as more requests come in, more transactions are started and more connections are occupied. At some point you reach the maximum connection limit, and new requests sit idle, waiting for connections to be released so they can be used again.

Those idle requests for connections wait indefinitely unless configured otherwise. So the more requests you fire, the more idle connection requests pile up. This eventually degrades your server's performance and, in the worst case, crashes it, because the HTTP requests start to time out. On DigitalOcean, the request timeout is 30 seconds (a common default).

In summary, your requests for new database connections sit idle for longer than the request timeout limit on DigitalOcean. This degrades performance and can crash your server.
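
One way to check whether this is what's happening is to watch connection states in `pg_stat_activity` while the load test runs. A minimal sketch, assuming the `pg` package is installed and a `DATABASE_URL` environment variable points at the same database your Medusa server uses:

```js
// Diagnostic sketch: snapshot Postgres connection states during the load test.
const { Client } = require("pg");

async function main() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query(
    `SELECT state, count(*) AS connections
       FROM pg_stat_activity
      GROUP BY state
      ORDER BY connections DESC`
  );
  // An "active" or "idle in transaction" count near your pool limit
  // suggests the connection pool is exhausted.
  console.table(rows);
  await client.end();
}

main().catch(console.error);
```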

I have two immediate suggestions:

**Configure the idle transaction timeout**
You can configure how long a connection may sit idle inside an open transaction before Postgres terminates it. The Postgres default is 0, which means the timeout is disabled. I recommend setting it to 15 or 20 seconds so stuck connections are released before your requests to DigitalOcean time out.

Read more here.

**Configure the pool size**
The default connection pool size in TypeORM (our ORM) is 10. You can increase it to have more connections available for your requests. I don't have a firm recommendation, but you could try 15-20; we've seen that work well for other users.

You can configure both settings in `medusa-config.js`:

```js
projectConfig: {
  // ...
  database_extra: {
    idle_in_transaction_session_timeout: 20000,
    max: 20,
  },
}
```

Separately from this, do you have any custom subscribers or services that could be performing actions simultaneously with the items being added to the cart?

olivermrbl commented 9 months ago

Alternatively, you could try to set up PgBouncer, if your Postgres instance allows for it. We don't have much experience with this, so I can't guarantee it will resolve your issues.
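
If you do try it, the only Medusa-side change should be the connection string. A minimal sketch, assuming PgBouncer listens on its default port 6432 in front of your existing database (host, credentials, and database name below are placeholders):

```js
// medusa-config.js (sketch): route Medusa's connections through PgBouncer.
// Only the URL changes; Medusa is unaware of the proxy.
module.exports = {
  projectConfig: {
    // ...other project config...
    database_type: "postgres",
    // Placeholder credentials/host; 6432 is PgBouncer's default listen port.
    database_url: "postgres://medusa_user:password@pgbouncer-host:6432/medusa_db",
  },
};
```

Note that PgBouncer's transaction pooling mode can interfere with session-level settings such as `idle_in_transaction_session_timeout`, so session pooling is likely the safer starting point.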

vholik commented 9 months ago

We tried following your suggestion and added the config below on the newest Medusa (1.18.0). We still have the same problem.

```js
database_extra: {
  idle_in_transaction_session_timeout: 20000,
  max: 20,
}
```

(screenshot attached)

vholik commented 9 months ago

We also tried adding PgBouncer, but the problem remains the same.

olivermrbl commented 9 months ago

We are working on improving the performance of this specific endpoint in #5701. Initial tests show significant improvements. You can expect this to land on develop sometime next week :)