prisma / prisma

Next-generation ORM for Node.js & TypeScript | PostgreSQL, MySQL, MariaDB, SQL Server, SQLite, MongoDB and CockroachDB
https://www.prisma.io
Apache License 2.0
39.16k stars 1.53k forks source link

Inappropriate Connection Pool Timeout Error #19820

Open akawalsky opened 1 year ago

akawalsky commented 1 year ago

Bug description

Hi, we are running into an issue with Prisma and postgres with the following error -

Timed out fetching a new connection from the connection pool. More info: http://pris.ly/d/connection-pool (Current connection pool timeout: 10, connection limit: 1)

This is occurring in an AWS lambda instance. We have already read and implemented the appropriate changes based on this documentation https://www.prisma.io/docs/guides/performance-and-optimization/connection-management#serverless-environments-faas by ensuring that we set the connection limit to 1 (as seen in the error above) and instantiating the client outside of the lambda invocation. However, this issue persists in about 1/1000 queries to the db.

I have also confirmed that we are no where near hitting our connection limit on the postgres side when this error occurs, so I am led to believe that there is something happening on the prisma side rather than on the db side.

It seems that we are getting this error within a fraction of a second of the request starting, which means that it is not waiting the full 10 seconds before throwing.

I think this is a bug in the query engine. The error is happening here https://github.com/prisma/prisma-engines/blob/main/query-engine/connectors/sql-query-connector/src/database/postgresql.rs#L71 which calls out to https://github.com/prisma/quaint/blob/main/src/pooled.rs#L479-L504

I believe that under some race condition, mobc is incorrectly throwing a timeout error which is then caught, but we're not seeing the actual error message, because it is overridden here - https://github.com/prisma/prisma-engines/blob/main/libs/user-facing-errors/src/query_engine/mod.rs#L239

When I set pool_timeout to 0, we don't see this error anymore.

Right now we have just switched to setting pool_timeout to 0 as a workaround, but it does seem like there is a bug going on on Prisma's end.

How to reproduce

Use prisma in a Lambda with serverless-express.

Prisma will transiently fail.

Expected behavior

We should get a better error description than pool timeout to be able to identify the underlying issue.

Prisma information

Any arbitrary query that runs first on lambda invocation.

Environment & setup

AWS Lambda with Postgres

Prisma Version

4.15.0
kylesloper commented 1 year ago

Seeing a greater amount of connection pool errors being thrown recently here too

arch-user-france1 commented 1 year ago

Prisma does seem to have a problem with connections since this error is occurring on a server with the nodejs server and the mysql server on the same machine. The connection limit is 350, but most of the time up to 10 connections are established and no more.

Here's the error:

Timed out fetching a new connection from the connection pool. More info: http://pris.ly/d/connection-pool (Current connection pool timeout: 10, connection limit: 9)
janpio commented 1 year ago

Your Prisma is configured to only use 9 connections in your app @arch-user-france1:

..., connection limit: 9)

You can change that by using connection_limit=x in the connection string as a query param: https://www.prisma.io/docs/concepts/components/prisma-client/working-with-prismaclient/connection-pool#setting-the-connection-pool-size It is not enough to just increase the limit on your database side.

arch-user-france1 commented 1 year ago

@janpio, prisma won't wait for a connection to be free?

I did not know that. Thank you for your quick response. I'll increase the timeout.

janpio commented 1 year ago

It will wait for the connection pool timeout, which by default (and in your app) is 10 seconds. You can increase that limit to a really high number for Prisma to wait forever by using pool_timeout (https://www.prisma.io/docs/concepts/components/prisma-client/working-with-prismaclient/connection-pool#setting-the-connection-pool-timeout), but of course your throughput will then still be limited by those 9 connections that Prisma can only use at a time, and the queries will wait a long time.

Increasing the pool size / connection limit is definitely the better approach.

arch-user-france1 commented 1 year ago

@janpio, the error may appear immediately, and, in development, still occurrs. I am sure the connection pool was not nearly exhausted.

janpio commented 1 year ago

Then please definitely open a new issue, provide as much information as possible - optimally a reproduction app on GitHub - and also fill all the information the bug report issue template asks for, because we understand that to even be able to think on what could be going wrong. Thanks.

szuvi commented 4 months ago

I'm seeing the same issue. I have Prisma working on Lambda with serverless-express. I get the P2024 error a few times a day, and that's not correlated with Lambda concurrent invocations or my RDS active connections (nowhere near the limit). Also, the error seems to be ignoring the timeout, throwing as soon as the Lambda starts execution, as in the logs:


2024-06-03T13:55:41.618+02:00 | START RequestId: b0a43cbf-2a4e-4e4c-8373-6bd80598cda9 Version: $LATEST

2024-06-03T13:55:41.640+02:00  b0a43cbf-2a4e-4e4c-8373-6bd80598cda9 INFO    prisma:error 
Invalid `prisma.tag.findMany()` invocation:
Timed out fetching a new connection from the connection pool. More info: http://pris.ly/d/connection-pool (Current connection pool timeout: 10, connection limit: 7)

I'm not explicitly setting pool timeout or connection limit.

why-el commented 4 months ago

@janpio I can confirm we are also seeing this. Here is a sample error log from AWS Lambda:

START RequestId: a64f4a34-1d3a-4d81-9e47-48577592156b Version: $LATEST
2024-06-05T11:12:35.532-04:00
2024-06-05T15:12:35.532Z    a64f4a34-1d3a-4d81-9e47-48577592156b    ERROR   Unhandled Promise Rejection     
{
    "errorType": "Runtime.UnhandledPromiseRejection",
    "errorMessage": "PrismaClientKnownRequestError: \nInvalid `prisma.user.findFirst()` invocation:\n\n\nTimed out fetching a new connection from the connection pool. More info: http://pris.ly/d/connection-pool (Current connection pool timeout: 10, connection limit: 5)",

Note that request a64f4a34-1d3a-4d81-9e47-48577592156b starts at 11:12:35.532-04:00, and immediately on the next line, at more or less the exact same timestamp, we see a pool is full error raised and the request halted. This should not happen, because at minimum a full 10 seconds should have been observed, or else either I don't understand how pooling works the way Prisma wants it to work or there is indeed a bug in Prisma as the original author of this issue argues.

janpio commented 4 months ago

@why-el Can you help us with a reproduction? That would be most helpful here.

why-el commented 4 months ago

Yes sir I am trying, but it's fighting me very hard. It happens extremely infrequently, but as you can imagine, it happens in the production application which affects our users.

why-el commented 4 months ago

Couple of hours later I am unable to reproduce it, sadly. It's real, but I think it involves some non-trivial interactions between AWS Lambda, Prisma, and Graphql resolvers, so it is not feasible to hunt it further besides maybe increasing the connection_limit some more.

One thing I noticed locally, no idea if it sheds any light, that if I do a Promise.all with say 50 queries inside a resolver (our connection_limit is a mere 5 but the database has 5000 connections), you would think only 5 connections will be created and the other connections will wait, but somehow Prisma creates 10 instead (i.e. the max_open count). Prisma version 5.12. Upgrading did not fix this part either.

janpio commented 4 months ago

One thing I noticed locally, no idea if it sheds any light, that if I do a Promise.all with say 50 queries inside a resolver (our connection_limit is a mere 5 but the database has 5000 connections), you would think only 5 connections will be created and the other connections will wait, but somehow Prisma creates 10 instead (i.e. the max_open count). Prisma version 5.12. Upgrading did not fix this part either.

That would be surprising and concerning, as connection_limit is a hard limit. Can you please open a bug issue, and optimally provide some information that lets us reproduce? Thanks.

szuvi commented 4 months ago

I also am unable to determine, what triggers the error. From my observations it usually is thrown on a Lambda, that starts execution, gets 1-2 requests, then is idle for 5-10 min, and on the next request the error is thrown. This is the only thing that was consistent for me, besides that, I wasn't able to reproduce this on demand.

why-el commented 4 months ago

@szuvi are these trivial / normal / REST queries or are they graphql resolver queries? Reason I ask is because the issue tracker is littered with mentions of graphql resolvers as particular hot beds for possible bugs, so if you don't use it, that's a good point to know.

@janpio Like I said I only observe that behavior in my own Graphql heavy setup. I re-run the same thing on a brand new branch and I couldn't reproduce it, so I think for now we are good.

szuvi commented 4 months ago

@why-el We are not using graphql. When it comes to the queries we use, we have some more complicated ones in our codebase, although, one of the Lambdas is using only simple queries, with some selects or includes, nothing super sophisticated.

why-el commented 4 months ago

@szuvi that's actually great to hear, simplifies our problem. And what version? Mine is 5.12.

szuvi commented 4 months ago

@why-el We are currently at 5.4.2