porsager / postgres

Postgres.js - The Fastest full featured PostgreSQL client for Node.js, Deno, Bun and CloudFlare

arbitrary stuck queries #827

Open JuHwon opened 3 months ago

JuHwon commented 3 months ago

We have an issue where the sql client just does not respond. In other words, the promise neither resolves nor rejects. The application has peaks with lots of concurrent requests, but the issue also happens when there is not much load. It just arbitrarily stops responding. When I restart the application, everything works fine again.

The client configuration:

    return postgres(options.connectionString, {
      idle_timeout: 0,
      connect_timeout: 30, 
      max: 50,
    });

I have a feeling that it happens after the service has been running for some time.

I also had a feeling that it often occurs with sql.begin(...), but I have also seen it once with a single sql`SELECT * FROM x WHERE id = ${id}`.

Not sure where to start looking here, since we are currently not able to reproduce it. Does anyone have any ideas why this could happen? I would also be happy to contribute to the project if we are able to find the issue.

UPDATE: Here is a chart from the Postgres DB for one occurrence; the needle is placed where the incident happened. [chart image]

Miraj98 commented 3 months ago

Facing same issue

JuHwon commented 3 months ago

Hi @Miraj98, thanks for responding. Do you have any hints about where and when you are facing this problem? Is it correlated to transactions (sql.begin)? Does it correlate to how long the process has been running? What are your connection settings? Are you able to reproduce it? What are your Postgres client options?

Maybe @porsager also has some hints on where we should look to find something?

Miraj98 commented 3 months ago

Hey @JuHwon, I just did some debugging and I am facing connection issues specifically when using Bun, so not sure if my case will help. Sorry.

JuHwon commented 3 months ago

I would not exclude the possibility that the issues are related when the only difference is using Bun vs Node, right?

AdamAkiva commented 3 months ago

TLDR:
Every transaction has to run on the same connection, so for every transaction a connection has to remain open until the transaction ends.
PostgreSQL's default connection limit is 100, hence you can only run 100 transactions concurrently (unless you changed it at the database level).
I'm not sure why the queries get stuck when that limit is reached; I advise you to use the options stated below to debug and research the issue.

In depth:
A few days ago a co-worker and I were presented with a similar issue when using Drizzle ORM with postgres.js as the driver.
We found a few interesting things.
I'll start by stating that PostgreSQL's default number of concurrent clients is 100. Every transaction by definition has to be done on the same connection, which means that if you run transactions concurrently, e.g. using Promise.all, and you have a lot of concurrent connections, you may reach said limit.
Regarding the requests getting "stuck": this is the interesting/weird part. In essence (from our understanding) we end up in a so-called "deadlock". What may happen is that if the requests are made concurrently, you may have a lot of transaction start statements before any of them even begin processing (see the sketch after this comment).
The expected behavior in this case is just getting an error message for too many clients, but for some reason that does not happen.
There was another issue with Drizzle, but that does not affect this specific case.
What I advise you to do is to run PostgreSQL with statement logging enabled, e.g. postgres -c log_statement=all, then follow the logs from the database and check whether there are any errors related to the transactions. Consider adding this option as well.
If you find any notes or results, I'll be happy if you share them so we can learn from it as well.

P.S. I have an assumption that the connection pool has an issue when too many transactions are run concurrently, and that's why you had this issue while doing SELECT * FROM x WHERE id = ${id} as well, but I can't be sure without specific logs from the database itself.
In addition, if possible, try to increase the max connections option to any value > 100 and see whether you receive an error message saying that too many clients are already active. This may indicate the problem is connected to too many concurrently active transactions.
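
To make the scenario above concrete, here is a minimal sketch (not from the thread; the connection string and the jobs table are made up) of the pattern being described: every sql.begin reserves a dedicated connection until the transaction ends, so firing many transactions at once via Promise.all queues far more transactions than the pool, and potentially the server, can serve.

    import postgres from 'postgres'

    // Hypothetical connection string and table, for illustration only.
    const sql = postgres('postgres://user:pass@localhost:5432/mydb', { max: 50 })

    // Each sql.begin() keeps one connection reserved until the transaction ends.
    // Starting 500 of them at once queues ten times more transactions than `max`,
    // and across several app instances this can also approach the server's
    // max_connections limit (100 by default).
    const results = await Promise.all(
      Array.from({ length: 500 }, (_, i) =>
        sql.begin(async sql => {
          await sql`INSERT INTO jobs (n) VALUES (${i})`
          return sql`SELECT count(*) AS total FROM jobs`
        })
      )
    )

With a single client and max: 50 the extra transactions simply wait in the client's queue; the sketch only illustrates how quickly concurrent sql.begin calls tie up connections.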

porsager commented 3 months ago

Do you think your descriptions match the one in https://github.com/porsager/postgres/pull/738? That change has been released in v3.4.4 today. It would be nice if you all could give that a try and report back.

kam-st commented 1 month ago

The issue still persists, pulling directly from GitHub.

JuHwon commented 1 month ago

I haven't had time to investigate this issue further yet. I would really love to have a test that reproduces this. @kam-st, which Node.js version are you experiencing this issue with?

We got better results when reducing the concurrency peaks for the client, but still got one occurrence with Node.js 20 and v3.4.3 two weeks ago. It also seems to correlate with how long the process has been running. We have updated to Node.js 21 and v3.4.4 now and will observe further. I hope we find time in the future to investigate this issue in detail and get some reproducible tests.

kam-st commented 1 month ago

Using Node 20.14 LTS.

Does your app make queries on page refresh, such as auth? If yes, try refreshing the page multiple times. I have been able to reproduce this issue every single time.

JuHwon commented 1 month ago

If you can reproduce it every single time, I am not sure your issue is related. In our case it is really arbitrary. It's a REST API service with QPS peaks after specific endpoints are invoked. Flattening the QPS curve made the issue less likely.

kam-st commented 1 month ago

It is more arbitrary if I have max connections set to more than 100. I feel like a new connection is made for every single query. Based on idle connections, if requests are consistent there is no problem, but if there is a large amount of queries in a short period of time, then max connections get exhausted. Once the request load dies down, idle connections are closed and requests can be made again.

I know idle_timeout is set to 0 for you, but is it really 0 if you have a high number of concurrent queries? Even if it takes 0.1 sec to close a connection, the issue can persist.

kam-st commented 1 month ago

Hi Adam,

Great amount of information in your post. Just curious, what approach would you recommend to make sure that not too many connections are opened, besides using idle_timeout etc.?

chanon commented 3 weeks ago

Hello everyone. I have consistently reproduced a bug that causes queries to hang/become stuck. Please see #893. I am curious whether this is what you have been experiencing (check your code!), or if there is some other bug.

The TLDR of #893 is: if you call sql.begin and then run queries inside it without correctly using the sql argument passed to the callback, it becomes likely that your connection will hang (a minimal sketch of the mistake follows).
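
To illustrate the mistake #893 describes, here is a minimal, hypothetical sketch (table and values are made up). The broken variant runs a query through the outer sql inside the transaction callback; the fixed variant only uses the transaction-scoped client that sql.begin passes to the callback.

    import postgres from 'postgres'

    const sql = postgres('postgres://user:pass@localhost:5432/mydb', { max: 10 })

    // Broken: the first query uses the outer `sql` (the pool) instead of the
    // transaction-scoped client, so it does not run on the transaction's
    // connection. Per the report in #893, this pattern can leave the
    // connection hanging.
    await sql.begin(async tx => {
      await sql`UPDATE accounts SET balance = balance - 100 WHERE id = 1`
      await tx`UPDATE accounts SET balance = balance + 100 WHERE id = 2`
    })

    // Correct: every query inside the callback goes through the client passed
    // by sql.begin (named `tx` here), so they all share the reserved connection.
    await sql.begin(async tx => {
      await tx`UPDATE accounts SET balance = balance - 100 WHERE id = 1`
      await tx`UPDATE accounts SET balance = balance + 100 WHERE id = 2`
    })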