hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0
31.17k stars 2.76k forks source link

Increase in Hasura idle connections slows database to be unresponsive #7650

Open bgold0 opened 3 years ago

bgold0 commented 3 years ago

Version Information

Server Version: 2.0.7

Environment

Docker

What is the expected behaviour?

Don’t create 15+ connections if we don’t need it.

What is the current behaviour?

Hasura is creating too many connections and is crashing the database. This didn’t always happen. This is something new.

How to reproduce the issue?

I’m not sure

PNG image
bgold0 commented 3 years ago

Update: When you do a BIG query, the query works. But it’s after the fact that makes it fail and blow up.

tirumaraiselvan commented 3 years ago

The number of connections you see may be because of the connection pool idle timeout. Hasura has a default idle timeout of 180s which means that after a connection is created, it will hold the connection for at max 180s so that it may be reused by another query. You can reduce it if you want by editing the source configuration.

The database unresponsiveness maybe due to some heavy query. What is the CPU/mem of your Postgres db?

nicajoo commented 2 years ago

@bgold0 I am currently facing the exact same issue. Database sessions (opened from Hasura IP) increase until a complete freeze. Have you made any progress identifying the root cause of this issue ? Any "patch" like what suggests @tirumaraiselvan ?

tirumaraiselvan commented 2 years ago

@nicajoo Database sessions usually increase in the following situations:

  1. Load increases on Hasura because of requests (this will spawn new connections limited by the connection pool setting per instance, on Cloud there can be auto-scaling as well). This can also be because of a burst in event trigger deliveries.
  2. Some query is taking up lots of locks and blocking other queries (this is usually the case when some long running query or DDL is performed)
  3. It's normal to see many IDLE sessions after a burst of traffic because of connection idle timeout (for reuse of sessions).

The best way to diagnose the root cause is to see the activity on the database during this burst. Do you have any info on this?

nicajoo commented 2 years ago

@tirumaraiselvan thanks, it was indeed right after a large number of events were triggered. However, is there any way we can prevent this from happening ? I set the idle timeout to 60 seconds, though I'm not sure this will fix the issue ? Our app was unusable after this, because database was unreachable.

SameerChorge94 commented 2 years ago

@tirumaraiselvan

I'm facing the same issues, we're using the hasura open-source docker image of Graphql-engine ([v2.0.9]). In our case, as per project requirements we created view on tables and trying to fetch the 100 records for request.

we did load testing with limit 100 and offset 1 for view query, it works fine for this scenario but when we dynamically gives different offset for each request like for one request limit 100 and offset 14500 it takes some time to return the data that above limit and offset

so during testing multiple idle connections were created for the postgres database and at one point hasura also didn't get any active connection and freezed as DB connection was not available.

Is there any solution or workaround we can add to prevent this issue ?

tirumaraiselvan commented 2 years ago

@SameerChorge94

like for one request limit 100 and offset 14500 it takes some time to return the data that above limit and offset

This query is complex by design. Could you check if appropriate indexes are present in the table to support this query?

Is there any solution or workaround we can add to prevent this issue ?

You can reduce the connection pool size so that only a limited number of connections are made by Hasura on the db.

SameerChorge94 commented 2 years ago

Hello @tirumaraiselvan ,

I added the appropriate indexes by referring the postgres documentation https://www.postgresql.org/docs/current/indexes-types.html

added indexes on the fields that are used in view queries for filtering, but getting the same result like for one request limit 100 and offset 14500 it takes some time to return the data

so all requests gets timed out after some point, as database also gets most of the idle connections due to requests takes so much time to return the response

Is there any solution or workaround you can suggest to improve the query performance or handle this type of scenario ?

georgiosd commented 2 years ago

Any progress on this issue? We're having the same problem with Hasura cloud and there's no logical explanation for Hasura exhausting the number of sessions.

BenoitRanque commented 2 years ago

Any progress on this issue? We're having the same problem with Hasura cloud and there's no logical explanation for Hasura exhausting the number of sessions.

Can you offer more details? Why do you not expect hasura to exhaust the number of sessions, when the load asks for it?

saolof commented 2 years ago

We're having the same issue after a migration to postgres 14. All the Hasura queries seem to be permanently idle, while the active queue is dominated by queries that should be lower priority but which are more expensive to run.

Update: all the queries seem to be very small, and seem to be idle on Hasura waiting in the ClientRead state, which according to the postgres docs means it is waiting to read data from Hasura despite this being on a simple select query?

zainul commented 11 months ago

up

martingonzalezabb commented 1 month ago

any update of this issue?