graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL
https://thegraph.com
Apache License 2.0

[Bug] Multiple messages about closed connections in PG read replica logs #5205

Open AnPiakhota opened 8 months ago

AnPiakhota commented 8 months ago

Bug report

I am running a Graph Node instance configured for multiple chains, where each chain has a primary DB shard and a read replica. Latency is unexpectedly poor because of repeated connection disruptions between the Graph Node instance and the read replica, which runs on a separate physical server. The PostgreSQL version is 16 and the Graph Node release is v0.33.0.

Store is configured as follows:

[store]
[store.primary]
connection = "postgresql://user:******@localhost:5432/graphnode"
pool_size = 40
weight = 1
[store.primary.replicas.repl1]
connection = "postgresql://user:******@repl1host:5432/graphnode"
pool_size = 40
weight = 1

[store.mantle]
connection = "postgresql://user:******@localhost:5432/mantle"
pool_size = 200
weight = 1
[store.mantle.replicas.repl1]
connection = "postgresql://user:******@repl1host:5432/mantle"
pool_size = 200
weight = 1

[store.mantle_testnet]
connection = "postgresql://user:******@localhost:5432/mantle_testnet"
pool_size = 20
weight = 1
[store.mantle_testnet.replicas.repl1]
connection = "postgresql://user:******@repl1host:5432/mantle_testnet"
pool_size = 20
weight = 1

[store.mantle_sepolia_testnet]
connection = "postgresql://user:******@localhost:5432/mantle_sepolia_testnet"
pool_size = 20
weight = 1
[store.mantle_sepolia_testnet.replicas.repl1]
connection = "postgresql://user:******@repl1host:5432/mantle_sepolia_testnet"
pool_size = 20
weight = 1

...

The read replica's DB logs are strewn with messages such as "could not receive data from client: Connection reset by peer", "FATAL: connection to client lost", and "could not send data to client: Broken pipe". After extended investigation, this points to a possible inefficiency in how Graph Node handles the many connections configured via the pool_size option.
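As a rough sanity check on the pool sizes, the sketch below adds up the replica pool_size values from the config shown in this report and compares the total with the replica's connection limit. It assumes that a single Graph Node process can open up to pool_size connections per replica entry. The limit of 100 is PostgreSQL's stock default for max_connections, not a value taken from this setup, and the config is truncated ("..."), so the total is only a lower bound. The names REPLICA_POOL_SIZES, ASSUMED_MAX_CONNECTIONS, and peak_replica_connections are made up for illustration.

```python
# Replica pool sizes copied from the [store.*.replicas.repl1] sections of the
# config in this report. The config is truncated ("..."), so this is a lower bound.
REPLICA_POOL_SIZES = {
    "primary": 40,
    "mantle": 200,
    "mantle_testnet": 20,
    "mantle_sepolia_testnet": 20,
}

# Assumption: PostgreSQL's stock default for max_connections. The replica's
# actual setting is not stated in this report.
ASSUMED_MAX_CONNECTIONS = 100


def peak_replica_connections(pool_sizes, graph_node_instances=1):
    """Connections opened to the replica host if every pool fills completely."""
    return sum(pool_sizes.values()) * graph_node_instances


peak = peak_replica_connections(REPLICA_POOL_SIZES)
print(f"peak connections to repl1host: {peak}")
print(f"assumed max_connections:       {ASSUMED_MAX_CONNECTIONS}")
print(f"headroom:                      {ASSUMED_MAX_CONNECTIONS - peak}")
```

With the four shards listed, the pools can add up to 280 connections to repl1host. This does not by itself explain the "Connection reset by peer" messages, but it shows what the replica's max_connections and per-connection memory settings would have to accommodate.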

My questions are therefore:

Are there any benchmarks or recommendations that could serve as configuration guidelines for optimal pool sizes (number of open connections) that Graph Node can sustain without causing disruptions?

Are there likely to be connection issues with the above config?

Can queries themselves cause connection disruptions on the Graph Node side when the DB takes too long to return a result?

Documentation on this is very scarce and doesn't help much. Could you please add best practices for configuring Graph Node to the docs, so that performance tuning can be done on a sound basis?

Relevant log output

2024-02-12 17:40:09.857 UTC [1427134] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.858 UTC [1427252] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427289] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427224] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427235] graphpguser@graphnode LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427190] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427212] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427291] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427320] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427301] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1425906] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427712] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427712] graphpguser@mantle LOG:  unexpected EOF on client connection with an open transaction
2024-02-12 17:40:09.860 UTC [1425904] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.861 UTC [1427287] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.861 UTC [1425366] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.861 UTC [1427228] graphpguser@graphnode LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.863 UTC [1427181] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.863 UTC [1423289] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.863 UTC [1427172] graphpguser@graphnode LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.863 UTC [1427265] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1427227] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1425874] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1427218] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1427205] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1426590] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
...

2024-02-12 17:40:09.962 UTC [1427251] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.962 UTC [1427237] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.964 UTC [1425368] graphpguser@mantle_testnet LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.964 UTC [1427148] graphpguser@mantle_sepolia_testnet LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.967 UTC [1427149] graphpguser@mantle_sepolia_testnet LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle LOG:  could not send data to client: Broken pipe
2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle STATEMENT:  /* controller='filter',application='sgd96',route='223118cf9f70ee4b-
**** cut off ****
2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle FATAL:  connection to client lost
2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle STATEMENT:  /* controller='filter',application='sgd96',route='223118cf9f70ee4b-
**** cut off ****
2024-02-12 17:40:10.059 UTC [1426772] graphpguser@mantle LOG:  could not send data to client: Connection reset by peer
2024-02-12 17:40:10.059 UTC [1426772] graphpguser@mantle STATEMENT:  /* controller='filter',application='sgd96',route='223118cf9f70ee4b-
**** cut off ****
2024-02-12 17:40:10.059 UTC [1426772] graphpguser@mantle FATAL:  connection to client lost
2024-02-12 17:40:10.059 UTC [1426772] graphpguser@mantle STATEMENT:  /* controller='filter',application='sgd96',route='223118cf9f70ee4b-
**** cut off ****
2024-02-12 17:40:13.849 UTC [1427319] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:13.849 UTC [1427319] graphpguser@mantle LOG:  unexpected EOF on client connection with an open transaction
2024-02-12 17:40:13.849 UTC [1425084] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:13.849 UTC [1425084] graphpguser@mantle LOG:  unexpected EOF on client connection with an open transaction
2024-02-12 17:40:13.849 UTC [1427321] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer
2024-02-12 17:40:13.849 UTC [1427321] graphpguser@mantle LOG:  unexpected EOF on client connection with an open transaction
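To see which databases and message types dominate an excerpt like the one above, the lines can be tallied with a short script. This is a minimal sketch that assumes the log line layout shown here (timestamp, PID in brackets, user@database, level, message). The names LINE_RE, tally, and SAMPLE are hypothetical, and the sample lines are copied from this report.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt in this report:
# "<date> <time> <tz> [<pid>] <user>@<database> <LEVEL>:  <message>"
LINE_RE = re.compile(
    r"^\S+ \S+ \S+ \[(?P<pid>\d+)\] (?P<user>[^@\s]+)@(?P<db>\S+) "
    r"(?P<level>[A-Z]+):\s+(?P<message>.*)$"
)


def tally(lines):
    """Count (database, level, message) triples, skipping STATEMENT lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines that do not fit the layout (e.g. "**** cut off ****") are ignored.
        if match is None or match["level"] == "STATEMENT":
            continue
        counts[(match["db"], match["level"], match["message"])] += 1
    return counts


SAMPLE = [
    "2024-02-12 17:40:09.857 UTC [1427134] graphpguser@mantle LOG:  could not receive data from client: Connection reset by peer",
    "2024-02-12 17:40:09.859 UTC [1427235] graphpguser@graphnode LOG:  could not receive data from client: Connection reset by peer",
    "2024-02-12 17:40:09.860 UTC [1427712] graphpguser@mantle LOG:  unexpected EOF on client connection with an open transaction",
    "2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle LOG:  could not send data to client: Broken pipe",
    "2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle STATEMENT:  /* controller='filter',application='sgd96',route='223118cf9f70ee4b-",
    "**** cut off ****",
    "2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle FATAL:  connection to client lost",
]

for (db, level, message), n in tally(SAMPLE).most_common():
    print(n, db, level, message)
```

Running the same function over the full replica log, for example with open("postgresql.log") as the input where the file name is a placeholder, would show whether the resets are concentrated on one database or spread across all of them.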

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

OS information

Linux

github-actions[bot] commented 2 months ago

Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.