A Graph Node instance is configured for multiple chains, where each chain has a primary database shard and a read replica. Latency is unexpectedly poor because of repeated connection disruptions between Graph Node and the read replica, which runs on a separate physical server. PostgreSQL version: 16. Graph Node release: v0.33.0.
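This setup depends on Graph Node sending a shard's read queries to its replica. Graph Node's configuration documentation describes a per-database `weight` for this: reads are split between a shard's main database and its replicas in proportion to their weights, and a weight of 0 keeps reads off a database. The Python sketch below models that split, assuming simple weight-proportional random selection. It is not Graph Node's implementation, and `pick_database`, `simulate` and the database names are hypothetical.

```python
import random
from collections import Counter


def pick_database(weights, rng):
    """Choose one database for a read query, with probability proportional to weight.

    Simplified model only: `weights` maps a (hypothetical) database name to its
    configured weight; a weight of 0 means the database never receives reads.
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, queries=10_000, seed=0):
    """Count how many of `queries` simulated reads land on each database."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(pick_database(weights, rng) for _ in range(queries))


if __name__ == "__main__":
    # Hypothetical shard whose main database is excluded from reads.
    print(simulate({"main": 0, "replica": 1}))
    # Hypothetical shard splitting reads 1:3 between main and replica.
    print(simulate({"main": 1, "replica": 3}))
```

In this model, giving the main database a weight of 0 sends every read to the replica, so the replica's pool sizing carries the whole query load for that shard.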
The store is configured as follows:
The read replica's logs are full of messages such as "could not receive data from client: Connection reset by peer", "FATAL: connection to client lost" and "could not send data to client: Broken pipe". After extended investigation, these point to a possible inefficiency in how Graph Node handles the many connections opened through the pool_size option.
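One part of the pool_size question is a plain connection budget on the replica server. The Python sketch below assumes that each Graph Node process can open up to pool_size connections for every pool that points at the replica, and that all databases on that server (here mantle, mantle_testnet, mantle_sepolia_testnet and graphnode, taken from the logs) share one PostgreSQL max_connections limit. The class, function and example numbers are hypothetical, and Graph Node may open connections this model ignores, so treat the peak as a lower bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolOnReplica:
    """A Graph Node connection pool that targets the replica server (hypothetical model)."""
    database: str   # database name as it appears in the replica logs, e.g. "mantle"
    pool_size: int  # pool_size that applies to this pool
    processes: int  # number of Graph Node processes that open this pool


def replica_connection_budget(pools, max_connections=100,
                              superuser_reserved=3, reserved=0):
    """Compare the worst-case number of client connections with the server limit.

    Defaults match PostgreSQL's defaults for max_connections,
    superuser_reserved_connections and (PostgreSQL 16) reserved_connections;
    pass the replica's real settings instead.
    Returns (peak, available, headroom); negative headroom means the pools
    can request more connections than the server will accept.
    """
    peak = sum(p.pool_size * p.processes for p in pools)
    available = max_connections - superuser_reserved - reserved
    return peak, available, available - peak


if __name__ == "__main__":
    # Hypothetical numbers: two Graph Node processes, one pool per database.
    pools = [
        PoolOnReplica("mantle", pool_size=40, processes=2),
        PoolOnReplica("mantle_testnet", pool_size=20, processes=2),
        PoolOnReplica("mantle_sepolia_testnet", pool_size=20, processes=2),
        PoolOnReplica("graphnode", pool_size=10, processes=2),
    ]
    peak, available, headroom = replica_connection_budget(pools, max_connections=200)
    print(f"peak={peak} available={available} headroom={headroom}")
```

If the budget is exceeded, PostgreSQL rejects new connections with "sorry, too many clients already" rather than the reset and broken-pipe messages above, so this check mainly helps rule that cause in or out.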
This raises the following questions:
1. Are there any benchmarks or recommendations that could serve as configuration guidelines for pool sizes (the number of open connections) that Graph Node can sustain without disruptions?
2. Is the configuration above likely to cause connection issues?
3. Can slow queries themselves cause Graph Node to drop its connections when the database takes too long to return a result?
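The last question can be explored without Graph Node at all. The Python sketch below is a standalone simulation (the function name and timings are hypothetical): a client waits for a reply, gives up after a short timeout and closes its socket while the server is still busy. When the server finally writes, the write fails with EPIPE or ECONNRESET, the errors PostgreSQL reports as "could not send data to client: Broken pipe" and "Connection reset by peer". It only shows that a client abandoning a slow request is one plausible source of these messages; whether Graph Node's own timeouts do this in the reported setup is an assumption that would need to be confirmed.

```python
import socket
import threading
import time


def slow_reply_after_client_timeout(query_seconds=0.3, client_timeout=0.05):
    """Simulate a 'slow query' whose result is written after the client gave up.

    Returns the name of the exception the server side hits when it writes
    its result, or None if every write succeeded.
    """
    listener = socket.create_server(("127.0.0.1", 0))  # ephemeral local port
    port = listener.getsockname()[1]
    outcome = {}

    def server():
        conn, _ = listener.accept()
        with conn:
            time.sleep(query_seconds)  # stands in for a long-running query
            try:
                for _ in range(200):
                    conn.sendall(b"x" * 1024)  # stream the "result rows"
                    time.sleep(0.01)
                outcome["error"] = None
            except OSError as exc:
                # Python ignores SIGPIPE by default, so writing to a closed peer
                # surfaces here as BrokenPipeError or ConnectionResetError.
                outcome["error"] = type(exc).__name__

    thread = threading.Thread(target=server)
    thread.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(client_timeout)
    try:
        client.recv(1024)  # wait for the result
    except socket.timeout:
        pass  # give up once the client-side timeout expires
    client.close()  # drop the connection, as a timed-out client would

    thread.join()
    listener.close()
    return outcome["error"]


if __name__ == "__main__":
    print(slow_reply_after_client_timeout())
```

On Linux the server side typically sees BrokenPipeError here, because the client's close arrives before the server writes; ConnectionResetError is also possible depending on timing.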
Documentation on this topic is scarce and not very helpful. Could best practices for configuring Graph Node be added to the docs, so that performance tuning can be better informed?
Relevant log output
2024-02-12 17:40:09.857 UTC [1427134] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.858 UTC [1427252] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427289] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427224] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427235] graphpguser@graphnode LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427190] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.859 UTC [1427212] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427291] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427320] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427301] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1425906] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427712] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.860 UTC [1427712] graphpguser@mantle LOG: unexpected EOF on client connection with an open transaction
2024-02-12 17:40:09.860 UTC [1425904] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.861 UTC [1427287] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.861 UTC [1425366] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.861 UTC [1427228] graphpguser@graphnode LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.863 UTC [1427181] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.863 UTC [1423289] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.863 UTC [1427172] graphpguser@graphnode LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.863 UTC [1427265] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1427227] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1425874] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1427218] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1427205] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.864 UTC [1426590] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
...
2024-02-12 17:40:09.962 UTC [1427251] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.962 UTC [1427237] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.964 UTC [1425368] graphpguser@mantle_testnet LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.964 UTC [1427148] graphpguser@mantle_sepolia_testnet LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.967 UTC [1427149] graphpguser@mantle_sepolia_testnet LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle LOG: could not send data to client: Broken pipe
2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle STATEMENT: /* controller='filter',application='sgd96',route='223118cf9f70ee4b-
**** cut off ****
2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle FATAL: connection to client lost
2024-02-12 17:40:09.994 UTC [1427709] graphpguser@mantle STATEMENT: /* controller='filter',application='sgd96',route='223118cf9f70ee4b-
**** cut off ****
2024-02-12 17:40:10.059 UTC [1426772] graphpguser@mantle LOG: could not send data to client: Connection reset by peer
2024-02-12 17:40:10.059 UTC [1426772] graphpguser@mantle STATEMENT: /* controller='filter',application='sgd96',route='223118cf9f70ee4b-
**** cut off ****
2024-02-12 17:40:10.059 UTC [1426772] graphpguser@mantle FATAL: connection to client lost
2024-02-12 17:40:10.059 UTC [1426772] graphpguser@mantle STATEMENT: /* controller='filter',application='sgd96',route='223118cf9f70ee4b-
**** cut off ****
2024-02-12 17:40:13.849 UTC [1427319] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:13.849 UTC [1427319] graphpguser@mantle LOG: unexpected EOF on client connection with an open transaction
2024-02-12 17:40:13.849 UTC [1425084] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:13.849 UTC [1425084] graphpguser@mantle LOG: unexpected EOF on client connection with an open transaction
2024-02-12 17:40:13.849 UTC [1427321] graphpguser@mantle LOG: could not receive data from client: Connection reset by peer
2024-02-12 17:40:13.849 UTC [1427321] graphpguser@mantle LOG: unexpected EOF on client connection with an open transaction
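To quantify these bursts, the replica log can be summarized per database and per second, which makes it easier to see how tightly the resets cluster (as around 17:40:09 above) and which databases they affect. The Python sketch below assumes the log line prefix shown in this report (timestamp, UTC, [pid], user@database, level); it is not part of Graph Node or PostgreSQL, and the function names are hypothetical.

```python
import re
from collections import Counter

# Shape of the replica log lines in this report, for example:
# 2024-02-12 17:40:09.857 UTC [1427134] graphpguser@mantle LOG: could not receive data from client: ...
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+) UTC \[(?P<pid>\d+)\] (?P<user>[^@\s]+)@(?P<db>\S+) "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def parse(lines):
    """Yield matched log records, skipping STATEMENT echoes and non-log lines."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m["level"] != "STATEMENT":
            yield m


def by_database_and_message(lines):
    """Count messages per (database, message text)."""
    return Counter((m["db"], m["msg"]) for m in parse(lines))


def per_second(lines):
    """Count messages per wall-clock second to expose bursts."""
    return Counter(m["ts"][:19] for m in parse(lines))  # "YYYY-MM-DD HH:MM:SS"
```

Both functions accept any iterable of lines, so an open log file can be passed directly, for example per_second(f).most_common(5).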
IPFS hash
No response
Subgraph name or link to explorer
No response
Some information to help us out
[ ] Tick this box if this bug is caused by a regression found in the latest release.
[X] Tick this box if this bug is specific to the hosted service.
[X] I have searched the issue tracker to make sure this issue is not a duplicate.
OS information
Linux