bkontur opened 4 months ago
Yes, we do caching, but I would also like to add monitoring of RPC/runtime calls (and subscriptions) to see what we call and how often, and whether there is any room for optimization.
Also, maybe a separate 6-relayer setup could help by itself.
Again, these errors stop finality relaying:
```
[BridgeHubPolkadot_to_BridgeHubKusama_MessageLane_00000001] 2024-07-23 14:47:28 +00 ERROR bridge Error retrieving state from BridgeHubKusama node: FailedToGetSystemHealth { chain: "BridgeHubPolkadot", error: RpcError(RestartNeeded(Transport(connection closed
[Kusama_to_BridgeHubPolkadot_Sync] 2024-07-23 14:47:27 +00 ERROR bridge Finality sync loop iteration has failed with error: Source(ChannelError("Background task of Kusama client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for Kusama has finished\"))"))
[Polkadot_to_BridgeHubKusama_Sync] 2024-07-23 14:47:23 +00 ERROR bridge Finality sync loop iteration has failed with error: Target(FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
```
Investigate/check:
- `RestartNeeded`: does it stop the loop, or does the loop restart? Or is the only solution to restart substrate-relay?

Possible improvement 1:
Now we are connected to one exact node URI, e.g.:

If that node is down or has some other problem, we could configure a list of URIs, so that on `RestartNeeded` we rotate and try another URI, e.g.:

So, if one node is overloaded, we just try another one.
Possible improvement 2 - connect substrate-relay to some "load balancer"
This "load balancer" would route requests to a live, non-overloaded node, instead of us handling this in our own code.
Some logs from 2024-07-12/15
https://matrix.to/#/!FqmgUhjOliBGoncGwm:parity.io/$OjKXcX4aO9lkzM46fRLKXTMi-mf9vcpdJN_RDMgIn6o?via=parity.io
e.g.: