waku-org / nwaku

Waku node and protocol.

bug: node stuck in restart loop when rln-relay-eth-client-address unavailable #3126

Open jakubgs opened 1 month ago

Problem

When the RPC endpoint specified in rln-relay-eth-client-address is unavailable for any reason, the node gets stuck in a restart loop.

Impact

This behavior makes the node fragile: a problem with a single endpoint, needed by only one protocol, prevents the whole node from starting. It can easily take down entire fleets because of an issue with one protocol.

To reproduce

  1. Run the node with an unavailable rln-relay-eth-client-address.
  2. Observe the restart loop.

Expected behavior

I would expect the node to start, provide the functionality of all protocols other than rln-relay, and simply report that rln-relay is broken.
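
To illustrate the kind of startup I have in mind, here is a minimal sketch in Python (not nwaku code, and not how nwaku is structured internally). Every name in it (NodeStatus, start_node, start_relay, start_rln_relay) is hypothetical. The idea is only that each protocol is started independently, and a failure in one is recorded instead of aborting the whole node.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NodeStatus:
    """Per-protocol health, as a node could report it after startup."""
    healthy: Dict[str, bool] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)


def start_node(protocols: Dict[str, Callable[[], None]]) -> NodeStatus:
    """Start every protocol; record failures instead of aborting startup."""
    status = NodeStatus()
    for name, start in protocols.items():
        try:
            start()
            status.healthy[name] = True
        except Exception as exc:  # one broken protocol must not stop the rest
            status.healthy[name] = False
            status.errors[name] = str(exc)
    return status


# Hypothetical stand-ins for real protocol initialisation.
def start_relay() -> None:
    pass  # starts fine


def start_rln_relay() -> None:
    # Simulates the unavailable RPC endpoint from this issue.
    raise ConnectionError("Connection timed out")


status = start_node({"relay": start_relay, "rln-relay": start_rln_relay})
for name, ok in status.healthy.items():
    print(name, "ok" if ok else "broken: " + status.errors[name])
```

In this sketch the node comes up with relay working and rln-relay marked as broken, which is the behavior described above.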

Screenshots/logs

DBG 2024-10-17 08:54:16.527+00:00 Sending message to RPC server              topics="JSONRPC-HTTP-CLIENT" tid=1 file=httpclient.nim:79 address="ok((id: \"linux-01.ih-eu-mda1.nimbus.sepolia.wg:8556\", scheme: NonSecure, hostname: \"linux-01.ih-eu-mda1.nimbus.sepolia.wg\", port: 8556, path: \"\", query: \"\", anchor: \"\", username: \"\", password: \"\", addresses: @[10.14.0.131:8556]))" msg_len=59 name=eth_chainId
DBG 2024-10-17 08:54:28.536+00:00 Failed to send POST Request with JSON-RPC  topics="JSONRPC-HTTP-CLIENT" tid=1 file=httpclient.nim:95 e="Connection timed out"
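
In the log above, the eth_chainId request fails with a timeout roughly 12 seconds after it was sent. A bounded availability probe for such an endpoint could look like the Python sketch below. This is an illustration under assumptions, not nwaku code: the function names and the 3 second default timeout are hypothetical, and only the eth_chainId method name comes from the log.

```python
import http.client
import json
import urllib.request


def build_chain_id_request(request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 eth_chainId request body."""
    payload = {
        "jsonrpc": "2.0",
        "method": "eth_chainId",
        "params": [],
        "id": request_id,
    }
    return json.dumps(payload).encode("utf-8")


def endpoint_available(url: str, timeout: float = 3.0) -> bool:
    """Return True only if the endpoint answers eth_chainId within the timeout."""
    request = urllib.request.Request(
        url,
        data=build_chain_id_request(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors and invalid JSON.
        return False
    return isinstance(body, dict) and "result" in body
```

A caller could use the result to decide whether to enable rln-relay or to start without it and retry later, rather than exiting.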

nwaku version/commit hash

v0.33.1

Additional context

Discovered due to firewall issues on node-01.gc-us-central1-a.waku.sandbox.