nomad-xyz / nomad-monorepo

Contracts, off-chain agents, and libraries for Nomad
https://nomad.xyz

refactor: isolate agent channel failures #161

Open luketchang opened 2 years ago

luketchang commented 2 years ago

Hub agents currently push messages from the home to the replicas on every other chain. This makes each hub agent only as reliable as its faultiest channel (e.g. the one with the worst RPC), and the agent fails entirely if any single channel fails: for example, a moonbase RPC failure also stops the rinkeby --> kovan channel. We want to isolate each channel's tasks so that the other channels keep running when one fails.

Stop canceling all agent tasks when a single channel task fails. Instead, emit an error message and retry the failed channel (possibly with exponential backoff).
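A minimal sketch of the isolation idea, assuming each channel's work can be expressed as a fallible step. `std::thread` stands in for the agents' async tasks so the example has no dependencies, and `Step`, `backoff_delay`, `run_channel`, and `run_all` are hypothetical names, not existing agent APIs. Each channel retries with capped exponential backoff, and a failure is reported for that channel only instead of canceling its siblings. The `max_attempts` bound exists only so the sketch terminates; a real channel loop would likely retry indefinitely.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical type for one unit of a channel's work (e.g. relaying a batch).
type Step = Box<dyn FnMut() -> Result<(), String> + Send>;

/// Exponential backoff: base * 2^attempt, saturating on overflow, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Runs one channel's step, logging each failure and retrying with backoff.
/// Errors stay local to this channel and never propagate to other channels.
fn run_channel<F>(name: &str, max_attempts: u32, mut step: F) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let base = Duration::from_millis(1);
    let max = Duration::from_millis(50);
    let mut attempt = 0;
    loop {
        match step() {
            Ok(()) => return Ok(()),
            Err(e) => {
                attempt += 1;
                eprintln!("channel {name} failed (attempt {attempt}): {e}");
                if attempt >= max_attempts {
                    return Err(format!("channel {name} gave up after {attempt} attempts"));
                }
                thread::sleep(backoff_delay(attempt - 1, base, max));
            }
        }
    }
}

/// Spawns every channel independently and collects a per-channel result.
/// One channel failing (or even panicking) does not stop the others.
fn run_all(channels: Vec<(String, Step)>) -> Vec<(String, Result<(), String>)> {
    let handles: Vec<_> = channels
        .into_iter()
        .map(|(name, step)| {
            let n = name.clone();
            (name, thread::spawn(move || run_channel(&n, 3, step)))
        })
        .collect();
    handles
        .into_iter()
        .map(|(name, handle)| {
            let res = handle
                .join()
                .unwrap_or_else(|_| Err(format!("channel {name} panicked")));
            (name, res)
        })
        .collect()
}
```

With a channel whose RPC is always down next to a channel with one transient error, the first ends in `Err` after its retries while the second still completes with `Ok`, which is the behavior the issue asks for.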

Make sure Grafana shows when a channel has stopped, and that we have per-channel alerts to notify us.
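One way to make a stopped channel visible, sketched under assumptions: each channel task reports its own health into a per-channel gauge that Prometheus scrapes and Grafana graphs. The metric name `nomad_channel_up` and the `ChannelHealth` type are hypothetical; in practice the agents' existing metrics library would be used rather than hand-rendering text. This std-only version writes the Prometheus text exposition format directly and assumes channel names contain no `"`, `\`, or newline characters (which would need escaping).

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical per-channel health registry. Each channel task sets its own
/// entry, so one failing channel is visible without affecting the others.
#[derive(Default)]
pub struct ChannelHealth {
    // BTreeMap keeps the output order stable across scrapes.
    up: BTreeMap<String, bool>,
}

impl ChannelHealth {
    /// Records whether `channel` is currently running.
    pub fn set(&mut self, channel: &str, healthy: bool) {
        self.up.insert(channel.to_string(), healthy);
    }

    /// Renders a gauge in the Prometheus text format, one sample per channel.
    pub fn render(&self) -> String {
        let mut out = String::new();
        out.push_str("# HELP nomad_channel_up 1 if the channel task is running, 0 otherwise.\n");
        out.push_str("# TYPE nomad_channel_up gauge\n");
        for (channel, healthy) in &self.up {
            // Writing to a String cannot fail, so unwrap is safe here.
            writeln!(out, "nomad_channel_up{{channel=\"{channel}\"}} {}", u8::from(*healthy)).unwrap();
        }
        out
    }
}
```

Because the gauge carries a `channel` label, an alert rule on the PromQL expression `nomad_channel_up == 0` fires separately for each stopped channel, which gives one alert per channel rather than one per agent.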