Hub agents push messages from the home to the replicas on every other chain. This makes a hub agent only as reliable as its faultiest channel (e.g. the worst RPC): if one channel fails, the whole agent fails (e.g. a moonbase RPC failure also stops rinkeby --> kovan). We want to isolate each channel's tasks so that the other channels keep running when one fails.
Stop canceling all agent tasks when one channel's task fails. Instead, log an error for the failed channel and retry it (possibly with exponential backoff).
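As a rough illustration of the isolation and retry behavior described above, here is a minimal sketch in Python with `asyncio`. It is not the agents' actual code: `supervise`, `run_channels`, and every parameter name are hypothetical, and the backoff constants are placeholders. Each channel gets its own supervisor loop, so a failure in one channel is logged and retried without canceling the others.

```python
import asyncio
import logging

log = logging.getLogger("hub_agent")  # hypothetical logger name


async def supervise(name, make_task, base_delay=1.0, max_delay=60.0, max_attempts=None):
    """Run one channel's task, restarting it with exponential backoff on failure.

    `make_task` is a zero-argument callable returning a fresh coroutine, so a
    failed task can be recreated from scratch on every retry.
    """
    attempt = 0
    while True:
        try:
            return await make_task()
        except Exception as exc:  # cancellation is a BaseException and still propagates
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                log.error("channel %s: giving up after %d attempts: %s", name, attempt, exc)
                raise
            # Delay doubles on each consecutive failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            log.error("channel %s failed (%s); retrying in %.3fs", name, exc, delay)
            await asyncio.sleep(delay)


async def run_channels(channels, **retry_opts):
    """Run every channel under its own supervisor.

    `channels` maps a channel name to its task factory. With
    return_exceptions=True, a channel that finally gives up is reported in
    the result dict instead of canceling its siblings.
    """
    names = list(channels)
    results = await asyncio.gather(
        *(supervise(n, channels[n], **retry_opts) for n in names),
        return_exceptions=True,
    )
    return dict(zip(names, results))
```

In this sketch the attempt counter never resets, which is acceptable only if a channel task is expected to run indefinitely. A real agent would probably reset the backoff after a period of healthy operation and might add jitter to the delay.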
Make sure Grafana shows when a channel has stopped, and that each channel has its own alert to notify us.
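One possible way to make a stopped channel visible is to export per-channel metrics, sketched below. This assumes the dashboards are fed by a Prometheus-style scraper, which the ticket does not state. The `ChannelHealth` class and both metric names are hypothetical, and the output is written by hand in the Prometheus text exposition format rather than through a client library.

```python
import time


class ChannelHealth:
    """Tracks the last success time and the failure count for each channel."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the tracker can be tested
        self._last_success = {}
        self._failures = {}

    def record_success(self, channel):
        self._last_success[channel] = self._clock()
        self._failures.setdefault(channel, 0)

    def record_failure(self, channel):
        self._failures[channel] = self._failures.get(channel, 0) + 1
        self._last_success.setdefault(channel, 0.0)

    def render(self):
        """Return the metrics as text, one labeled sample per channel.

        Channel names are assumed to contain no quotes or backslashes;
        a real exporter would escape label values.
        """
        lines = []
        for ch in sorted(self._failures):
            lines.append(
                f'channel_last_success_timestamp_seconds{{channel="{ch}"}} {self._last_success[ch]}'
            )
            lines.append(f'channel_failures_total{{channel="{ch}"}} {self._failures[ch]}')
        return "\n".join(lines) + "\n"
```

Because every sample carries a `channel` label, a single alert rule on the age of the last success (for example `time() - channel_last_success_timestamp_seconds > 300`, with the threshold as a placeholder) would fire separately for each stalled channel.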