High Level Changes:
Allows kathy, relayer, and processor to isolate failures at the channel level and retry the failed channel's task, instead of crashing the whole agent when one channel fails
Isolating channel failures is not relevant to the updater, since the updater only touches the home
This behavior did not seem desirable for the watcher, so the watcher does not adopt it
Code Changes
Agent::run no longer borrows &self; it instead takes an agent-specific <Agent>Channel struct that holds all the data needed to run one home <> replica channel
Agent::run_many builds an <Agent>Channel struct and hands it off to an Agent::run task; if the run task errors out, run_many logs the error and starts the task again instead of returning the error to the top level
Watcher and updater do not follow this pattern, as they must override Agent::run_all
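
For reviewers, here is a rough sketch of the supervision pattern described above. It is not the code in this PR: `RelayerChannel`, its fields, and the `failures_left` counter are hypothetical stand-ins, and it uses plain OS threads from the standard library rather than async tasks so that it stays self-contained. Unlike a real channel task, `run` here returns `Ok` once the simulated faults are exhausted, purely so the example terminates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an `<Agent>Channel` struct: everything one
/// home <> replica channel needs, owned instead of borrowed from the agent.
#[derive(Clone)]
struct RelayerChannel {
    replica: String,
    /// Illustrative only: how many more attempts should fail.
    failures_left: Arc<AtomicUsize>,
}

struct Relayer;

impl Relayer {
    /// Runs a single channel. Takes the channel by value, not `&self`.
    fn run(channel: RelayerChannel) -> Result<(), String> {
        // Simulate a flaky RPC: fail while the failure budget is non-zero.
        let budget = channel
            .failures_left
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1));
        match budget {
            Ok(_) => Err(format!("rpc error on {}", channel.replica)),
            Err(_) => Ok(()),
        }
    }

    /// Spawns one supervised thread per channel. A failing channel is logged
    /// and restarted in place; the error never reaches the caller.
    /// Returns the number of restarts per replica, in input order.
    fn run_many(channels: Vec<RelayerChannel>) -> Vec<(String, usize)> {
        let handles: Vec<_> = channels
            .into_iter()
            .map(|channel| {
                thread::spawn(move || {
                    let mut restarts = 0usize;
                    while let Err(e) = Self::run(channel.clone()) {
                        eprintln!("channel {} faulted: {e}; restarting", channel.replica);
                        restarts += 1;
                    }
                    (channel.replica, restarts)
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("channel thread panicked"))
            .collect()
    }
}
```

Because `run` owns its channel struct, the supervisor can clone it and restart one channel without touching the others, which is the point of dropping the `&self` borrow.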
TODO:
[ ] add unit tests to mock faulty RPC
[x] add exponential backoff for retries
[x] metric to track the number of channel faults
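
The two checked items could look roughly like the sketch below. This description does not state the actual backoff parameters or the metrics backend, so `backoff_delay`, `ChannelFaults`, and the base/cap values used with them are assumptions for illustration only; a real agent would more likely export the counter through its existing metrics setup than keep it in a `HashMap`.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Delay before the `attempt`-th retry (0-based): `base * 2^attempt`,
/// capped at `max`. Overflow in either step saturates to the cap.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Hypothetical in-memory fault counter, keyed by replica name.
#[derive(Default)]
struct ChannelFaults {
    counts: HashMap<String, u64>,
}

impl ChannelFaults {
    /// Records one fault for `replica` and returns its running total.
    fn record(&mut self, replica: &str) -> u64 {
        let count = self.counts.entry(replica.to_string()).or_insert(0);
        *count += 1;
        *count
    }
}
```

A supervisor loop would call `record` each time a channel task errors, then sleep for `backoff_delay(attempt, base, max)` before restarting it.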
Closes #161