In certain edge cases, especially around sudden bursts of activity, the relayer might get stuck due to bugs in its logic. In these situations, all XMsgs for a particular (Sending Chain, Destination Chain) pair will freeze.
It will be useful to be able to intervene manually in such cases, taking human action to get past whatever the sticking point is. This will require tooling that (for any stream) can read the latest delivered offset, fetch XMsgs as needed, and send them. This tooling should be:
- as independent of the relayer as possible (so as to not get blocked by the same bug)
  - no code reused
  - different sender address
- as configurable as possible (so as to be able to handle a variety of situations)
  - gas params should be configurable
  - XMsg batch size should be configurable
(Regardless of this stopgap tooling, we should still fix whatever relayer bug caused the issue in the first place. The tooling is meant to minimize the operational impact, not to replace having a correct relayer.)
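The read / fetch / send loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes XMsg offsets on a stream are consecutive integers, and every name here (`RelayConfig`, `plan_batches`, `manual_relay`, the injected callables, and the default values) is hypothetical rather than taken from the relayer or any existing tool. Chain access is passed in as plain callables so the sketch shares no code with the relayer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class RelayConfig:
    """Operator-tunable knobs (names and defaults are hypothetical)."""
    batch_size: int = 10        # XMsgs per submission
    gas_limit: int = 1_000_000  # gas limit per submission
    max_fee_gwei: int = 50      # fee cap per submission


def plan_batches(delivered: int, emitted: int, batch_size: int) -> List[range]:
    """Split the undelivered offsets (delivered, emitted] into batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        range(start, min(start + batch_size, emitted + 1))
        for start in range(delivered + 1, emitted + 1, batch_size)
    ]


def manual_relay(
    read_delivered: Callable[[], int],
    read_emitted: Callable[[], int],
    fetch_xmsgs: Callable[[range], Sequence[object]],
    submit: Callable[[Sequence[object], RelayConfig], None],
    cfg: RelayConfig,
) -> int:
    """Relay everything emitted but not yet delivered; return the batch count."""
    batches = plan_batches(read_delivered(), read_emitted(), cfg.batch_size)
    for offsets in batches:
        # Gas settings travel with the config so the operator can tune them per run.
        submit(fetch_xmsgs(offsets), cfg)
    return len(batches)
```

Keeping `plan_batches` free of any chain access means an operator could inspect the planned batches as a dry run before anything is submitted from the separate sender address.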
The fuzzy stream is stuck due to an xmsg offset gap while streaming attestations via cprovider.
This could be due to a gap in xmsg offsets resulting from finalized overrides, or simply to two consecutive fuzzy attestations not pointing to consecutive xmsgs.
This gap needs to be filled from the finalized attestations.
Input to the command should be the target stream and xmsg offset.
The tool should then search for that xmsg in the finalized attestations,
and create and submit the correct submission to unblock the relayer.
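The search step could look something like the sketch below. It is illustrative only: the `Attestation` fields are a simplified, hypothetical stand-in (the real attestation layout is not described here), and it assumes each attestation covers a contiguous range of xmsg offsets for the target stream.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Attestation:
    """Hypothetical, simplified view of one attestation for a single stream."""
    attest_offset: int     # position of the attestation itself
    finalized: bool        # True for finalized, False for fuzzy
    first_msg_offset: int  # first xmsg offset covered for the target stream
    msg_count: int         # number of consecutive xmsgs covered

    def covers(self, msg_offset: int) -> bool:
        """Report whether msg_offset falls inside this attestation's range."""
        end = self.first_msg_offset + self.msg_count
        return self.first_msg_offset <= msg_offset < end


def find_finalized(
    attestations: Iterable[Attestation], target_offset: int
) -> Optional[Attestation]:
    """Return the first finalized attestation covering target_offset, if any."""
    for att in attestations:
        if att.finalized and att.covers(target_offset):
            return att
    return None
```

The attestation returned here would then be the basis for building the submission. A `None` result means the xmsg is not yet covered by any finalized attestation, so the gap cannot be filled yet.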