informalsystems / hermes

IBC Relayer in Rust
https://hermes.informal.systems
Apache License 2.0
442 stars 326 forks source link

Delay when submitting RPC events #1465

Open ancazamfir opened 3 years ago

ancazamfir commented 3 years ago

Crate

relayer

Summary of Bug

I was running hermes listen command and noticed that a NewBlock event extracted by the monitor (tid 18 in my run) is not processed immediately but rather on the next event, for example:

Oct 18 16:07:49.986 TRACE ThreadId(18) Extracted NewBlock Height { revision: 0, height: 546 }, Time(2021-10-18T14:07:39.754107Z)
Oct 18 16:08:00.006 TRACE ThreadId(18) Extracted NewBlock Height { revision: 0, height: 547 }, Time(2021-10-18T14:07:49.882555Z)
Oct 18 16:08:00.006 TRACE ThreadId(18) process_batch EventBatch { chain_id: ChainId { id: "ibc-0", version: 0 }, height: Height { revision: 0, height: 546 }, events: [NewBlock(NewBlock { height: Height { revision: 0, height: 546 } })] }

Notice the timestamps, and the 10sec gaps (ibc-0 is configured with 10 sec block time). The sequence seems to be:

Here are full logs for a client update:

...
     Running `target/debug/hermes listen ibc-0`
Oct 18 16:07:39.838  INFO ThreadId(01) using default configuration from '/Users/ancaz/.hermes/config.toml'
Oct 18 16:07:39.842  INFO ThreadId(01) listening for events `Tx, NewBlock` on 'ibc-0'...
Oct 18 16:07:39.845 TRACE ThreadId(01) [ibc-0] subscribing to query: tm.event = 'Tx'
Oct 18 16:07:39.847 TRACE ThreadId(01) [ibc-0] subscribing to query: tm.event = 'NewBlock'
Oct 18 16:07:39.848 TRACE ThreadId(01) [ibc-0] subscribed to all queries
Oct 18 16:07:39.849 DEBUG ThreadId(18) [ibc-0] starting event monitor
Oct 18 16:07:49.986 TRACE ThreadId(18) Extracted NewBlock Height { revision: 0, height: 546 }, Time(2021-10-18T14:07:39.754107Z)
Oct 18 16:08:00.006 TRACE ThreadId(18) Extracted NewBlock Height { revision: 0, height: 547 }, Time(2021-10-18T14:07:49.882555Z)
Oct 18 16:08:00.006 TRACE ThreadId(18) process_batch EventBatch { chain_id: ChainId { id: "ibc-0", version: 0 }, height: Height { revision: 0, height: 546 }, events: [NewBlock(NewBlock { height: Height { revision: 0, height: 546 } })] }
Oct 18 16:08:00.007  INFO ThreadId(01) - event batch at height 0-546
Oct 18 16:08:00.007  INFO ThreadId(01) + NewBlock(
    NewBlock {
        height: Height {
            revision: 0,
            height: 546,
        },
    },
)
Oct 18 16:08:00.007  INFO ThreadId(01)
Oct 18 16:08:10.085 TRACE ThreadId(18) Extracted NewBlock Height { revision: 0, height: 548 }, Time(2021-10-18T14:07:59.967221Z)
Oct 18 16:08:10.085 TRACE ThreadId(18) process_batch EventBatch { chain_id: ChainId { id: "ibc-0", version: 0 }, height: Height { revision: 0, height: 547 }, events: [NewBlock(NewBlock { height: Height { revision: 0, height: 547 } })] }
Oct 18 16:08:10.085  INFO ThreadId(01) - event batch at height 0-547
Oct 18 16:08:10.085  INFO ThreadId(01) + NewBlock(
    NewBlock {
        height: Height {
            revision: 0,
            height: 547,
        },
    },
)
Oct 18 16:08:10.085  INFO ThreadId(01)
Oct 18 16:08:20.148 TRACE ThreadId(18) Extracted NewBlock Height { revision: 0, height: 549 }, Time(2021-10-18T14:08:10.026964Z)
Oct 18 16:08:20.149 TRACE ThreadId(18) process_batch EventBatch { chain_id: ChainId { id: "ibc-0", version: 0 }, height: Height { revision: 0, height: 548 }, events: [NewBlock(NewBlock { height: Height { revision: 0, height: 548 } })] }
Oct 18 16:08:20.149  INFO ThreadId(01) - event batch at height 0-548
Oct 18 16:08:20.149  INFO ThreadId(01) + NewBlock(
    NewBlock {
        height: Height {
            revision: 0,
            height: 548,
        },
    },
)
Oct 18 16:08:20.149  INFO ThreadId(01)
Oct 18 16:08:30.206 TRACE ThreadId(18) Extracted ibc_event UpdateClient(UpdateClient { common: Attributes { height: Height { revision: 0, height: 550 }, client_id: ClientId("07-tendermint-2"), client_type: Tendermint, consensus_height: Height { revision: 1, height: 2663 } }, header: Some(Tendermint( Header {...})) })
Oct 18 16:08:30.206 TRACE ThreadId(18) process_batch EventBatch { chain_id: ChainId { id: "ibc-0", version: 0 }, height: Height { revision: 0, height: 549 }, events: [NewBlock(NewBlock { height: Height { revision: 0, height: 549 } })] }
Oct 18 16:08:30.206  INFO ThreadId(01) - event batch at height 0-549
Oct 18 16:08:30.207  INFO ThreadId(01) + NewBlock(
    NewBlock {
        height: Height {
            revision: 0,
            height: 549,
        },
    },
)
Oct 18 16:08:30.207  INFO ThreadId(01)
Oct 18 16:08:30.208 TRACE ThreadId(18) Extracted NewBlock Height { revision: 0, height: 550 }, Time(2021-10-18T14:08:20.091530Z)
Oct 18 16:08:40.267 TRACE ThreadId(18) Extracted NewBlock Height { revision: 0, height: 551 }, Time(2021-10-18T14:08:30.149446Z)
Oct 18 16:08:40.268 TRACE ThreadId(18) process_batch EventBatch { chain_id: ChainId { id: "ibc-0", version: 0 }, height: Height { revision: 0, height: 550 }, events: [NewBlock(NewBlock { height: Height { revision: 0, height: 550 } }), UpdateClient(UpdateClient { common: Attributes { height: Height { revision: 0, height: 550 }, client_id: ClientId("07-tendermint-2"), client_type: Tendermint, consensus_height: Height { revision: 1, height: 2663 } }, header: Some(Tendermint( Header {...})) })] }
Oct 18 16:08:40.268  INFO ThreadId(01) - event batch at height 0-550
Oct 18 16:08:40.268  INFO ThreadId(01) + NewBlock(
    NewBlock {
        height: Height {
            revision: 0,
            height: 550,
        },
    },
)
Oct 18 16:08:40.268  INFO ThreadId(01) + UpdateClient(
    UpdateClient {
        common: Attributes {
            height: Height {
                revision: 0,
                height: 550,
            },
            client_id: ClientId(
                "07-tendermint-2",
            ),
            client_type: Tendermint,
            consensus_height: Height {
                revision: 1,
                height: 2663,
            },
        },
        header: Some(
            Tendermint(
                 Header {...},
            ),
        ),
    },
)
Oct 18 16:08:40.268  INFO ThreadId(01)
^C
ancas-macbook-pro:ibc-rs ancaz$

Version

master

Steps to Reproduce

Start a chain and run hermes listen command. I also added these traces:

diff --git a/relayer/src/event/monitor.rs b/relayer/src/event/monitor.rs
index a19f9cb6..e9add03b 100644
--- a/relayer/src/event/monitor.rs
+++ b/relayer/src/event/monitor.rs
@@ -431,6 +431,7 @@ impl EventMonitor {

     /// Collect the IBC events from the subscriptions
     fn process_batch(&self, batch: EventBatch) -> Result<()> {
+        trace!("process_batch {:?}", batch);
         self.tx_batch
             .send(Ok(batch))
             .map_err(|_| Error::channel_send_failed())?;
diff --git a/relayer/src/event/rpc.rs b/relayer/src/event/rpc.rs
index 13c05752..f20355b1 100644
--- a/relayer/src/event/rpc.rs
+++ b/relayer/src/event/rpc.rs
@@ -19,6 +19,7 @@ pub fn get_all_events(
             );

             vals.push((height, NewBlock::new(height).into()));
+            tracing::trace!("Extracted NewBlock {:?}, {:?}", height, block.clone().unwrap().header.time);
         }
         RpcEventData::Tx { tx_result } => {
             let height = Height::new(

Acceptance Criteria

Process event batches immediately.


For Admin Use

adizere commented 2 years ago

But we should process this right away and not wait for the next event. Possibly something wrong with stream_batches()?

Isn't it the case that we need to wait for the source chain to advance to the next block height anyway, to build proofs for submitting the tx to the destination chain? So delaying the processing of these events should not slow down the packet worker in the common case.

Note: After Romain and I discussed this, we realized that there may be a tradeoff between latency versus efficiency (batching) at play here: we currently favor bigger batches by processing everything at the end. By moving to more aggressive event propagation, we will favor latency, at the expense of potentially submitting a bigger number (of smaller) transactions. Hope this makes sense! In any case, we also decided to deprioritize the work here because of (1) several unknowns, (2) unclear if the packet worker is affected, (3) it's not straightforward to reproduce.

romac commented 2 years ago

Thanks @adizere for the summary! One more thing: to do this properly, we need better metrics to track what effect this change will have, ie. number of client updates submitted, number of queries, and time to relay at the very least.

adizere commented 2 years ago

Moved to low priority & backlog until new data comes in.