hyperlane-xyz / hyperlane-monorepo

The home for Hyperlane core contracts, sdk packages, and other infrastructure
https://hyperlane.xyz

Relayer startup is slow #3454

Open tkporter opened 5 months ago

tkporter commented 5 months ago

Problem

Solution

tkporter commented 5 months ago

I think part of the problem is also that the interface to rocksdb isn't async. So we block when performing rocksdb IO, and we sometimes do this in loops.

From https://ryhl.io/blog/async-what-is-blocking/:

To give a sense of scale of how much time is too much, a good rule of thumb is no more than 10 to 100 microseconds between each .await. That said, this depends on the kind of application you are writing.

I wonder if it makes sense to move our DB operations to a spawn_blocking closure or something?

There seem to be places where we probably block for wayyy longer than 100 microseconds, like when we call this for the first time upon startup, and it'll loop through tens of thousands of message nonces without ever hitting an .await: https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/dcb67e97da6e0e3d9abd1533dc2a5ca2ff9b4617/rust/agents/relayer/src/msg/processor.rs#L119-L148
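
For loops like that, one low-effort mitigation is to yield back to the scheduler periodically. A minimal sketch of the idea (the function name, the nonce-scan shape, and the 1,000-iteration interval are all assumptions for illustration, not the actual processor code):

```rust
// Hypothetical shape of the startup scan described above: tens of thousands
// of nonces, each iteration doing only blocking DB reads, with no .await in
// the body. Yielding occasionally lets tokio run other tasks on this worker.
async fn scan_unprocessed_nonces(max_nonce: u32) {
    for nonce in 0..max_nonce {
        // ... blocking rocksdb lookups for `nonce` would happen here ...

        // Arbitrary interval, chosen only for illustration.
        if nonce % 1_000 == 0 {
            tokio::task::yield_now().await;
        }
    }
}
```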

nambrot commented 4 months ago

as part of this, it might be nice to allow relayer operators to opt out of merkle tree processing

tkporter commented 4 months ago

ah that's a good idea. https://github.com/hyperlane-xyz/hyperlane-monorepo/issues/3414 is similar - we will no longer block on it, but will still do the work to eventually build the merkle tree. When we get closer to doing this we can consider the stakeholders & whether that's attractive

yorhodes commented 4 months ago

as part of this, it might be nice to allow relayer operators to opt out of merkle tree processing

Assume this means backfill processing? We still need forward-fill merkle tree processing for the multisig ISMs

tkporter commented 3 months ago

chatted w/ @daniel-savu - we'll likely do this after the throughput work. The plan is to:

  1. just slap the multithreaded runtime on and see what gains (if any) we get over single thread, and if we run into any weird concurrency issues like deadlocks
  2. consider what options we have when it comes to the blocking rocksdb interactions - it'd be nice if we could make db interactions async in some way (see the sketch after this list). Seems like the most common path is to just wrap db interactions with block_in_place? Some maybe-useful resources:
     a. https://github.com/rust-rocksdb/rust-rocksdb/issues/687
     b. https://github.com/rust-rocksdb/rust-rocksdb/issues/822
     c. https://github.com/fedimint/fedimint/issues/1528
     d. https://rocksdb.org/blog/2022/10/07/asynchronous-io-in-rocksdb.html
     e. https://github.com/fedimint/fedimint/pull/1568
     f. A bonus, which I thought was interesting: https://www.reddit.com/r/rust/comments/10pf7m8/fish_shell_porting_to_rust_from_c/j6kxeui/?context=3
  3. if making the blocking rocksdb interactions async seems futile, it'd still be nice to make sure we yield frequently in places like the one I describe in https://github.com/hyperlane-xyz/hyperlane-monorepo/issues/3454#issuecomment-2010059403
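
A minimal sketch of the two wrapping options from step 2, assuming a recent rust-rocksdb where `DB::get` returns `Result<Option<Vec<u8>>>`; neither function is the relayer's actual DB layer, they just show the shape of each approach:

```rust
use std::sync::Arc;

use rocksdb::DB;

// Option A: mark this section as blocking so tokio can move other tasks off
// the current worker thread. Only valid on the multi-threaded runtime (step 1),
// and the calling thread is still occupied for the duration of the read.
fn get_block_in_place(db: &DB, key: &[u8]) -> Option<Vec<u8>> {
    tokio::task::block_in_place(|| db.get(key).ok().flatten())
}

// Option B: run the blocking read on tokio's dedicated blocking thread pool
// and await the result, keeping the async worker threads free.
async fn get_spawn_blocking(db: Arc<DB>, key: Vec<u8>) -> Option<Vec<u8>> {
    tokio::task::spawn_blocking(move || db.get(&key).ok().flatten())
        .await
        .expect("rocksdb read task panicked")
}
```

block_in_place is cheaper per call but still ties up a runtime worker for the duration; spawn_blocking adds a thread handoff but keeps the async workers available, which matters most in the startup loops described above.
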
daniel-savu commented 3 months ago

Instrumented tokio and was able to confirm that rocksdb IO is blocking, and there isn't really anything we can do to avoid that. The message processor tasks have almost zero idle time even after 5 mins, and the merkle tree processors aren't doing great either:

[Screenshots: tokio-console views of the message and merkle tree processor tasks, 2024-05-13]
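
For reference, a minimal sketch of the kind of instrumentation used here, via the console-subscriber crate that tokio-console connects to (an assumption about the setup, not the relayer's actual main; the binary also has to be built with RUSTFLAGS="--cfg tokio_unstable"):

```rust
// Sketch only: enable tokio-console instrumentation so per-task busy/idle
// time can be inspected while the process runs.
#[tokio::main(flavor = "multi_thread")]
async fn main() {
    // Starts the console-subscriber layer; tokio-console attaches to its
    // default gRPC endpoint (127.0.0.1:6669).
    console_subscriber::init();

    // ... spawn the relayer tasks here ...
}
```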

RocksDB is write-optimized and synchronous, which is essentially the opposite of what we need. Our writes happen after indexing and after confirming a submission, which are network-bound tasks themselves - the gain from having fast writes is almost zero.

On the other hand, we currently do one read for every message ever sent that passes the relayer whitelist (millions at this point). Even after parallelizing the relayer runtime, it takes 8.5 mins to start submitting to high-volume chains like Optimism.

We have two DB-IO-bound processors per chain (message and merkle_tree), and 20 chains in the hyperlane context. This means we'd need 40 cores (and growing) to parallelize every chain, or shard by deploying on different machines. This is more trouble than it's worth for now.

We're opting for a simpler approach now:

avious00 commented 3 months ago

@tkporter @daniel-savu when you merge this, can you ping @ltyu? Syncing on Sepolia was taking a long time for him; I think this addresses that

daniel-savu commented 3 months ago

@ltyu this has mostly been fixed; you can use the latest commit on main (docker image 0cf692e-20240526-164442)

daniel-savu commented 2 months ago

@tkporter reported that startup seems to be slow again. Only running with a subset of chains seems to fix this, so it's probably due to the high number of chains the omniscient relayer is currently operating. tokio-console indicates that the prepare_tasks are the issue, since they take up most of the runtime's busy time, particularly at startup. I wasn't able to narrow this down further, although I suspect we must be doing some CPU-intensive looping in there.

3 mins into a new relayer run, line 132 (the prepare task - here) takes most of the busy time:

[Screenshot: tokio-console view, 2024-06-25 11:55]

A view into one prepare task's lifecycle, showing how it takes up a lot of busy time on startup. With more than 20 prepare tasks already running, it makes sense that some can't be scheduled, because the machine doesn't have that many cores.

[Screenshot: tokio-console view of a prepare task's lifecycle, 2024-06-25 12:08]