hyperlane-xyz / hyperlane-monorepo

The home for Hyperlane core contracts, sdk packages, and other infrastructure
https://hyperlane.xyz

Scraper concurrency leads to duplicated RPC calls #2631

Open mattiekat opened 1 year ago

mattiekat commented 1 year ago

#2629 is a partial fix for hyperlane-xyz/issues#493: it handles the error but not the root cause. This bug started happening because the scraper indexing logic used to be synchronous per domain and concurrent between domains: the different event types were scraped one after the next for the same block range.

#2246 changed this so that the scraper now runs each of the different event types concurrently. As a result, it can, and sometimes does, request the same block or transaction from an RPC provider multiple times, because each task does not know what the others are doing.

The specific issue is a race condition in the ensure_blocks and ensure_txns functions: they query the database for the ids, and if an id is missing, they fetch the data from an RPC provider. When multiple indexers run concurrently for the same domain, both can check the database, see that the record does not exist, go query the provider, and only then find out the record was already inserted. The duplicates are now simply ignored, but we are still making the extra RPC calls.

While it is hard to say how often this duplication happens, my theory is that it is very common, because if you have two indexers, let's say A and B, and A is a little faster, you would think A would pull ahead, but:

if A is ahead:
  A needs to make extra RPC calls AND B is able to use the data A already collected
if B is ahead:
  B needs to make extra RPC calls AND A is able to use the data B already collected
if they are approximately equal:
  A and B will duplicate RPC calls for any common blocks and transactions

Basically, the system favors the cursors being in sync.

To fix this, I think we should either add some synchronization logic in the agent (we could do it in the database as well, but that would probably be more work), such as a list of currently in-progress queries, or merge the tasks so that we only run one task per domain.

nambrot commented 7 months ago

I assume this still applies? @tkporter