[Bug] Graph-node saves incorrect block cache during reorgs

SozinM commented 1 year ago

Bug report

If a reorg happened and graph-node has seen it, the cache table ends up with two entries for the same block number. Example: a reorg on Ethereum mainnet at block 17820205 with depth=1 (https://etherscan.io/block/17820205/f):

graph=# select hash, number, parent_hash from chain1.blocks where number=17820205;

                                hash                                |  number  |                            parent_hash                             
--------------------------------------------------------------------+----------+--------------------------------------------------------------------
 \xdcb02b80a71bb335a8247298cfde2af9f589b491a1eb794a88fa377fb00e6fb8 | 17820205 | \x9432a65af5ecd186859c22bf4d809561aaa4daf3bf8101fccb3f228e34b64756
 \xc0455fae763cd8d53bc294cbd32225f207dee56cb396aee84eaaa5011513731a | 17820205 | \x9432a65af5ecd186859c22bf4d809561aaa4daf3bf8101fccb3f228e34b64756
(2 rows)

graph=# select hash, number, parent_hash from chain1.blocks where number=17820206;
                                hash                                |  number  |                            parent_hash                             
--------------------------------------------------------------------+----------+--------------------------------------------------------------------
 \x409c0bb686a5fac90e706a556d02cf99122b49c683374c82341cccc836807964 | 17820206 | \xc0455fae763cd8d53bc294cbd32225f207dee56cb396aee84eaaa5011513731a
(1 row)

graph=# select hash, number, parent_hash from chain1.blocks where number=17820204;
                                hash                                |  number  |                            parent_hash                             
--------------------------------------------------------------------+----------+--------------------------------------------------------------------
 \x9432a65af5ecd186859c22bf4d809561aaa4daf3bf8101fccb3f228e34b64756 | 17820204 | \x737d8549261f685b18b6c6f2dac460c6dfc4b88bf16b03d2a813ed51f9c08c40
(1 row)
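
The three results above already show which of the two cached blocks at 17820205 the cache currently builds on: the single block at 17820206 names its parent by hash. Below is a minimal Python sketch of that check, using the rows exactly as returned by the queries. The function names `canonical_at` and `orphaned_at` and the in-memory tuple layout are illustrative assumptions, not part of graph-node.

```python
# Rows copied from the chain1.blocks queries above: (hash, number, parent_hash).
ROWS = [
    ("dcb02b80a71bb335a8247298cfde2af9f589b491a1eb794a88fa377fb00e6fb8", 17820205,
     "9432a65af5ecd186859c22bf4d809561aaa4daf3bf8101fccb3f228e34b64756"),
    ("c0455fae763cd8d53bc294cbd32225f207dee56cb396aee84eaaa5011513731a", 17820205,
     "9432a65af5ecd186859c22bf4d809561aaa4daf3bf8101fccb3f228e34b64756"),
    ("409c0bb686a5fac90e706a556d02cf99122b49c683374c82341cccc836807964", 17820206,
     "c0455fae763cd8d53bc294cbd32225f207dee56cb396aee84eaaa5011513731a"),
    ("9432a65af5ecd186859c22bf4d809561aaa4daf3bf8101fccb3f228e34b64756", 17820204,
     "737d8549261f685b18b6c6f2dac460c6dfc4b88bf16b03d2a813ed51f9c08c40"),
]


def canonical_at(rows, number):
    """Return hashes at `number` that a cached block at number + 1 names as parent."""
    child_parents = {parent for (_hash, n, parent) in rows if n == number + 1}
    return [h for (h, n, _parent) in rows if n == number and h in child_parents]


def orphaned_at(rows, number):
    """Return hashes at `number` that no cached block at number + 1 points to."""
    keep = set(canonical_at(rows, number))
    return [h for (h, n, _parent) in rows if n == number and h not in keep]


if __name__ == "__main__":
    print("referenced by child:", canonical_at(ROWS, 17820205))
    print("not referenced:     ", orphaned_at(ROWS, 17820205))
```

This check only works when a child block is cached: at the chain head no block at number + 1 exists yet, so every block at that height would be reported as unreferenced.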

In our experience this sometimes causes errors in subgraphs, because the subgraph takes incorrect data from the cache (this needs confirming). We had a case where a subgraph was stuck until we removed the duplicates from the cache.

Relevant log output

No response

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

[ ] Tick this box if this bug is caused by a regression found in the latest release.
[ ] Tick this box if this bug is specific to the hosted service.
[X] I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

None

azf20 commented 1 year ago

@leoyvens do you know why this block wouldn't have been cleared from the cache when the reorg was handled?

"We had a problem when the subgraph was stuck until we remove duplicates from the cache."

@SozinM can you provide any more details / logs of what you observed here?

What version of Graph Node are you running?

SozinM commented 1 year ago

Graph-node v0.31. It was some time ago, so there are no logs, but the problem occurred because the subgraph logic broke when an incorrect block was processed.

leoyvens commented 1 year ago

Having two blocks for the same number in the block cache is expected. This causing any issues would be unexpected, but then we would need more info to debug.
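
To illustrate why two entries for one number need not be a problem in themselves: if lookups are keyed by block hash, a second block at the same height is just another entry. The sketch below is a toy model in Python, not graph-node's implementation; the class `BlockCache` and its method names are hypothetical.

```python
from collections import defaultdict


class BlockCache:
    """Toy block cache: the hash is the primary key; a number may map to several hashes."""

    def __init__(self):
        self._by_hash = {}                   # hash -> (number, parent_hash)
        self._by_number = defaultdict(list)  # number -> [hash, ...] in insertion order

    def insert(self, block_hash, number, parent_hash):
        # Re-inserting a known hash is a no-op, so the cache stays duplicate-free per hash.
        if block_hash not in self._by_hash:
            self._by_hash[block_hash] = (number, parent_hash)
            self._by_number[number].append(block_hash)

    def by_hash(self, block_hash):
        """Unambiguous lookup: returns (number, parent_hash) or None."""
        return self._by_hash.get(block_hash)

    def by_number(self, number):
        """Ambiguous after a reorg: may return more than one hash."""
        return list(self._by_number.get(number, []))


if __name__ == "__main__":
    cache = BlockCache()
    cache.insert("0xaaa", 5, "0xppp")
    cache.insert("0xbbb", 5, "0xppp")  # competing block at the same height
    print(cache.by_number(5))
    print(cache.by_hash("0xbbb"))
```

In a design like this, the risky path is any caller that looks up by number and assumes a single result. Whether graph-node has such a path is the open question in this thread.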

SozinM commented 1 year ago

Example of the problem: we had a subgraph that failed with error:

Mapping aborted at src/mapping.ts, line 72, column 7, with message: Unexpected null fromEn\twasm backtrace:\t    0: 0x3e5a - <unknown>!src/mapping/handleTransferLINA\t in handler `handleTransferLINA` at block #28842404 (abc680fa9e4df646864070bc179885f57b7b5c7502f30f9c82eb35f481b32b7f)", "block_number": 28842404, "block_hash": "0xabc680fa9e4df646864070bc179885f57b7b5c7502f30f9c82eb35f481b32b7f"

After we rewound by 1000 blocks, it failed again on the same block. After we fixed the cache by removing the duplicates and rewound by 1000 blocks again, it started to work correctly.

SozinM commented 1 year ago

Also, @leoyvens, could you please point me to the place in the code where the block cache is used and duplicates are handled? I don't see any duplicate handling here: https://github.com/graphprotocol/graph-node/blob/27cbcdd0cc21bd9a22e604bf18fd0f6b6c8dc37e/chain/ethereum/src/ethereum_adapter.rs#L1258

joehquak commented 1 year ago

We are also affected by this issue, specifically when using block data from the index-node for a subgraph, for example: https://github.com/Sobal/network-blocks

Reorgs are more frequent on Solana (NeonEVM uses Solana block data), so this can cause frequent service breakdowns.

We see that indexing for the subgraph above stalls on a reorg without entering a failed state; the only resolution is to delete the duplicate blocks and restart the index node.

azf20 commented 11 months ago

hey @joehquak how deep can the reorgs be?

0xlucian commented 11 months ago

@azf20 32 blocks is the finalisation time for Solana
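
If the 32-block figure is taken as the maximum reorg depth, the cached heights that can still change form a sliding window behind the chain head. Here is a small Python sketch of that arithmetic. The function name `reorg_window` is hypothetical, and the reading that a block is final once 32 blocks sit on top of it is an assumption about what "finalisation time" means here.

```python
def reorg_window(head_number, finality_depth=32):
    """Return (first, last) block numbers that may still be reorged, inclusive.

    Assumption: only the most recent `finality_depth` heights can change;
    anything older is treated as final.
    """
    first = max(0, head_number - finality_depth + 1)
    return first, head_number


if __name__ == "__main__":
    first, last = reorg_window(28842404)
    print(f"heights {first}..{last} may still hold competing blocks")
```

Under this assumption, duplicate heights older than the window are leftovers from past reorgs rather than forks that are still being resolved.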
