graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL
https://thegraph.com
Apache License 2.0
2.9k stars 967 forks source link

[Feature] Configuration flag to allow possible 'null' blocks on a network #4729

Open emmanuelm41 opened 1 year ago

emmanuelm41 commented 1 year ago

Description

Filecoin blockchain has added an EVM runtime on top of its blockchain some months ago. From Filecoin docs web:

"The FVM contains an Ethereum Virtual Machine (EVM) runtime, allowing Ethereum and Solidity developers to run their contracts on the FVM with little to no modifications. This page details what exactly this EVM compatibility means, and any other information that Ethereum developers may need to build applications on the FVM."

Because of the Filecoin's miner election process nature, "null" blocks can happen. When the random miner election process occurs, sometimes no winer is chosen. In that scenario, no tipset (group of blocks) is generated. On the ethereum rpc side, this is reflected with an error message indicating the block is null. Please, refer to Ethereum RPC doc.

We are currently trying to use the ethereum rpc on the Filecoin network to connect a graph node so that subgraphs can be deployed. However, the moment a null block is found, the process gets stuck there. Some possible fixes has been explored, but none of them have been effective. In particular, a proxy has been established between the graph node and the filecoin node to produce fake blocks when a null one is found. This an entire valid block with no txs on it. However, the chain is actually broken, as block hashes are not correctly chained (it is impossible to do it in this proxy). For this reason, we see constantly reverted blocks while new blocks are ingested.

One possible workaround to this could be a new flag for specific chains which can allow to "skip" blocks when they are "null".

Some screenshots from our grafana dashboard image

These are some of the logs we see on the node

Jun 29 13:56:14.343 INFO Scanning blocks [2991101, 2991101], range_size: 2000, sgd: 2, subgraph_id: QmebGEv1gdcQfGM5RkiDbx421Kw4JEPTMmUvN8wc6NU3XL, component: BlockStream
Jun 29 13:56:14.343 DEBG Requesting hashes for blocks [2991101, 2991101], sgd: 2, subgraph_id: QmebGEv1gdcQfGM5RkiDbx421Kw4JEPTMmUvN8wc6NU3XL, component: BlockStream
Jun 29 13:56:14.356 DEBG Requesting 0 block(s), sgd: 5, subgraph_id: QmQzU2mwuhMZvSXam1jPcWhBMpnyFBL7NudkmJkyduQ3F5, component: BlockStream
Jun 29 13:56:14.405 INFO Reverting block to get back to main chain, revert_to_ptr: #2991100 (6718cfaf170b5e8e1e67bec45f07d08768f36d2ee420a4e2d7d6acdcf615f8d0), subgraph_ptr: #2991101 (fcf0abfb1c450019f8305472c962a8ed445d4804d727ca5f5e4adfa6c4f52979), sgd: 5, subgraph_id: QmQzU2mwuhMZvSXam1jPcWhBMpnyFBL7NudkmJkyduQ3F5, component: SubgraphInstanceManager
Jun 29 13:56:14.746 DEBG Found 1 relevant block(s), sgd: 2, subgraph_id: QmebGEv1gdcQfGM5RkiDbx421Kw4JEPTMmUvN8wc6NU3XL, component: BlockStream
Jun 29 13:56:14.768 INFO Scanning blocks [2991101, 2991101], range_size: 2000, sgd: 5, subgraph_id: QmQzU2mwuhMZvSXam1jPcWhBMpnyFBL7NudkmJkyduQ3F5, component: BlockStream
Jun 29 13:56:14.770 DEBG Requesting logs for blocks [2991101, 2991101], contract 0x9be9b0cfa89ea800556c6efba67b455d336db1d0, 5 events, sgd: 5, subgraph_id: QmQzU2mwuhMZvSXam1jPcWhBMpnyFBL7NudkmJkyduQ3F5, component: BlockStream
Jun 29 13:56:14.835 INFO Reverting block to get back to main chain, revert_to_ptr: #2991100 (6718cfaf170b5e8e1e67bec45f07d08768f36d2ee420a4e2d7d6acdcf615f8d0), subgraph_ptr: #2991101 (fcf0abfb1c450019f8305472c962a8ed445d4804d727ca5f5e4adfa6c4f52979), sgd: 2, subgraph_id: QmebGEv1gdcQfGM5RkiDbx421Kw4JEPTMmUvN8wc6NU3XL, component: SubgraphInstanceManagerJu

Are you aware of any blockers that must be resolved before implementing this feature? If so, which? Link to any relevant GitHub issues.

No response

Some information to help us out

fridrik01 commented 1 year ago

I collaborated with @emmanuelm41 on this issue and want to share that in addition to the issue of constantly reverted blocks we also notice that the graph node falls behind about 200 blocks from the latest block, so it does not seem to keep up due to always reverting.

Also, when querying the graph node in GraphQL we very often see an error message "the chain was reorganized while executing the query" which further indicates that this is a revert issue:

image
SozinM commented 1 year ago

Also, erigon clients for Polygon and BSC have kinda same problems. Some blocks (state transactions) do not return transactions and it causes the graph to treat it as reorg and wait until it gives up. I think it would be perfect to refactor the graph reorg detection logic to make it more strict so it triggers only on real reorgs.

maoueh commented 1 year ago

@SozinM The graph-node logic for revert is done on two places. One when we are far beyond chain head block, in which case a call to eth_getBlockByNumber is made and we check if the block's hash is the same as the one returned by the RPC call (https://github.com/graphprotocol/graph-node/blob/master/chain/ethereum/src/ethereum_adapter.rs#L648 via https://github.com/graphprotocol/graph-node/blob/master/graph/src/blockchain/polling_block_stream.rs#L298-L312)

When we are live, we check the continuity of the chain by checking if current head block's parent hash is the same as the previous block we have seen: https://github.com/graphprotocol/graph-node/blob/master/graph/src/blockchain/polling_block_stream.rs#L430-L450 (the else part).

I don't think the issue description applies to your use case @SozinM if only transactions are missing, it shouldn't cause weird re-org issue. Of course if the block is fully empty, that's another thing.

leoyvens commented 1 year ago

At high-level, two approaches would be possible here:

  1. Have Graph Node be able to handle gaps in an EVM chain.
  2. Fill in the gaps with a dummy block.

Approach 2 is what you attempted with the proxy, but it runs into the issue which is that if block N is a dummy between blocks, then what should the parent hash of N + 1 be? In some situations we'd want it to be the hash of N - 1 as the chain says, and in others the hash of N to keep a continuous chain. I don't see a good way to solve this problem which leads me to think we should explore approach 1.

There is a precedent in Graph Node for a chain which skips blocks, which is Near. In Near, block headers have both a prev_hash and prev_height. Graph Node uses this in fn parent_ptr, which looks like this for Near: https://github.com/graphprotocol/graph-node/blob/1535e247d79e02d4999e13ab2a65d9eebac0fa71/chain/near/src/codec.rs#L28-L33

Contrast it to Ethereum, which assumes the parent block number is n - 1, as is the case in all other EVM compatible chains we support: https://github.com/graphprotocol/graph-node/blob/1535e247d79e02d4999e13ab2a65d9eebac0fa71/graph/src/components/ethereum/types.rs#L38-L43

We'd have to support a prev_height for EVM chains. This support would be opt-in, enabled in the toml config. Adding that extra field to the block will be a bit annoying, as the Ethereum block type is currently taken straight from the web3 crate, we'd need to change it from a type alias to a struct: https://github.com/graphprotocol/graph-node/blob/1535e247d79e02d4999e13ab2a65d9eebac0fa71/graph/src/components/ethereum/types.rs#L10

A difference between Near and Ethereum is that Near is Firehose-only, so it uses a Firehose block ingestor. This will use an RPC block ingestor, which lives in chain/ethereum/src/ingestor.rs. That will need to be modified to calculate and add the prev_height information to blocks, and to ignore null as a skipped (in chains that opt-in).

This would be my suggestion at this point, hopefully it helps and I'm available to further discuss the approach or answer any questions about the code details.

raulk commented 1 year ago

I'm going to try and stake a stab at this.

eshon commented 1 year ago

In case anyone needs to do testing, there is a null Filecoin Tipset round at height #2991095. Zondax's original logs above seem to describe an error at height #2991101 but that round has blocks/messages. The null round occurred some Tipsets before.

xianglinhe commented 10 months ago

I'm going to try and stake a stab at this.

Hi raulk, is there any update of your test?

dboreham commented 7 months ago

Hello @xianglinhe here is a set of changes that we have tested against a live Lotus node handling the "null blocks present" case: https://github.com/vulcanize/graph-node/pull/1

azf20 commented 7 months ago

Hey @dboreham, it would be great to upstream these changes, could you open a PR?

AFDudley commented 7 months ago

https://github.com/graphprotocol/graph-node/pull/5247