EVM RPC Poller extractor

abourget commented 1 year ago

If adoption is one of our current goals, one way to have broader adoption would be to reel in more Layer1s faster.

One way to do that is to really lower the integration cost for those chains.

One way to do that is by implementing a poller-sucker for EVM, based on the current JSON-RPC interfaces. That's what most of the world has done before Firehose and our rich blocks.

THE MAIN QUESTION IS:

Do we create another firehose-ethereum-light and firehose-ethereum-light-plus with different data models than firehose-ethereum?
Or do we fold all the RPC Poller data inside the same firehose-ethereum?

Astar Network uses Polkadot and the Frontier pallet to provide an EVM-compatible chain on Polkadot. Moonbeam does the same. They tease EVM compatibility.

However, implementing our sf.ethereum.type.v2.Block would require instrumentation on their side that is a bit crazy: they have 90 versions of their nodes, each deployed within a transaction midway through history. So we would need to rebuild those different revisions and backport matching instrumentation to each of them (they have support to side-load an alternate version when a given hash of a revision is seen on chain).

So after reviewing their options, they decided they'd go with polling (aka the poller sucker).

Now this poses the question of backwards compatibility. Here are the three levels of data depth on Ethereum chains:

Level 3: Rich blocks (our full sf.ethereum.type.v2.Block), with the extended tracer or our original instrumentations.
Level 2: Light blocks with traces (standard logs and parity-style traces, available by polling or extracting from nodes that already support it)
Level 1: Light blocks without traces, basically only transactions, block headers, and log events (what you can get by polling the dumbest nodes, and what is common to all EVM chains that can call themselves such)
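
As a rough illustration of what polling for each light level could look like, here is a minimal sketch that only builds the JSON-RPC request bodies and performs no network I/O. `eth_getBlockByNumber` and `eth_getLogs` are standard Ethereum JSON-RPC methods; `trace_block` is the parity-style trace method and is only exposed by some nodes. The helper names (`rpc_request`, `level1_requests`, `level2_requests`) are hypothetical and are not part of any Firehose code.

```python
import json


def rpc_request(method, params, request_id):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def level1_requests(block_number):
    """Level 1: header + transactions, plus the log events of that block."""
    tag = hex(block_number)  # JSON-RPC quantities are 0x-prefixed hex
    return [
        # True asks for full transaction objects instead of only their hashes.
        rpc_request("eth_getBlockByNumber", [tag, True], 1),
        rpc_request("eth_getLogs", [{"fromBlock": tag, "toBlock": tag}], 2),
    ]


def level2_requests(block_number):
    """Level 2: everything in Level 1 plus parity-style traces.

    Assumes the node exposes the parity-style trace module.
    """
    return level1_requests(block_number) + [
        rpc_request("trace_block", [hex(block_number)], 3),
    ]


if __name__ == "__main__":
    for req in level2_requests(16):
        print(json.dumps(req))
```

Whether a given chain can serve Level 2 then comes down to whether its nodes answer the third request.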

We have two options:

Fit the light block data into our richer data model (sf.ethereum.type.v2.Block). In that case, a bunch of fields won't be filled, but will still be present when decoding.

The downside of this is it might create confusion in users when passing from a chain that supports light and rich blocks.
The good side is that we need to write very little code: firehose-ethereum doesn't need to be modified. We only need a small shim that polls and outputs FIRE BLOCK on stdout with the available data (which you'd run instead of running geth underneath firehose). People developing Substreams don't need a completely different substreams.yaml and different code, although, if they want to use the same code on multiple chains easily, they'll need to either detect and/or support light chains within their lib.rs (by inspecting the blocks that are flowing through) or rely only on the data that is common between the three options.

Create yet another data model, that fits LightBlockWithTraces and/or LightBlock, and insert all of the data, so the model is full and complete.

The downside is that it requires two fully new stacks, call 'em firehose-ethereum-light and firehose-ethereum-light-with-traces, as our stack currently doesn't support multiple top-level block types as a source: neither Substreams nor Firehose/merger/relayer does. One could think of having only a firehose-ethereum-light that has traces only sometimes, but that would bring us the worst of all worlds: some complete models, and some not complete.
The good side is that, when developing a Substreams module, the protobuf model properly represents what is there, and you can expect all the data to be present, if the protocol says it supports such and such block model.
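
To make the first option more concrete, here is a minimal sketch of fitting an RPC block into a richer-shaped model and then detecting a light block by inspection. The dataclasses are simplified, hypothetical stand-ins for sf.ethereum.type.v2.Block (they are not the real protobuf definitions), and `block_from_rpc` and `looks_light` are made-up helper names. The input dict uses the field names returned by `eth_getBlockByNumber`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    # Only obtainable from traces or from rich instrumentation.
    caller: str
    address: str


@dataclass
class TransactionTrace:
    hash: str
    calls: List[Call] = field(default_factory=list)  # left empty for light blocks


@dataclass
class Block:
    number: int
    hash: str
    parent_hash: str
    transaction_traces: List[TransactionTrace] = field(default_factory=list)


def block_from_rpc(rpc_block: dict) -> Block:
    """Fill only what a plain RPC node provides; everything else keeps its default."""
    return Block(
        number=int(rpc_block["number"], 16),
        hash=rpc_block["hash"],
        parent_hash=rpc_block["parentHash"],
        transaction_traces=[
            TransactionTrace(hash=tx["hash"]) for tx in rpc_block["transactions"]
        ],
    )


def looks_light(block: Block) -> bool:
    """Heuristic: no transaction carries any call data.

    A block with zero transactions also returns True, so inspection
    alone cannot always tell a light block from an empty rich one.
    """
    return all(not trace.calls for trace in block.transaction_traces)
```

The ambiguity noted in `looks_light` is the kind of confusion the first option's downside refers to: the fields exist when decoding, but nothing says whether they are empty by nature or simply were never filled.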

Adam highlighted another benefit for graph-node: if we have Firehose blocks on a chain, we can also save costs on RPC calls.

The easiest path here is a drop-in replacement for a normal block, so no tweaks are necessary in graph-node; the alternative is to make graph-node aware of the new blocks.

He also thought we could augment the type.v2.Block to support all three flavours of blocks within that namespace (Block, LightBlock and LightBlockWithTraces), and perhaps have some oneof or something, so Substreams modules could handle one, two, or three such things. Unsure if this would be less confusing, or simpler, or if we could have abstraction functions that pluck the data from wherever it is, or do those checks for the module author?!

Another noted advantage is that the work done for EVM could be reused (like reorg patterns) across different protocols, with different RPC methods, but as a more battle-tested way to bring Firehoses to all sorts of chains, with less involvement on their part. And slowly get deeper tracers on all chains, without imposing them at first.
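
One reorg pattern that is independent of the protocol is parent-hash linkage: a poller keeps the recent blocks it has emitted and checks that each new block points at the current head. The following is a minimal sketch under that assumption; `BlockRef` and `apply_block` are hypothetical names, and this is not how any existing Firehose component is implemented.

```python
from typing import List, NamedTuple, Optional, Tuple


class BlockRef(NamedTuple):
    number: int
    hash: str
    parent_hash: str


def apply_block(
    chain: List[BlockRef], block: BlockRef
) -> Tuple[List[BlockRef], Optional[int]]:
    """Try to append `block` to `chain` (oldest first).

    Returns (undone, refetch): the blocks that fell off the canonical
    chain, and a height the caller should poll again when the fork
    point lies deeper than the current head (None otherwise).
    """
    undone: List[BlockRef] = []
    # A block at an already-seen height replaces everything from there up.
    while chain and chain[-1].number >= block.number:
        undone.append(chain.pop())
    # The new block must link to the current head by parent hash.
    if chain and chain[-1].hash != block.parent_hash:
        undone.append(chain.pop())
        return undone, block.number - 1
    chain.append(block)
    return undone, None
```

A caller would emit an undo signal for each entry in `undone` and, when `refetch` is set, poll that height before moving forward again.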

Another thing that would push in the direction of backfilling the type.v2.Block is Arbitrum, which has information PRIOR to the introduction of Nitro that could be extracted using RPC Poller methods up until the point where the extended tracer starts to exist. So in a single chain, one would pass from an earlier segment using lighter blocks to a later segment with more data.

azf20 commented 1 year ago

Nice @abourget - I think of the two options I prefer option 1: Fit the light block data into our richer data model. This would seem to reduce duplication, simplify things for end users and create the most compatibility.

I think there is probably some worthwhile discussion on the best pattern for identifying how much data is available, but that seems like a cleaner path than creating two stacks.

abourget commented 1 year ago

We could add an enum, like

enum DataLevel {
  RICH_BLOCKS = 0;
  LOGS_WITH_TRACES = 1;
  LOGS = 2;
}

as an added field on the sf.ethereum.type.v2.Block, top-level. This way, all the current blocks would be marked as RICH_BLOCKS, and future blocks could be marked with the other two.

This way, a single Substreams module's code could adapt, even mid-chain, like it could be for Arbitrum (with Classic and Nitro eras).
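
A minimal sketch of how a module could branch on such a field, written in Python for brevity rather than as real Substreams code. The enum values mirror the proposal above; `available_sections` and the section names are hypothetical, and the levels assigned to the two blocks in the usage example are assumptions made for illustration.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    RICH_BLOCKS = 0
    LOGS_WITH_TRACES = 1
    LOGS = 2


def available_sections(level: DataLevel) -> list:
    """Return which (illustrative) parts of a block a module may rely on."""
    sections = ["header", "transactions", "logs"]
    if level in (DataLevel.RICH_BLOCKS, DataLevel.LOGS_WITH_TRACES):
        sections.append("traces")
    if level == DataLevel.RICH_BLOCKS:
        sections.append("state_changes")
    return sections


if __name__ == "__main__":
    # Blocks written before the field existed decode with the default
    # value 0, so they read as RICH_BLOCKS without being rewritten.
    old_block_level = DataLevel(0)
    # Hypothetical mid-chain switch: an earlier polled segment, then a richer one.
    for name, level in [("early", DataLevel.LOGS), ("late", old_block_level)]:
        print(name, level.name, available_sections(level))
```

Putting RICH_BLOCKS at 0 is what lets existing data keep its meaning, since an unset enum field decodes to its zero value.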

matthewdarwin commented 1 year ago

LGTM (light block into existing data model + extra header to say what kind of block it is)

fubhy commented 1 year ago

I /heavily/ lean towards option #1.

The downside of this is it might create confusion in users when passing from a chain that supports light and rich blocks.

Adding the enum flag as you suggested would imho solve this.

abourget commented 1 year ago

As more work is being poured into things like SQL aggregations and more power is being pushed at that layer, I was thinking we should put a bit more pressure on this feature. We saw that multiple providers did create lots of value and a good community around just the data that is available through the RPC node (Dune, DefiLlama). In the spirit of being equal before being better, we should implement the RPC Poller extractor and allow our stack to roll out with equal data, but more powerful computation with parallelism through Substreams.

sduchesneau commented 1 year ago

Two more options:

We could simply get rid of the type.v2.Block and introduce a new type.v3.Block which only has the information that can be extracted from RPC.

Because of the operational challenges of producing type.v2.Block and of scaling to "all EVM chains", I could see the "better and richer" type.v2.Block being slowly discarded in favor of the "easier to produce" type.v3.Block. While I do prefer richer v2 blocks with everything in there, the imperatives for massive adoption may push us in the other direction.

If the idea is to keep the benefits of the first blocks, we could also introduce a block structure that starts with a top-level "OneOf". Ex:
```
// sf.ethereum.type.v2.Block
message Block {
  oneof block {
    RichBlock rich_block = 2;
    BlockWithTraces block_with_traces = 3;
    LightBlock light_block = 4;
  }
}
```
The block hash, number and timestamp could be at the top level.

abourget commented 1 year ago

But both options here are a massive breaking change, for all libraries and all, aren't they?

jubeless commented 1 year ago

We are moving forward with option 1 (fit the light block data into our richer data model). First draft: https://github.com/streamingfast/firehose-ethereum/pull/80

streamingfast / substreams

EVM RPC Poller extractor #278