streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0

EVM RPC Poller extractor #278

Closed abourget closed 10 months ago

abourget commented 1 year ago

If adoption is one of our current goals, one way to have broader adoption would be to reel in more Layer1s faster.

One way to do that is to really lower the integration cost for those chains.

One way to do that is by implementing a poller-sucker for EVM, based on the current JSON-RPC interfaces. That's what most of the world has done before Firehose and our rich blocks.
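To make the idea concrete, here is a minimal sketch of what one poll of such an extractor might request over JSON-RPC. It is not the actual implementation: `fetch_light_block`, `rpc_request` and the `call` transport are hypothetical names, and the shape of the returned dict is an assumption. Only the standard `eth_getBlockByNumber` and `eth_getLogs` methods are used.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: takes a JSON-RPC request dict and returns its "result".
RpcCall = Callable[[Dict[str, Any]], Any]


def rpc_request(method: str, params: List[Any], req_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}


def fetch_light_block(call: RpcCall, number: int) -> Dict[str, Any]:
    """Assemble a 'light' block from what a plain RPC node exposes."""
    tag = hex(number)  # block numbers are hex-encoded quantities, e.g. 16 -> "0x10"
    # True asks for full transaction objects instead of only their hashes.
    block = call(rpc_request("eth_getBlockByNumber", [tag, True]))
    # Restrict the log filter to this single block.
    logs = call(rpc_request("eth_getLogs", [{"fromBlock": tag, "toBlock": tag}]))
    return {
        "number": number,
        "hash": block["hash"],
        "parent_hash": block["parentHash"],
        "transactions": block["transactions"],
        "logs": logs,
    }
```

Passing the transport in as a function keeps the sketch independent of any HTTP client, and lets it be exercised against canned responses.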

THE MAIN QUESTION IS:


Astar Network uses Polkadot and the Frontier pallet to provide an EVM-compatible chain on Polkadot. Moonbeam does the same. They tease EVM compatibility.

However, implementing our sf.ethereum.type.v2.Block would require a rather crazy amount of instrumentation on their side: they have 90 versions of their nodes, each deployed within a transaction partway through history. So we would need to rebuild those different revisions and backport matching instrumentation to each of them (they have support for side-loading an alternate version when a given revision hash is seen on chain).

So after reviewing their options, they decided they'd go with polling (aka the poller sucker).


Now this poses the question of backwards compatibility. Here are the three levels of data depth on Ethereum chains:

We have two options:

  1. Fit the light block data into our richer data model (sf.ethereum.type.v2.Block). In that case, a number of fields won't be filled, but they will still be present when decoding.
  2. Create yet another data model that fits LightBlockWithTraces and/or LightBlock, and insert all of the data, so the model is full and complete.

Adam highlighted another benefit for graph-node: if we have Firehose blocks on a chain, we can also save costs on RPC calls.

The easiest path here is a drop-in replacement for a normal block, so that no tweaks are necessary in graph-node; the alternative is to make graph-node aware of the new blocks.

He also thought we could augment type.v2.Block to support all three flavours of blocks within that namespace (Block, LightBlock and LightBlockWithTraces), perhaps with a oneof or something similar, so that Substreams modules could handle one, two, or all three of them. I am unsure whether this would be less confusing or simpler, or whether we could have abstraction functions that pluck the data from wherever it is, or do those checks for the module author.


Another noted advantage is that the work done for EVM (like reorg patterns) could be reused across different protocols, with different RPC methods, as a more battle-tested way to bring Firehose to all sorts of chains with less involvement on their part. We could then slowly get deeper tracers on all chains, without imposing them at first.
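As an illustration of the kind of reorg pattern that could be shared, here is a minimal sketch of fork detection by parent hash. It is only one possible approach, not necessarily what the poller will do: `poll_once`, the `fetch` callback and the header dict keys (`number`, `hash`, `parent_hash`) are hypothetical, and the sketch assumes the node can already serve the requested height.

```python
from typing import Any, Callable, Dict, List, Tuple

Header = Dict[str, Any]  # assumed keys: "number", "hash", "parent_hash"


def poll_once(
    chain: List[Header], fetch: Callable[[int], Header]
) -> Tuple[List[Header], Header]:
    """Extend the locally accepted chain by one block.

    If the block the node reports at the next height does not link to our
    tip, the tip is undone and the check is repeated one height lower, until
    a block that links to the remaining local chain is found.
    Returns (undone blocks, newest first; newly applied block).
    """
    undone: List[Header] = []
    while chain:
        candidate = fetch(chain[-1]["number"] + 1)
        if candidate["parent_hash"] == chain[-1]["hash"]:
            chain.append(candidate)
            return undone, candidate
        # Our tip is not on the node's current fork: undo it and retry lower.
        undone.append(chain.pop())
    raise RuntimeError("reorg goes deeper than the retained local history")
```

Nothing here is EVM-specific beyond the idea that each block names its parent, which is why the same loop could plausibly sit behind other chains' RPC methods.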


Another thing that would push in the direction of backfilling the type.v2.Block is Arbitrum, which has information PRIOR to the introduction of Nitro that could be extracted using RPC Poller methods, up until the point where the extended tracer starts to exist. So within a single chain, one would pass from an earlier segment using lighter blocks to a later segment with more data.

azf20 commented 1 year ago

Nice @abourget - I think of the two options I prefer option 1: Fit the light block data into our richer data model. This would seem to reduce duplication, simplify things for end users and create the most compatibility.

I think there is probably some worthwhile discussion on the best pattern for identifying how much data is available, but that seems like a cleaner path than creating two stacks.

abourget commented 1 year ago

We could add an enum, like

enum DataLevel {
  RICH_BLOCKS = 0;
  LOGS_WITH_TRACES = 1;
  LOGS = 2;
}

as an added field on the sf.ethereum.type.v2.Block, top-level. This way, all the current blocks would be marked as RICH_BLOCKS, and future blocks could be marked with the other two.

This way, a single Substreams module's code could adapt, even mid-chain, like it could be for Arbitrum (with Classic and Nitro eras).
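A rough sketch of how module code might branch on such a flag follows. It is written in Python for brevity, whereas real Substreams modules are written in Rust. The `Block` dataclass, its `logs` and `calls` fields and the `summarize` function are hypothetical stand-ins, and the assumption that call data is absent only at the `LOGS` level is made for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class DataLevel(IntEnum):
    # Mirrors the proposed enum. Zero is the proto3 default, so blocks
    # written before the field existed decode as RICH_BLOCKS.
    RICH_BLOCKS = 0
    LOGS_WITH_TRACES = 1
    LOGS = 2


@dataclass
class Block:
    # Hypothetical, heavily trimmed stand-in for sf.ethereum.type.v2.Block.
    number: int
    data_level: DataLevel = DataLevel.RICH_BLOCKS
    logs: List[str] = field(default_factory=list)
    calls: List[str] = field(default_factory=list)  # assumed empty at LOGS level


def summarize(block: Block) -> Dict[str, int]:
    """Report only what the block's data level can actually back up."""
    out = {"number": block.number, "log_count": len(block.logs)}
    if block.data_level != DataLevel.LOGS:
        # With traces or rich blocks, an empty call list really means "no calls".
        out["call_count"] = len(block.calls)
    return out
```

The point of the branch is that an unfilled field and a genuinely empty one look identical after decoding, so the flag is what lets a module tell them apart when a chain switches level partway through its history.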

matthewdarwin commented 1 year ago

LGTM (light block into existing data model + extra header to say what kind of block it is)

fubhy commented 1 year ago

I /heavily/ lean towards option #1.

The downside is that it might confuse users moving between chains that support only light blocks and chains that support rich blocks.

Adding the enum flag as you suggested would imho solve this.

abourget commented 11 months ago

As more work is being poured into things like SQL aggregations and more power is being pushed at that layer, I was thinking we should put a bit more pressure on this feature. We saw that multiple providers did create lots of value and a good community around just the data that is available through the RPC node (Dune, DefiLlama). In the spirit of being equal before being better, we should implement the RPC Poller extractor and allow our stack to roll out with equal data, but more powerful computation with parallelism through Substreams.

sduchesneau commented 11 months ago

Two more options:

  1. We could simply get rid of the type.v2.Block and introduce a new type.v3.Block which only has the information that can be extracted from RPC.

Because of the operational challenges of producing type.v2.Blocks and of scaling to "all EVM chains", I could see the "better and richer" type.v2.blocks being slowly discarded in favor of the "easier to produce" type.v3.blocks. While I do prefer richer v2 blocks with everything in there, the imperatives for massive adoption may push us in the other direction.

  2. If the idea is to keep the benefits of the first blocks, we could also introduce a block structure that starts with a top-level "oneof". Ex:
    // sf.ethereum.type.v2.Block
    message Block {
      oneof block {
        RichBlock rich_block = 2;
        BlockWithTraces block_with_traces = 3;
        LightBlock light_block = 4;
      }
    }

    The block hash, number and timestamp could be at the top level.

abourget commented 11 months ago

But both options here are a massive breaking change, for all libraries and all, aren't they?

jubeless commented 10 months ago

We are moving forward with option 1 (fit the light block data into our richer data model). First draft: https://github.com/streamingfast/firehose-ethereum/pull/80