streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.

Custom index Substreams module #322

Open abourget opened 1 year ago

abourget commented 1 year ago

WARN: See https://github.com/streamingfast/substreams/issues/410, which takes precedence over this.

Reasons to do it:

Reasons not to do it:

Proposition for custom filter modules

that would apply the same principles as the Firehose indexes, but in a generalized fashion.

WARN: we would need to add the blockIndex query and the module reference (not the name, remember!) to the module_hash.
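To illustrate why the query and the referenced module belong in the hash, here is a minimal sketch. It is not the actual Substreams hashing algorithm: the function `module_hash`, its parameters, and the choice of SHA-256 are all assumptions made for illustration. The point is only that two modules with identical code but different index queries must not share cached output, and that the module name plays no part.

```python
import hashlib


def module_hash(code: bytes, input_hashes: list[str],
                index_module_hash: str = "", index_query: str = "") -> str:
    """Hypothetical module hash; NOT the real Substreams implementation.

    The index module is referenced by its own hash (not its name), and the
    query string is mixed in so that changing either one changes the result.
    """
    h = hashlib.sha256()
    parts = [code, *(i.encode() for i in input_hashes),
             index_module_hash.encode(), index_query.encode()]
    for part in parts:
        # Length-prefix each field so adjacent fields cannot run together.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()
```

Under these assumptions, renaming a module leaves its hash untouched, while editing the query invalidates any previously cached output.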

Summary of the existing CombinedFilter from Ethereum:

This is to provide context only.

message CombinedFilter {
  repeated LogFilter log_filters = 1;
  repeated CallToFilter call_filters = 2;
...
}
message LogFilter {
  repeated bytes addresses = 1;
  repeated bytes event_signatures = 2; // corresponds to the keccak of the event signature, which is stored in topic.0
}
message CallToFilter {
  repeated bytes addresses = 1;
  repeated bytes signatures = 2;
}

This translates to the following expression language:

(   -- log filter
  (addr:0x123 OR addr:0x234 OR addr:0x345)  -- alternatively TRUE if our list is empty
    AND
  (evsig:0x123 OR evsig:0x234 OR evsig:0x345)  -- alternatively "TRUE" if our list is empty
)
 OR 
(   -- call filter
  (to:0x123 OR to:0x234 OR to:0x345)  -- or TRUE if list is empty
    AND
  (methsig:0x234 OR methsig:0x456 OR methsig:0x678)  -- or TRUE if list is empty
)
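The translation above is mechanical, so it can be sketched in a few lines. This is only an illustration of the mapping: the helper names (`any_of`, `combined_filter_expr`, and so on) are hypothetical, and filters are passed as plain `(addresses, signatures)` tuples of hex strings rather than as the protobuf messages.

```python
def any_of(prefix: str, values: list[str]) -> str:
    """OR together prefixed terms; an empty list matches everything (TRUE)."""
    if not values:
        return "TRUE"
    return "(" + " OR ".join(f"{prefix}:{v}" for v in values) + ")"


def log_filter_expr(addresses: list[str], event_signatures: list[str]) -> str:
    # A log matches when both its address and its event signature match.
    return f"({any_of('addr', addresses)} AND {any_of('evsig', event_signatures)})"


def call_filter_expr(addresses: list[str], signatures: list[str]) -> str:
    # A call matches when both its target and its method signature match.
    return f"({any_of('to', addresses)} AND {any_of('methsig', signatures)})"


def combined_filter_expr(log_filters, call_filters) -> str:
    """Each argument is a list of (addresses, signatures) tuples."""
    parts = [log_filter_expr(a, s) for a, s in log_filters]
    parts += [call_filter_expr(a, s) for a, s in call_filters]
    # Individual filters are alternatives, so they are ORed together.
    return " OR ".join(parts)
```

For example, `combined_filter_expr([(["0x123", "0x234"], [])], [([], ["0x456"])])` yields `((addr:0x123 OR addr:0x234) AND TRUE) OR (TRUE AND (methsig:0x456))`.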

User experience and manifest definitions

Let's say this is a publicly shared filtering package:

package:
  name: eth-filters
  version: v1.0.0

modules:
- name: events
  doc: |
      Sifts through logs and indexes keys as: addr:0x123, addr:0x234, evsig:0x456, evsig:0x567
  kind: filter
  inputs:
  - source: sf.ethereum.type.v1.Block
  output:
    type: sf.substreams.filter.v1.Keys

- name: logs_reducer
  inputs:
  - source: sf.ethereum.type.v1.Block
  output:
    type: sf.filtered.ethereum.LogsOnly

- name: reduced-events
  doc: |
      Sifts through logs and indexes keys as: addr:0x123, addr:0x234, evsig:0x456, evsig:0x567
  kind: filter
  inputs:
  - map: logs_reducer
  output:
    type: sf.substreams.filter.v1.Keys

- name: calls
  doc: |
      Sifts through calls and indexes keys as: to:0x123, to:0x234, methsig:0x456, methsig:0x567
  kind: filter
  inputs:
  - source: sf.ethereum.type.v1.Block
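As a rough idea of what the `events` filter module would emit per block, here is a sketch of the key extraction. It assumes logs are plain dicts with hypothetical `address` and `topics` fields holding hex strings. A real module would read `sf.ethereum.type.v1.Block` and write `sf.substreams.filter.v1.Keys`, neither of which is modeled here.

```python
def event_keys(logs: list[dict]) -> list[str]:
    """Collect the distinct index keys for one block's logs."""
    keys = set()
    for log in logs:
        keys.add(f"addr:{log['address']}")
        topics = log.get("topics", [])
        if topics:
            # topic 0 carries the keccak of the event signature
            keys.add(f"evsig:{topics[0]}")
    # Sort so the output is deterministic across runs.
    return sorted(keys)
```

The `calls` module would follow the same shape, emitting `to:` and `methsig:` keys instead.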

It would be consumed as:

blockSieve's doc: This instruction lets you receive inputs only for blocks matching certain criteria, which is more efficient.

params: eth_filters:filtered_events: addrUSDC

blockSieve's doc: This instruction lets you process only the blocks matching certain criteria, avoiding the overhead of processing blocks that you know do not contain what you're interested in.

imports:
  eth-filters: https://spkg.io/streamingfast/eth-filters-v1.0.0.spkg

module:
- name: fastsieve
  blockSieve:
    name: eth_sieve:contracts_and_events
    match: (addr:USDC || addr:PANCAKE) && (evsig:Transfer || evsig:TransferFrom)
  inputs:
  - source: sf.ethereum.type.v2.Block
  - map: uniswapv3:prices

- name: fastcrawl

  blockFilter:
  blockRemover:
  blockSkipper:
    name: eth-filters:events
    keepQuery: (addr:USDC || addr:PANCAKE) && (evsig:Transfer || evsig:TransferFrom)

  # This negative form is a footgun: it forces the caller to write the negation themselves
  blockSkipper:
    name: eth-filters:events
    query: (!addr:USDC && !addr:PANCAKE) || (!evsig:Transfer && !evsig:TransferFrom)

  block:
  blockSieve:
  blockPresenceFilter:
    name: eth-filters:events
    query: (addr:123123 || addr:23123) && (evsig:123 || evsig:0x234)   # All-or-nothing filtering here. We don't process any inputs if the filter says "no".
  inputs:
  - source: sf.ethereum.type.v2.Block
  - map: uniswapv3:prices

filterQueries:
  fastcrawl: (addr:123123 || addr:234234) && (evsig:123123 || evsig:0x345)

substreams run alex-coo-stuff.spkg -filter-query=fastcrawl.eth-filters:events="(blah||)" -filter-query=fastcrawl.eth-filters:calls="(bloh||)" ....
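To make the query semantics concrete, here is a small evaluator sketch for expressions using `||`, `&&`, `!` and parentheses, checked against the set of keys indexed for a block. The function name `matches` and the exact grammar are assumptions; nothing here reflects an existing Substreams implementation.

```python
import re

# Tokens: parentheses, the three operators, or a bare term such as "addr:USDC".
# Characters that fit none of these (e.g. a stray single "|") are skipped.
TOKEN = re.compile(r"\(|\)|\|\||&&|!|[^\s()|&!]+")


def matches(query: str, keys: set[str]) -> bool:
    """Evaluate a boolean key query against the keys indexed for one block."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side before combining
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() == "&&":
            take()
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary():
        tok = take()
        if tok == "!":
            return not parse_unary()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok is None or tok in (")", "||", "&&"):
            raise ValueError(f"unexpected token: {tok!r}")
        return tok in keys  # a term is true when the block has that key

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token: {peek()!r}")
    return result
```

With this, the `keepQuery` form and the negated `blockSkipper` query shown earlier are complements of each other (De Morgan): for any key set, exactly one of them evaluates to true.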

We'd want someone to be able to express this:

Indexing higher order data with this pattern.

Someone could also index from higher order data, like Uniswap-v3 prices:

modules:
- name: univ3-prices-filter
  kind: filter
  inputs:
  - map: univ3_prices

To consume this, we would:


- name: univ3-prices-filter
  type: filter
  inputs:
  - map: uniswap_v3_prices
  output: my.Prices

sduchesneau commented 7 months ago

Nice work here https://github.com/streamingfast/substreams/issues/403