streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0

Specify and implement/choose a language parser for substreams Block Filtering #409

Closed sduchesneau closed 4 months ago

sduchesneau commented 6 months ago

Substreams Block Filtering is a feature that will allow a module of type "blockIndex" to assign arbitrary labels to certain blocks. Another module can then use the presence or absence of these labels to filter out blocks that are not of interest, which improves performance.

Example: a block has the following labels: evtsig:0x2345, evtsig:0x9876, calladdr:0xdeadbeef, and calladdr:0x10101010.

We need a language that has just enough expressivity to request:

  1. Only blocks that match evtsig:0x2345 OR evtsig:0x9876
  2. Only blocks that match evtsig:0x2345 AND calladdr:0xdeadbeef
  3. Only blocks that match (evtsig:0x2345 OR evtsig:0x9876) AND (calladdr:0xdeadbeef OR calladdr:0x10101010 OR ...)

We only need to express "a single array of arrays, where for every inner array of the outer array, at least one item of that inner array has to match a given block". In other words, the outer array is an AND of groups and each inner array is an OR of labels.
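To make the "array of arrays" shape concrete, here is a minimal Go sketch of how such a query could be evaluated against one block's labels. The `Query` type and `Matches` method are hypothetical names chosen for illustration, not part of Substreams.

```go
package main

import "fmt"

// Query is a hypothetical representation of the filter: the outer slice is an
// AND of groups, and each inner slice is an OR of labels.
type Query [][]string

// Matches reports whether the block's label set satisfies every group,
// i.e. at least one label of each inner slice is present on the block.
func (q Query) Matches(labels map[string]bool) bool {
	for _, group := range q {
		found := false
		for _, label := range group {
			if labels[label] {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	// Labels of the example block from this issue.
	block := map[string]bool{
		"evtsig:0x2345":       true,
		"evtsig:0x9876":       true,
		"calladdr:0xdeadbeef": true,
		"calladdr:0x10101010": true,
	}
	// (evtsig:0x2345 OR evtsig:0x9876) AND (calladdr:0xdeadbeef OR calladdr:0x10101010)
	q := Query{
		{"evtsig:0x2345", "evtsig:0x9876"},
		{"calladdr:0xdeadbeef", "calladdr:0x10101010"},
	}
	fmt.Println(q.Matches(block)) // true
}
```

In this sketch an empty query matches every block and an empty inner group matches none; a real implementation might prefer to reject both at parse time.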

Here's another example: (addr:dead || addr:beef) && (method:transfer || method:transferFrom)

The 'labels' could be restricted to only contain the characters matching [a-zA-Z0-9:_-] and the period (.).
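For illustration only, a restricted grammar like this can be handled without a full parser. The sketch below assumes at most one level of parentheses, `&&` between groups, and `||` inside a group; `parseQuery` and `labelRe` are hypothetical names, and this is not the parser that was eventually chosen.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// labelRe encodes the proposed label alphabet: [a-zA-Z0-9:_-] plus the period.
var labelRe = regexp.MustCompile(`^[a-zA-Z0-9:_.-]+$`)

// parseQuery is a hypothetical helper. It splits the expression on "&&" into
// groups, strips one optional pair of surrounding parentheses per group, then
// splits each group on "||" into labels. Nested parentheses are not supported.
func parseQuery(expr string) ([][]string, error) {
	var query [][]string
	for _, part := range strings.Split(expr, "&&") {
		part = strings.TrimSpace(part)
		if strings.HasPrefix(part, "(") && strings.HasSuffix(part, ")") {
			part = part[1 : len(part)-1]
		}
		var group []string
		for _, label := range strings.Split(part, "||") {
			label = strings.TrimSpace(label)
			if !labelRe.MatchString(label) {
				return nil, fmt.Errorf("invalid label %q", label)
			}
			group = append(group, label)
		}
		query = append(query, group)
	}
	return query, nil
}

func main() {
	q, err := parseQuery("(addr:dead || addr:beef) && (method:transfer || method:transferFrom)")
	fmt.Println(q, err)
	// [[addr:dead addr:beef] [method:transfer method:transferFrom]] <nil>
}
```

Because labels cannot contain spaces, `&`, `|`, or parentheses under the proposed character set, splitting on the operators is unambiguous for this flat grammar.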

maoueh commented 6 months ago

Going to port and adapt the parser/AST we had for the dfuse Search Query Language.

YaroShkvorets commented 6 months ago

Cool. Is the plan to pass it to Firehose as a parameter for sf.firehose.v2.Stream.Blocks?

maoueh commented 6 months ago

This will be implemented by Substreams directly; Firehose is not aware of it. We have no plan yet for Firehose to be able to leverage indexes produced by a Substreams.

The current plan (subject to change, of course) is to introduce a third type of Substreams module called index. Such a module would run through each block and decide which keys map to that block.

A later stage would use those indexes together with this new "query expression": only blocks matching the query expression would be sent to the mapper, and the others would be skipped.
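As a rough sketch of that flow (not the actual Substreams API), the Go snippet below has a stand-in `indexBlock` function produce keys for each block and uses a query expression to decide which blocks reach the mapper. `Block`, `indexBlock`, `matches`, and `filterBlocks` are all hypothetical names. Keys are computed inline here for brevity; in the plan described above, the index output would be produced once and reused.

```go
package main

import "fmt"

// Block is a stand-in for a chain block; it carries only what the sketch needs.
type Block struct {
	Number uint64
	Addrs  []string
}

// indexBlock plays the role of the "index" module: it decides which keys
// map to the given block.
func indexBlock(b Block) map[string]bool {
	keys := make(map[string]bool)
	for _, addr := range b.Addrs {
		keys["addr:"+addr] = true
	}
	return keys
}

// matches evaluates a query expressed as an AND of OR-groups against the keys.
func matches(query [][]string, keys map[string]bool) bool {
	for _, group := range query {
		found := false
		for _, key := range group {
			if keys[key] {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// filterBlocks returns the numbers of the blocks that would be sent to the
// mapper and of those that would be skipped.
func filterBlocks(blocks []Block, query [][]string) (sent, skipped []uint64) {
	for _, b := range blocks {
		if matches(query, indexBlock(b)) {
			sent = append(sent, b.Number)
		} else {
			skipped = append(skipped, b.Number)
		}
	}
	return sent, skipped
}

func main() {
	blocks := []Block{
		{Number: 1, Addrs: []string{"dead"}},
		{Number: 2, Addrs: []string{"cafe"}},
		{Number: 3, Addrs: []string{"beef", "cafe"}},
	}
	query := [][]string{{"addr:dead", "addr:beef"}} // addr:dead || addr:beef
	sent, skipped := filterBlocks(blocks, query)
	fmt.Println("sent:", sent, "skipped:", skipped) // sent: [1 3] skipped: [2]
}
```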

So there would still be a need to process the full chain with the index module at least once. At release, we are going to provide a few "pre-populated" indexes that people can leverage, for example to filter on a specific contract.

sduchesneau commented 6 months ago

```go
package main

import (
    "reflect"
    "testing"

    "github.com/RoaringBitmap/roaring/roaring64"
)

// TestBitmapQueries shows how label queries map onto bitmap operations:
// OR is a union and AND is an intersection of the per-label block sets.
func TestBitmapQueries(t *testing.T) {
    // Each label maps to the set of block numbers that carry it.
    kv := map[string]*roaring64.Bitmap{
        "bob":      roaring64.BitmapOf(1, 2, 3),
        "alice":    roaring64.BitmapOf(1, 4, 5),
        "transfer": roaring64.BitmapOf(1, 3, 5),
    }

    // Table-driven test cases
    testCases := []struct {
        name      string
        operation func() *roaring64.Bitmap
        result    []uint64
    }{
        {
            name: "bob || alice",
            operation: func() *roaring64.Bitmap {
                // Clone first so the stored bitmap is not mutated.
                query := kv["bob"].Clone()
                query.Or(kv["alice"])
                return query
            },
            result: []uint64{1, 2, 3, 4, 5},
        },
        {
            // Two terms separated by a space are ANDed together.
            name: "bob transfer",
            operation: func() *roaring64.Bitmap {
                query := kv["bob"].Clone()
                query.And(kv["transfer"])
                return query
            },
            result: []uint64{1, 3},
        },
        {
            name: "(alice || bob) transfer",
            operation: func() *roaring64.Bitmap {
                query := kv["alice"].Clone()
                query.Or(kv["bob"])
                query.And(kv["transfer"])
                return query
            },
            result: []uint64{1, 3, 5},
        },
    }

    // Run each case as a subtest and compare against the expected blocks.
    for _, tc := range testCases {
        t.Run(tc.name, func(t *testing.T) {
            got := tc.operation().ToArray()
            if !reflect.DeepEqual(got, tc.result) {
                t.Errorf("got %v, want %v", got, tc.result)
            }
        })
    }
}
```

sduchesneau commented 4 months ago

Shipped in https://github.com/streamingfast/substreams/releases/tag/v1.6.0