streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0

Support multiple network definitions inside a Package #351

Closed by abourget 9 months ago

abourget commented 9 months ago

Rationale

It is very cumbersome right now to specify the configuration, initial blocks and parameterization for each network. They need to be in separate files, and we get into a combinatorial problem when we also try to do sink configurations for each network.

With the current model, we are also forced to ship a new spkg for each network configuration, which makes substreams.dev's interface a bit uglier. Ideally, we would select an spkg that is known to work for multiple networks and, for a given version, pick which network to use dynamically through a dropdown.

This puts the onus on the client sending the request to pick up the configs for a given network (taken from the network field, or from a command-line argument that overrides it), and apply them to the right modules. The next level of overrides will come from the command-line -p flags.

Proposition

Support this syntax in the manifest:

imports:
  erc20: spkg.io/streamingfast/erc20.spkg
  src: spkg.io/streamingfast/uniswap-v3-v1.2.3.spkg  # Now upon importing, we also namespace and import the `networks` definitions.

modules:
- name: my_mapper
  initialBlock: 123123
  inputs:
  - params: string
    value: "alskdjfalskdj"

network: mainnet

networks:
  mainnet:
    initialBlocks:
      # we've got an implicit definition of
      #  src:map_pair_created: 76098098
      # from the import of `src` up there ^^ And it needs to be defined for ALL networks we're going to query here.
      module1: 123123
    params:
      module1: "address=0x123123123123"
  goerli:
    initialBlocks:
      module1: 234234
    params:
      module1: "address=0x234234234"

The deriveFrom directive would allow one to either OVERWRITE or APPEND a network configuration. Alternatively, to replace the functionality of deriveFrom, one could now use substreams (re?)pack --network polygon ./substreams.yaml, which would write a new .spkg that is preconfigured (with network = polygon, and all parameters and initial blocks in place in their Module definitions). Its filename would end in -polygon.spkg (before the version? after the version?). This ought NOT TO CHANGE the package.name, so that the packages can still be deduplicated on substreams.dev ...

The run and gui commands would now support --network and apply the network configs to the Modules before sending the request over. They would also do some checks to make sure the overrides of params and initialBlocks are consistent between networks (no keys are missing, for instance).
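As a rough illustration of that client-side step, here is a minimal Python sketch. It assumes the manifest has already been parsed into plain dicts shaped like the YAML above; apply_network, the "params" key written onto each module, and the cli_params argument (standing in for the command-line overrides) are hypothetical names for illustration, not the actual substreams implementation.

```python
import copy


def apply_network(manifest, network=None, cli_params=None):
    """Return a copy of the modules with one network's overrides applied.

    Assumed precedence (lowest to highest): module defaults, then the
    selected network's values, then command-line param overrides.
    """
    # An explicit --network argument overrides the manifest's `network` field.
    selected = network or manifest.get("network")
    networks = manifest.get("networks", {})
    if selected not in networks:
        raise ValueError(f"unknown network {selected!r}")

    cfg = networks[selected]
    modules = copy.deepcopy(manifest["modules"])  # leave the input untouched
    for mod in modules:
        name = mod["name"]
        if name in cfg.get("initialBlocks", {}):
            mod["initialBlock"] = cfg["initialBlocks"][name]
        if name in cfg.get("params", {}):
            mod["params"] = cfg["params"][name]
        # Command-line overrides are applied last, so they win.
        if cli_params and name in cli_params:
            mod["params"] = cli_params[name]
    return modules


manifest = {
    "network": "mainnet",
    "modules": [{"name": "module1", "initialBlock": 1}],
    "networks": {
        "mainnet": {
            "initialBlocks": {"module1": 123123},
            "params": {"module1": "address=0x123123123123"},
        },
        "goerli": {
            "initialBlocks": {"module1": 234234},
            "params": {"module1": "address=0x234234234"},
        },
    },
}

goerli_modules = apply_network(manifest, network="goerli")
```

Working on a deep copy keeps the parsed manifest reusable, so the same package can be resolved for several networks in one process.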

The imports manifest declaration needs to also import the networks definitions, and remap the module_name to imported_name:module_name, so it cascades when someone selects a network from a child spkg (one that imports multiple spkgs, for instance).
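The remapping could look roughly like the Python sketch below. The function names namespace_networks and merge_networks are hypothetical, the networks data is assumed to be plain dicts shaped like the YAML above, and letting local definitions win over imported ones is an assumption rather than something settled in this issue.

```python
def namespace_networks(alias, imported_networks):
    """Prefix every module key of an imported package with `alias:`."""
    out = {}
    for net, cfg in imported_networks.items():
        out[net] = {
            section: {f"{alias}:{module}": value for module, value in entries.items()}
            for section, entries in cfg.items()
        }
    return out


def merge_networks(own, imported):
    """Merge namespaced imported networks into the package's own networks."""
    # Copy the local definitions so the inputs are not mutated.
    merged = {
        net: {section: dict(entries) for section, entries in cfg.items()}
        for net, cfg in own.items()
    }
    for net, cfg in imported.items():
        target = merged.setdefault(net, {})
        for section, entries in cfg.items():
            bucket = target.setdefault(section, {})
            for key, value in entries.items():
                # Assumption: a value defined locally takes precedence.
                bucket.setdefault(key, value)
    return merged


src_networks = {"mainnet": {"initialBlocks": {"map_pair_created": 76098098}}}
own_networks = {
    "mainnet": {"initialBlocks": {"module1": 123123}},
    "goerli": {"initialBlocks": {"module1": 234234}},
}

merged = merge_networks(own_networks, namespace_networks("src", src_networks))
```

Because the imported keys are kept in the merged result, a package that imports this one can apply the same prefixing step again, which is one way the cascade could work.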

Validation for substreams.yaml's networks:

In the above example, if goerli was not defined within the src import, the algorithm above would halt with the error: missing initialBlock for network "goerli", on module "src:map_pair_created".
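A minimal Python sketch of the check as proposed in this comment follows. It assumes the networks section (including namespaced imports) is already parsed into plain dicts, and that every module with an initialBlock in any network must have one in all of them; validate_initial_blocks is a hypothetical name.

```python
def validate_initial_blocks(networks):
    """Raise if a module has an initialBlock in one network but not another."""
    # Collect every module that appears in at least one network.
    required = set()
    for cfg in networks.values():
        required.update(cfg.get("initialBlocks", {}))

    for net, cfg in networks.items():
        blocks = cfg.get("initialBlocks", {})
        for module in sorted(required):
            if module not in blocks:
                raise ValueError(
                    f'missing initialBlock for network "{net}", on module "{module}"'
                )


networks = {
    "mainnet": {
        "initialBlocks": {"module1": 123123, "src:map_pair_created": 76098098}
    },
    # goerli lacks the entry that the `src` import only defines for mainnet.
    "goerli": {"initialBlocks": {"module1": 234234}},
}

try:
    validate_initial_blocks(networks)
    error = None
except ValueError as exc:
    error = str(exc)
```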


Q: Rename network to defaultNetwork or selectedNetwork? Keep it as is? preconfiguredNetwork?

abourget commented 9 months ago

One could think that the init codegen could write a parameterized module, and put the contract address in the networks section, along with the initialBlock, for the selected network. This way, the scaffolded code would be ready to accommodate other networks.

YaroShkvorets commented 9 months ago

This would be great. substreams init could keep asking the user for networks in a loop, verifying that the ABIs match and querying the initialBlock, and add entries to the networks section.

abourget commented 9 months ago

Ok, we refined and updated the thing, @YaroShkvorets please re-read.

sduchesneau commented 9 months ago

After more discussions:

1) we can't validate that a) all networks have the same modules defined in them, nor b) that all the existing modules have the same values in all networks.

This is because: a) I may create a substreams that supports only a subset of the networks that an imported .spkg does. I wouldn't have to define values for the other networks that I don't want to serve. b) an .spkg that I import may have network-defined values for some modules, but my own modules do not depend on them -- instead, they depend only on modules that are not affected by the values in networks.

2) we have to keep ALL the network overrides of the packages that we import, so that someone could import our spkg from another package and fill in the blanks to make it work...

3) the default values of a module for initialBlock and Params are redundant if they are overridden in Networks (with a default network set -- which will be mandatory).

This leads to the following decisions.

sduchesneau commented 9 months ago

Eventually, we could remove the client-side burden of computing all this and send the packed spkg in the request, along with the parameters like 'network' and 'params'. Maybe...

sduchesneau commented 9 months ago

https://github.com/streamingfast/substreams/pull/358 reviews welcome!

ghardin1314 commented 8 months ago

This is great and needed for a long time for us! I think one last touch is to update the schema.json file to reflect these changes. I can make a PR to update but it may be a couple days before I get back around to it

https://github.com/streamingfast/substreams/blob/develop/schemas/manifest-schema.json