streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.

Error on parallel execution with imported manifests #187

Closed. ghardin1314 closed this issue 1 year ago

ghardin1314 commented 1 year ago

I'm running into a strange error when trying to split my modules across separate manifests. I get the following error when providing a --start-block, which triggers parallel execution:

Error: rpc error: code = Internal desc = error building pipeline: failed setup request: parallel processing run: scheduler run: process job result for target "deployment:store_deployments": worker ended in error: receiving stream resp: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (6029944 vs. 4194304)
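For reference, the command I'm running is roughly of this shape (the endpoint, module name and block range here are illustrative, not my exact invocation):

substreams run -e polygon.streamingfast.io:443 \
  ./substreams.yaml map_safe_setup \
  --start-block 12000000 --stop-block +1000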

Here is the graph with all modules in a single manifest:

graph TD;
  map_factories[map: map_factories]
  sf.ethereum.type.v2.Block[source: sf.ethereum.type.v2.Block] --> map_factories
  store_factories[store: store_factories]
  map_factories --> store_factories
  map_deployments[map: map_deployments]
  sf.ethereum.type.v2.Block[source: sf.ethereum.type.v2.Block] --> map_deployments
  store_factories --> map_deployments
  store_deployments[store: store_deployments]
  map_deployments --> store_deployments
  map_safe_setup[map: map_safe_setup]
  sf.ethereum.type.v2.Block[source: sf.ethereum.type.v2.Block] --> map_safe_setup
  store_deployments --> map_safe_setup
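In manifest terms, the modules section of the single-manifest version looks roughly like the following (the output types, update policies and value types are placeholders, not my real ones):

modules:
  - name: map_factories
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
    output:
      type: proto:masterfile.v1.Factories    # placeholder type

  - name: store_factories
    kind: store
    updatePolicy: set                        # placeholder policy
    valueType: string                        # placeholder value type
    inputs:
      - map: map_factories

  - name: map_deployments
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
      - store: store_factories
    output:
      type: proto:masterfile.v1.Deployments  # placeholder type

  - name: store_deployments
    kind: store
    updatePolicy: set                        # placeholder policy
    valueType: string                        # placeholder value type
    inputs:
      - map: map_deployments

  - name: map_safe_setup
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
      - store: store_deployments
    output:
      type: proto:masterfile.v1.SafeSetups   # placeholder type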

And here it is after separating the modules into different manifests:

graph TD;
  map_safe_setup[map: map_safe_setup]
  sf.ethereum.type.v2.Block[source: sf.ethereum.type.v2.Block] --> map_safe_setup
  deployment:store_deployments --> map_safe_setup
  deployment:map_deployments[map: deployment:map_deployments]
  sf.ethereum.type.v2.Block[source: sf.ethereum.type.v2.Block] --> deployment:map_deployments
  deployment:factory:store_factories --> deployment:map_deployments
  deployment:store_deployments[store: deployment:store_deployments]
  deployment:map_deployments --> deployment:store_deployments
  deployment:factory:map_factories[map: deployment:factory:map_factories]
  sf.ethereum.type.v2.Block[source: sf.ethereum.type.v2.Block] --> deployment:factory:map_factories
  deployment:factory:store_factories[store: deployment:factory:store_factories]
  deployment:factory:map_factories --> deployment:factory:store_factories
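In the split version, the top-level manifest only declares map_safe_setup and pulls store_deployments in through the import namespace, roughly like this (the output type is again a placeholder):

imports:
  eth: https://github.com/streamingfast/sf-ethereum/releases/download/v0.10.2/ethereum-v0.10.4.spkg
  deployment: ../deployment/substreams.yaml

modules:
  - name: map_safe_setup
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
      - store: deployment:store_deployments
    output:
      type: proto:masterfile.v1.SafeSetups   # placeholder type

The deployment manifest itself imports the factory manifest the same way, which is where the deployment:factory: prefix in the graph comes from.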

To be clear, it all works in single-threaded execution, so I don't think it's a manifest/configuration problem. It also works in parallel execution when everything is contained within a single manifest.

Using the latest release binary (v0.2.0)

ghardin1314 commented 1 year ago

Note: this affects imports both from other manifest files and from packaged substreams modules (.spkg files)

i.e.

imports:
  eth: https://github.com/streamingfast/sf-ethereum/releases/download/v0.10.2/ethereum-v0.10.4.spkg
  deployment: ../deployment/substreams.yaml

or

imports:
  eth: https://github.com/streamingfast/sf-ethereum/releases/download/v0.10.2/ethereum-v0.10.4.spkg
  deployment: ../deployment/masterfile-deployment-v0.1.0.spkg

Both produce the same error.
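For context, the .spkg in the second example is just the deployment manifest packaged with the substreams CLI, along the lines of:

substreams pack ../deployment/substreams.yaml

which, if I remember the behaviour correctly, writes an .spkg named after the package name and version declared in that manifest.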

ghardin1314 commented 1 year ago

Created a reproducible example here:

https://github.com/ghardin1314/substreams-issue

maoueh commented 1 year ago

I was able to reproduce. It seems some message exchanged internally goes over the default 4 MiB limit allowed per gRPC message.

That is a relatively easy fix to perform on our side.
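The limit in question is just the stock gRPC default of 4 MiB on received messages. To illustrate what the fix amounts to (a generic grpc-go sketch, not the actual change we will land in dgrpc), raising the limit on a client connection looks roughly like this:

package grpcsketch

import (
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// dialWithLargerLimit opens a client connection whose calls accept
// responses larger than the 4 MiB gRPC default.
func dialWithLargerLimit(addr string) (*grpc.ClientConn, error) {
    const maxRecvBytes = 64 << 20 // 64 MiB; the exact value we pick may differ
    return grpc.Dial(
        addr,
        grpc.WithTransportCredentials(insecure.NewCredentials()), // plaintext, for illustration only
        grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxRecvBytes)),
        // the server side has a matching grpc.MaxRecvMsgSize server option
    )
}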

sduchesneau commented 1 year ago

The issue only arises when using service discovery (xDS), so it is hard to reproduce with a local server...

sduchesneau commented 1 year ago

Fixed here: https://github.com/streamingfast/dgrpc/commit/75702708cf92ed47a3c770313b2de183ddd282df, bumped here: https://github.com/streamingfast/firehose-ethereum/commit/28387e2fc4d875cefab77f3fbe7d3747028226d6, and deployed to our Polygon environment, polygon.streamingfast.io. I can no longer reproduce the issue in that environment.

Let me know if you experience the issue again; you can always reopen this ticket. Note that the fix is not pushed to our other environments yet, but it will come.