Open matthewdarwin opened 1 year ago
An error is probably missing from the set flagged as retryable here: https://github.com/streamingfast/substreams/blob/develop/orchestrator/work/worker.go#L100-L113. Retryable errors are flagged within https://github.com/streamingfast/substreams/blob/develop/orchestrator/work/worker.go#L139
The same behaviour happens when I try sending > 1024 concurrent requests to Envoy via substreams-tier1-max-subrequests: 2000. Envoy by default has max_connections set to 1024, so establishing more than that is an error and tier1 doesn't handle it nicely.
substreams-tier1 does not gracefully handle the case where tier2 runs out of RAM and crashes.
The tier2 container runs out of RAM and crashes and then tier1 reports
and then all the tier2 jobs get cancelled.
For this test there are 26 containers; one container running out of RAM causes the jobs on all the other 25 containers to exit as well. This wastes a lot of resources as the work gets re-dispatched.
I'm using substreams-sink-noop on EOS (Antelope) for testing.