Hey Matt, thanks for the investigation. Just to make sure I understand correctly:
> The postgres sink only writes to the cursors table in this example, right? Or do I need to create other tables for it to run correctly? Which ones did you have in your test?
I updated the instructions; there was indeed a setup step missing.
> The rust sink consumes the messages from the stream and prints them to stdout.
Yeah, but I just realized it does not do a Proto decode. Something to try would be to measure the impact of adding the Proto decoding; it could be a culprit, but I personally doubt it. I could do that test if you want.
> Since both took about 7-8m to consume the stream, this issue is to try and understand why `graph-node` then takes 11h to process the same data?
Exactly. I see that batching works and really thought that would account for the majority of the time, but it seems something else is taking time; I'm not sure where, and that's indeed the goal of this issue. I'm expecting `graph-node` to take a bit more time due to historical writes and block range closing, but I don't feel it makes sense that it takes soooo much time.
Tested using branch https://github.com/streamingfast/substreams-sink-rust/tree/experiment/decode-graph-out-entity-changes, with the following command (in this project & branch):

```
cargo run -- https://mainnet.eth.streamingfast.io:443 ../substreams-eth-reth-benchmark/reth-erc20-rocket-v1.0.6.spkg graph_out :17576926
```
This branch adds decoding of the EntityChanges, so we pay the cost of decoding. I processed the full range with this new code and it took 7m 41s (excluding compilation time; the binary was pre-compiled when I invoked the command).
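For reference, a minimal sketch of what that extra decode step amounts to. The struct definitions below are illustrative stand-ins for the prost-generated EntityChanges bindings; the field names and tag numbers are assumptions, not the exact generated code from the branch:

```rust
use prost::Message;

// Illustrative stand-ins for the prost-generated EntityChanges bindings;
// the real types are generated from the Substreams entity-change protos.
#[derive(Clone, PartialEq, Message)]
pub struct EntityChanges {
    #[prost(message, repeated, tag = "5")]
    pub entity_changes: Vec<EntityChange>,
}

#[derive(Clone, PartialEq, Message)]
pub struct EntityChange {
    #[prost(string, tag = "1")]
    pub entity: String,
    #[prost(string, tag = "2")]
    pub id: String,
}

// Decode the raw map-module output bytes into EntityChanges; this is the
// per-block cost the experiment adds on top of plain streaming.
fn decode_entity_changes(raw: &[u8]) -> Result<EntityChanges, prost::DecodeError> {
    EntityChanges::decode(raw)
}
```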
The PR that greatly improved the situation has been merged; closing.
Description
When we did the RETH benchmarking a few weeks ago, I had anecdotally tested `graph-node` and found it to be really slow. At that time, I think the batched writes were not yet fully active. Friday, I decided to make a more formal test of how long it takes for a `graph-node` to sync a Substreams-powered Subgraph for which the Substreams is already cached. A cached Substreams is one for which the endpoint has already fully processed the `.spkg` and cached the output, so essentially we are simply streaming pre-processed files out on the gRPC stream.

Experience
Using https://github.com/streamingfast/substreams-eth-reth-benchmark/blob/master/reth-erc20-rocket-v1.0.6.spkg and ensuring it was cached, validated by checking the server's cache files for the specific module and confirming all output files were actually present.
substreams-sink-postgres
Using the instructions outlined at https://github.com/streamingfast/substreams-eth-reth-benchmark#instructions, I got the following ingestion time:
- Start: 2023-08-25T14:34:39.519-0400
- End: 2023-08-25T14:41:57.111-0400
- Duration: 7m 17s
This test ran on my laptop over a 1 Gbps WiFi connection (I'm hitting ~100 Mbps download speed). The database is Postgres 14 run through Docker Compose on my machine.
The endpoint used was `mainnet.eth.streamingfast.io:443`.

graph-node
Using the latest master branch (https://github.com/graphprotocol/graph-node/commit/837948a0b193c9ec75908c981984d568ae9ae160), I ran the same experiment, deploying the Substreams-powered Subgraph using the manifest https://github.com/streamingfast/substreams-eth-reth-benchmark/blob/master/subgraph.yaml.
The endpoint configured for the provider in `graph-node` points at `mainnet.eth.streamingfast.io:443`.

- Start: Aug 25 14:51:36.341
- End: Aug 26 02:06:56.169
- Duration: 11h 15m 19s
substreams-sink-rust
This is a small Rust project that essentially extracts https://github.com/graphprotocol/graph-node/blob/master/graph/src/blockchain/substreams_block_stream.rs and endpoint.rs into a simple tutorial for people who want to consume their Substreams with Rust code. The code can be checked at https://github.com/streamingfast/substreams-sink-rust.
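In spirit, the consumer boils down to something like the sketch below. It assumes tonic/prost-generated bindings for the Substreams `Blocks` streaming RPC; the `pb` module, type names, and request fields here are illustrative, not the exact ones from the repo:

```rust
// Illustrative sketch of a bare-bones Substreams consumer. `pb` stands in
// for tonic/prost-generated bindings of the Substreams RPC protos.
use pb::stream_client::StreamClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to the Substreams endpoint over gRPC.
    let mut client = StreamClient::connect("https://mainnet.eth.streamingfast.io:443").await?;

    // Request the `graph_out` module's output up to the stop block. The real
    // code also loads the `.spkg` package and attaches its modules here.
    let request = pb::Request {
        output_module: "graph_out".to_string(),
        stop_block_num: 17_576_926,
        ..Default::default()
    };

    // Consume the server-streaming response and print each message, which is
    // all the original benchmark does (no Proto decoding of the payload).
    let mut stream = client.blocks(request).await?.into_inner();
    while let Some(message) = stream.message().await? {
        println!("{message:?}");
    }

    Ok(())
}
```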
It took me 8m 33s using this code to stream back the `.spkg` above.

Observation & Discussions
I see the batching of `graph-node` working, but the time it takes to accumulate the data until the batch is full really makes me wonder where the time goes. Here, for example, are the few lines of batching I received at the beginning of indexing (logs taken today):

It took roughly ~5m to receive and process ~53,650 blocks, while in 5m using `substreams-sink-rust` I'm able to receive and process (so receive + decode) 3,584,270 blocks. I'm quite unsure where the time is spent in these 5m, but it seems something is preventing full throughput here. I think there is big room for improvement, as I'm confident we can crunch this.
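As a quick back-of-the-envelope comparison, straight arithmetic on the figures above puts the two throughputs side by side:

```rust
// Back-of-the-envelope throughput comparison from the figures above.
fn main() {
    let graph_node = 53_650.0_f64 / 300.0; // blocks/s over ~5 minutes
    let sink_rust = 3_584_270.0_f64 / 300.0; // blocks/s over ~5 minutes

    println!("graph-node:           ~{graph_node:.0} blocks/s"); // ~179
    println!("substreams-sink-rust: ~{sink_rust:.0} blocks/s"); // ~11,948
    println!("gap:                  ~{:.0}x", sink_rust / graph_node); // ~67x
}
```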
Are you aware of any blockers that must be resolved before implementing this feature? If so, which? Link to any relevant GitHub issues.
No response