[Feature] `graph-node` Substreams powered Subgraph ingestion time could be improved

maoueh commented 1 year ago

Description

When we did the RETH benchmarking a few weeks ago, I had anecdotally tested graph-node finding it was really slow. At that time, I think the batched writes was not yet fully active.

Friday, I decided to make a more formal test of how long it takes for a graph-node to sync a Substreams powered Subgraph for which the Substreams is already cached. A cached Substreams is one for which the endpoint already fully processed the .spkg and cached the output, so essentially we are simply streaming pre-processed files out on the gRPC stream.

Experience

Using https://github.com/streamingfast/substreams-eth-reth-benchmark/blob/master/reth-erc20-rocket-v1.0.6.spkg and ensuring it was cached, validated by verifying the server's cache files for the specific module ensuring all outputs files were actually present.

`substreams-sink-postgres`

Using the instructions outlined at https://github.com/streamingfast/substreams-eth-reth-benchmark#instructions, I got the following ingestion time:

Started at 2023-08-25T14:34:39.519-0400
Finished at 2023-08-25T14:41:57.111-0400
Took 7m 17s

This test ran on my laptop, a 1 Gbps connection over WiFi (I'm hitting ~100 Mbps download speed). The database is Postgres 14 ran through Docker Compose on my machine.

The endpoint used was mainnnet.eth.streamingfast.io:443.

`graph-node`

Using latest master branch (https://github.com/graphprotocol/graph-node/commit/837948a0b193c9ec75908c981984d568ae9ae160), I ran the same experiment deploying the Substreams powered Subgraph using manifest https://github.com/streamingfast/substreams-eth-reth-benchmark/blob/master/subgraph.yaml.

The endpoint configured for the provider in graph-node is pointing at mainnnet.eth.streamingfast.io:443.

Started at Aug 25 14:51:36.341
Finished at Aug 26 02:06:56.169
Tool 11h 15m 19s

`substreams-sink-rust`

This is a small Rust project that essentially extracted https://github.com/graphprotocol/graph-node/blob/master/graph/src/blockchain/substreams_block_stream.rs and endpoint.rs into a simple tutorial for people willing to consume their Substreams with Rust code.

The code can be checked at https://github.com/streamingfast/substreams-sink-rust.

It took me 8m 33s using this code to stream back the .spkg above

Observation & Discussions

I see the batching of graph-node working, but the time to accumulate the data until the batch is full really makes me wonder what takes so much time. Here for example the few lines of batching I received at the beginning of indexing (logs taken today):

Aug 28 14:18:20.372 INFO Starting subgraph writer, queue_size: 5, sgd: 1, subgraph_id: QmdjiZ...
Aug 28 14:18:20.409 INFO Data source count at start: 1, sgd: 1, subgraph_id: QmdjiZ...
Aug 28 14:18:20.419 INFO Blockstreams disconnected, connecting, provider_err_count: 0, cursor: , start_block: 11446767, ...
Aug 28 14:18:20.513 INFO Blockstreams connected, provider: substreams, deployment: QmdjiZ..., sgd: 1, subgraph_id: QmdjiZ...
Aug 28 14:23:21.004 INFO Committed write batch, time_ms: 7, weight: 240, entities: 0, block_count: 51094, block_number: 11497860, sgd: 1, subgraph_id: QmdjiZ...
Aug 28 14:28:21.276 INFO Committed write batch, time_ms: 7, weight: 240, entities: 0, block_count: 53650, block_number: 11551510, sgd: 1, subgraph_id: QmdjiZ...
...

It took roughly ~5m to receive and process ~53 650 blocks while in 5m using substreams-sink-rust I'm able to receive and process (so receive + decode) 3 584 270 blocks. I'm quite unsure where the time is spent in this 5m but it seems something is preventing full throughput here.

I think there is a big room for improvements here as I'm confident we can crunch this.

Are you aware of any blockers that must be resolved before implementing this feature? If so, which? Link to any relevant GitHub issues.

No response

Some information to help us out

[ ] Tick this box if you plan on implementing this feature yourself.
[X] I have searched the issue tracker to make sure this issue is not a duplicate.

mangas commented 1 year ago

Hey Matt, thanks for the investigation. Just to clarify that I understand correctly:

The postgres sink only writes to the cursors table in this example right? Or do i need to create other tables for it to run correctly? Which ones did you have in your test?
The rust sink consumes the messages from stream and prints them to stdout
Since both took about 7-8m in terms of consuming the stream, this issue is to try and understand why the graph-node then takes 11h to process the same data?

maoueh commented 1 year ago

The postgres sink only writes to the cursors table in this example right? Or do i need to create other tables for it to run correctly? Which ones did you have in your test?

I updated the instructions, there was indeed a setup step missing

The rust sink consumes the messages from stream and prints them to stdout

Yeah, but I just realize it does not do a Proto decode, something to try to see what's the impact of adding the Proto decoding, which could be a culprit but I doubt it personally, I could do that test if you want.

Since both took about 7-8m in terms of consuming the stream, this issue is to try and understand why the graph-node then takes 11h to process the same data?

Exact, I see batching works and really thought that would lift the majority of the time, but it seems something else might take some time. I'm not sure where, that's indeed the goal of this.

I'm expecting graph-node to take a bit more time due to historical write and block range closing, but I don't feel that it makes sense it takes soooo much time.

maoueh commented 1 year ago

Tested with using branch https://github.com/streamingfast/substreams-sink-rust/tree/experiment/decode-graph-out-entity-changes, with command (in this project & branch):

cargo run -- https://mainnet.eth.streamingfast.io:443 ../substreams-eth-reth-benchmark/reth-erc20-rocket-v1.0.6.spkg graph_out :17576926

This branch has the decoding of the EntityChanges so we pay the cost of decoding. I've processed the full range with this new code and it took 7m 41s (excluding compilation time, I was pre-compiled when I invoked the command).

maoueh commented 11 months ago

The PR has been merged which improved the situation greatly, closing.

graphprotocol / graph-node