paradigmxyz / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust
https://reth.rs/
Apache License 2.0
3.99k stars 1.22k forks source link

new engine has issue downloading block #12653

Closed yutianwu closed 1 day ago

yutianwu commented 3 days ago

Describe the bug

Hi, we recently received feedback from the community regarding an issue where two blocks are being added to the chain simultaneously during synchronization.

Image

As shown in the image above, when a user subscribes to headers for BSC, two blocks are sent simultaneously.

So I added logs to the code to find out the cause:

2024-11-19T11:31:13.120522Z  INFO Send download block request: 0x0e19996504a7dbc4da03aac008abb9c0e787ccc69284a96be8bd491ade69e243
2024-11-19T11:31:16.107017Z  INFO Received single full block, buffering block=NumHash { number: 44145043, hash: 0x0e19996504a7dbc4da03aac008abb9c0e787ccc69284a96be8bd491ade69e243 }
2024-11-19T11:31:16.108570Z  INFO Send download block request: 0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2
2024-11-19T11:31:16.108626Z  INFO Received forkchoice updated message when syncing head_block_hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 safe_block_hash=0x2f5506d5e29909a0776cdbdf573cafeaffc789c3a7369fc9ef8d239daa12bd69 finalized_block_hash=0x834bb65dc92c07aedd83ff0bfeb331346c3dbeb42845ac463733c5adca8aed26
2024-11-19T11:31:16.109018Z  INFO Received single full block, buffering block=NumHash { number: 44145044, hash: 0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 }
2024-11-19T11:31:16.289155Z  INFO Block added to canonical chain number=44145043 hash=0x0e19996504a7dbc4da03aac008abb9c0e787ccc69284a96be8bd491ade69e243 peers=11 txs=137 gas=17.90 Mgas gas_throughput=99.22 Mgas/second full=12.8% base_fee=0.00gwei blobs=0 excess_blobs=0 elapsed=180.412479ms
2024-11-19T11:31:16.459900Z  INFO Block added to canonical chain number=44145044 hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 peers=11 txs=188 gas=13.41 Mgas gas_throughput=78.62 Mgas/second full=9.6% base_fee=0.00gwei blobs=0 excess_blobs=0 elapsed=170.58106ms
2024-11-19T11:31:16.460665Z  INFO Canonical chain committed number=44145044 hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 elapsed=669.425µs

From the log above, we can see that the request was sent at 11:31:13, but the block was received at 11:31:16.

I suspect the issue might be related to the poll logic. It seems the block might have been downloaded but not processed properly by the engine. Have you encountered a similar issue before, or could you provide some insights on this? Thanks!

Steps to reproduce

You can refer to the above log

Node logs

Platform(s)

Linux (ARM)

What version/commit are you on?

v1.1.0

What database version are you on?

v1.1.0

Which chain / network are you on?

BSC

What type of node are you running?

Archive (default)

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

No response

Code of Conduct

mattsse commented 3 days ago

As shown in the image above, when a user subscribes to headers for BSC, two blocks are sent simultaneously.

no idea what this represents, please elaborate.

yutianwu commented 3 days ago

As shown in the image above, when a user subscribes to headers for BSC, two blocks are sent simultaneously.

no idea what this represents, please elaborate.

hi, you can just check the log:

2024-11-19T11:31:13.120522Z  INFO Send download block request: 0x0e19996504a7dbc4da03aac008abb9c0e787ccc69284a96be8bd491ade69e243
2024-11-19T11:31:16.107017Z  INFO Received single full block, buffering block=NumHash { number: 44145043, hash: 0x0e19996504a7dbc4da03aac008abb9c0e787ccc69284a96be8bd491ade69e243 }
2024-11-19T11:31:16.108570Z  INFO Send download block request: 0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2
2024-11-19T11:31:16.108626Z  INFO Received forkchoice updated message when syncing head_block_hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 safe_block_hash=0x2f5506d5e29909a0776cdbdf573cafeaffc789c3a7369fc9ef8d239daa12bd69 finalized_block_hash=0x834bb65dc92c07aedd83ff0bfeb331346c3dbeb42845ac463733c5adca8aed26
2024-11-19T11:31:16.109018Z  INFO Received single full block, buffering block=NumHash { number: 44145044, hash: 0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 }
2024-11-19T11:31:16.289155Z  INFO Block added to canonical chain number=44145043 hash=0x0e19996504a7dbc4da03aac008abb9c0e787ccc69284a96be8bd491ade69e243 peers=11 txs=137 gas=17.90 Mgas gas_throughput=99.22 Mgas/second full=12.8% base_fee=0.00gwei blobs=0 excess_blobs=0 elapsed=180.412479ms
2024-11-19T11:31:16.459900Z  INFO Block added to canonical chain number=44145044 hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 peers=11 txs=188 gas=13.41 Mgas gas_throughput=78.62 Mgas/second full=9.6% base_fee=0.00gwei blobs=0 excess_blobs=0 elapsed=170.58106ms
2024-11-19T11:31:16.460665Z  INFO Canonical chain committed number=44145044 hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 elapsed=669.425µs

I add log Send download block request in the below line:

https://github.com/paradigmxyz/reth/blob/1ba631ba9581973e7c6cadeea92cfe1802aceb4a/crates/engine/tree/src/download.rs#L108

and Received single full block in the below line: https://github.com/paradigmxyz/reth/blob/1ba631ba9581973e7c6cadeea92cfe1802aceb4a/crates/engine/tree/src/download.rs#L205

You can find out that the request was sent at 11:31:13, but the block was received at 11:31:16. It should not be the case that it takes 3 seconds to get the block. It may be caused by the poll mechanism.

mattsse commented 3 days ago

sorry, you have to be a bit more specific about what exactly you're measuring

should not be the case that it takes 3 seconds to get the block

get the block how/where?

still unclear what the image represents

mattsse commented 3 days ago

the event with the new chain is only emitted here

2024-11-19T11:31:16.460665Z INFO Canonical chain committed number=44145044 hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 elapsed=669.425µs

yutianwu commented 3 days ago

Hi, in the log above, we processed two blocks: 44145043 and 44145044.

  1. Block 44145043 should have been added at 11:31:13.
  2. Block 44145044 should have been added at 11:31:16.

However, the actual scenario was different. We sent the request for block 44145043 at 11:31:13 but only received it at 11:31:16, at which point it was added to the chain. Ideally, the block should have been received and processed at 11:31:13. I suspect the delay is related to the poll mechanism, as it shouldn't take 3 seconds to retrieve a block if everything is functioning correctly.

For block 44145044, the request was sent and the block was received at 11:31:16. This caused both blocks to be added to the chain simultaneously, as seen in the log.

2024-11-19T11:31:16.289155Z  INFO Block added to canonical chain number=44145043 hash=0x0e19996504a7dbc4da03aac008abb9c0e787ccc69284a96be8bd491ade69e243 peers=11 txs=137 gas=17.90 Mgas gas_throughput=99.22 Mgas/second full=12.8% base_fee=0.00gwei blobs=0 excess_blobs=0 elapsed=180.412479ms
2024-11-19T11:31:16.459900Z  INFO Block added to canonical chain number=44145044 hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 peers=11 txs=188 gas=13.41 Mgas gas_throughput=78.62 Mgas/second full=9.6% base_fee=0.00gwei blobs=0 excess_blobs=0 elapsed=170.58106ms

So if the user subscribes to headers via ws, the user will receive two blocks at the same time. But block 44145043 should be received at 11:31:13 not 11:31:16.

Hope I make it clearer, thanks for your help ~

the event with the new chain is only emitted here

2024-11-19T11:31:16.460665Z INFO Canonical chain committed number=44145044 hash=0x5513a8bb7731018d5e197492e7374353340dc2269094c907229a55c37382a8f2 elapsed=669.425µs

mattsse commented 3 days ago

okay, I think I've finally understood what the issue is.

this looks indeed pretty consistent.

if you have a way to reproduce this I'd appreciate if you could sprinkle a few more traces at various places in the poll loop

https://github.com/paradigmxyz/reth/blob/0db10a13a53bf477777f4af7fba2bed805c915cf/crates/engine/tree/src/engine.rs#L82-L82

like enter, pending and

https://github.com/paradigmxyz/reth/blob/0db10a13a53bf477777f4af7fba2bed805c915cf/crates/engine/tree/src/engine.rs#L116-L116 https://github.com/paradigmxyz/reth/blob/0db10a13a53bf477777f4af7fba2bed805c915cf/crates/engine/tree/src/engine.rs#L116-L116

logs from the net target would also help

yutianwu commented 2 days ago

Hi, I've added more logs as you mentioned above, including a log after the following line in ChainOrchestrator:

https://github.com/paradigmxyz/reth/blob/0db10a13a53bf477777f4af7fba2bed805c915cf/crates/engine/tree/src/chain.rs#L75

Here's an explanation of the logs:

After sending the block request, the ChainOrchestrator was polled only after 3 seconds. I'm unsure how we can ensure the downloaded block is received promptly via the polling mechanism. The code contains many nested polls in polls, and I may not fully understand all the details. Perhaps there's something I missed that guarantees timely receipt of the requested block. Could you provide some suggestions or guidance on this? Thanks!

1. 2024-11-20T07:27:20.986684Z  INFO Canonical chain committed number=45788833 hash=0xdb6ff25d881492ce127c72b2c6272f5de91a03770eb3463335935f24b772fd6b elapsed=55.836µs
2. 2024-11-20T07:27:23.221349Z  INFO ChainOrchestrator: enter poll next event
3. 2024-11-20T07:27:23.221380Z  INFO ChainOrchestrator: enter poll next event and poll the handler
4. 2024-11-20T07:27:23.221390Z  INFO ChainHandler: enter poll
5. 2024-11-20T07:27:23.221396Z  INFO ChainHandler: start the poll loop
6. 2024-11-20T07:27:23.221413Z  INFO ChainHandler: start the poll loop
7. 2024-11-20T07:27:23.221429Z  INFO ChainHandler: enter the poll loop and before the downloader poll
8. 2024-11-20T07:27:23.221442Z  INFO ChainHandler: poll return pending
9. 2024-11-20T07:27:23.221983Z  INFO ChainOrchestrator: enter poll next event
10. 2024-11-20T07:27:23.222003Z  INFO ChainOrchestrator: enter poll next event and poll the handler
11. 2024-11-20T07:27:23.222011Z  INFO ChainHandler: enter poll
12. 2024-11-20T07:27:23.222018Z  INFO ChainHandler: start the poll loop
13. 2024-11-20T07:27:23.222036Z  INFO ChainOrchestrator: enter poll next event
14. 2024-11-20T07:27:23.222054Z  INFO ChainOrchestrator: enter poll next event and poll the handler
15. 2024-11-20T07:27:23.222067Z  INFO ChainHandler: enter poll
16. 2024-11-20T07:27:23.222085Z  INFO ChainHandler: start the poll loop
17. 2024-11-20T07:27:23.222096Z  INFO BasicBlockDownloader: Send download block request: 0xb147f7b7759e02fd97895259651a5039f2d12fc99462f781cc46c94faec9eb37
18. 2024-11-20T07:27:23.222105Z  INFO Start downloading full block hash=0xb147f7b7759e02fd97895259651a5039f2d12fc99462f781cc46c94faec9eb37
19. 2024-11-20T07:27:23.222115Z  INFO ChainHandler: enter the poll loop and before the downloader poll
20. 2024-11-20T07:27:23.222128Z  INFO ChainHandler: poll return pending
21. 2024-11-20T07:27:26.341360Z  INFO ChainOrchestrator: enter poll next event
22. 2024-11-20T07:27:26.341394Z  INFO ChainOrchestrator: enter poll next event and poll the handler
23. 2024-11-20T07:27:26.341404Z  INFO ChainHandler: enter poll
24. 2024-11-20T07:27:26.341412Z  INFO ChainHandler: start the poll loop
25. 2024-11-20T07:27:26.341429Z  INFO ChainHandler: start the poll loop
26. 2024-11-20T07:27:26.341447Z  INFO ChainHandler: enter the poll loop and before the downloader poll
27. 2024-11-20T07:27:26.341499Z  INFO ChainHandler: poll return pending
28. 2024-11-20T07:27:26.342611Z  INFO ChainOrchestrator: enter poll next event
29. 2024-11-20T07:27:26.342643Z  INFO ChainOrchestrator: enter poll next event and poll the handler
30. 2024-11-20T07:27:26.342657Z  INFO ChainHandler: enter poll
31. 2024-11-20T07:27:26.342668Z  INFO ChainHandler: start the poll loop
32. 2024-11-20T07:27:26.342689Z  INFO ChainOrchestrator: enter poll next event
33. 2024-11-20T07:27:26.342707Z  INFO ChainOrchestrator: enter poll next event and poll the handler
34. 2024-11-20T07:27:26.342719Z  INFO ChainHandler: enter poll
35. 2024-11-20T07:27:26.342727Z  INFO ChainHandler: start the poll loop
36. 2024-11-20T07:27:26.342734Z  INFO BasicBlockDownloader: Send download block request: 0x4eb334d5809f9e239538ea068764c27d859a162a777df69471b982a354e476b3
37. 2024-11-20T07:27:26.342750Z  INFO Start downloading full block hash=0x4eb334d5809f9e239538ea068764c27d859a162a777df69471b982a354e476b3
38. 2024-11-20T07:27:26.342773Z  INFO ChainHandler: enter the poll loop and before the downloader poll
39. 2024-11-20T07:27:26.342782Z  INFO ChainHandler: poll return pending
40. 2024-11-20T07:27:26.342798Z  INFO Received forkchoice updated message when syncing head_block_hash=0x4eb334d5809f9e239538ea068764c27d859a162a777df69471b982a354e476b3 safe_block_hash=0x38fd309250c26d99eb57ca13f9bb164e06777ea5a17a6f9fec907c2bf8f15432 finalized_block_hash=0xc5a1ca61d53be761cbe4ef152d429ff10c76773279c6326e29eb57ce06605d43
41. 2024-11-20T07:27:26.409311Z  INFO ChainOrchestrator: enter poll next event
42. 2024-11-20T07:27:26.409350Z  INFO ChainOrchestrator: enter poll next event and poll the handler
43. 2024-11-20T07:27:26.409359Z  INFO ChainHandler: enter poll
44. 2024-11-20T07:27:26.409366Z  INFO ChainHandler: start the poll loop
45. 2024-11-20T07:27:26.409374Z  INFO ChainHandler: enter the poll loop and before the downloader poll
46. 2024-11-20T07:27:26.409483Z  INFO Received single full block, buffering block=NumHash { number: 45788834, hash: 0xb147f7b7759e02fd97895259651a5039f2d12fc99462f781cc46c94faec9eb37 }
47. 2024-11-20T07:27:26.410311Z  INFO ChainHandler: enter the downloader poll and downloader poll is ready
48. 2024-11-20T07:27:26.410331Z  INFO ChainHandler: Downloaded block: 0xb147f7b7759e02fd97895259651a5039f2d12fc99462f781cc46c94faec9eb37
49. 2024-11-20T07:27:26.410355Z  INFO ChainHandler: start the poll loop
50. 2024-11-20T07:27:26.410375Z  INFO ChainHandler: enter the poll loop and before the downloader poll
51. 2024-11-20T07:27:26.410390Z  INFO ChainHandler: poll return pending
52. 2024-11-20T07:27:26.429184Z  INFO ChainOrchestrator: enter poll next event
53. 2024-11-20T07:27:26.429216Z  INFO ChainOrchestrator: enter poll next event and poll the handler
54. 2024-11-20T07:27:26.429225Z  INFO ChainHandler: enter poll
55. 2024-11-20T07:27:26.429232Z  INFO ChainHandler: start the poll loop
56. 2024-11-20T07:27:26.429260Z  INFO ChainOrchestrator: enter poll next event
57. 2024-11-20T07:27:26.429279Z  INFO ChainOrchestrator: enter poll next event and poll the handler
58. 2024-11-20T07:27:26.429291Z  INFO ChainHandler: enter poll
59. 2024-11-20T07:27:26.429298Z  INFO ChainHandler: start the poll loop
60. 2024-11-20T07:27:26.429313Z  INFO ChainHandler: enter the poll loop and before the downloader poll
61. 2024-11-20T07:27:26.429346Z  INFO ChainHandler: poll return pending
62. 2024-11-20T07:27:26.429372Z  INFO Block added to canonical chain number=45788834 hash=0xb147f7b7759e02fd97895259651a5039f2d12fc99462f781cc46c94faec9eb37 peers=3 txs=5 gas=931.80 Kgas gas_throughput=49.79 Mgas/second full=0.9% base_fee=0.00gwei blobs=0 excess_blobs=0 elapsed=18.715478ms
63. 2024-11-20T07:27:26.478610Z  INFO ChainOrchestrator: enter poll next event
64. 2024-11-20T07:27:26.478648Z  INFO ChainOrchestrator: enter poll next event and poll the handler
65. 2024-11-20T07:27:26.478658Z  INFO ChainHandler: enter poll
66. 2024-11-20T07:27:26.478665Z  INFO ChainHandler: start the poll loop
67. 2024-11-20T07:27:26.478673Z  INFO ChainHandler: enter the poll loop and before the downloader poll
68. 2024-11-20T07:27:26.478727Z  INFO Received single full block, buffering block=NumHash { number: 45788835, hash: 0x4eb334d5809f9e239538ea068764c27d859a162a777df69471b982a354e476b3 }
69. 2024-11-20T07:27:26.479999Z  INFO ChainHandler: enter the downloader poll and downloader poll is ready
70. 2024-11-20T07:27:26.480018Z  INFO ChainHandler: Downloaded block: 0x4eb334d5809f9e239538ea068764c27d859a162a777df69471b982a354e476b3
71. 2024-11-20T07:27:26.480033Z  INFO ChainHandler: start the poll loop
72. 2024-11-20T07:27:26.480043Z  INFO ChainHandler: enter the poll loop and before the downloader poll
73. 2024-11-20T07:27:26.480051Z  INFO ChainHandler: poll return pending
74. 2024-11-20T07:27:26.503252Z  INFO ChainOrchestrator: enter poll next event
75. 2024-11-20T07:27:26.503287Z  INFO ChainOrchestrator: enter poll next event and poll the handler
76. 2024-11-20T07:27:26.503297Z  INFO ChainHandler: enter poll
77. 2024-11-20T07:27:26.503304Z  INFO ChainHandler: start the poll loop
78. 2024-11-20T07:27:26.503333Z  INFO ChainOrchestrator: enter poll next event
79. 2024-11-20T07:27:26.503345Z  INFO ChainOrchestrator: enter poll next event and poll the handler
80. 2024-11-20T07:27:26.503353Z  INFO ChainHandler: enter poll
81. 2024-11-20T07:27:26.503363Z  INFO ChainHandler: start the poll loop
82. 2024-11-20T07:27:26.503378Z  INFO ChainOrchestrator: enter poll next event
83. 2024-11-20T07:27:26.503385Z  INFO ChainOrchestrator: enter poll next event and poll the handler
84. 2024-11-20T07:27:26.503391Z  INFO ChainHandler: enter poll
85. 2024-11-20T07:27:26.503396Z  INFO ChainHandler: start the poll loop
86. 2024-11-20T07:27:26.503403Z  INFO ChainHandler: enter the poll loop and before the downloader poll
87. 2024-11-20T07:27:26.503400Z  INFO Block added to canonical chain number=45788835 hash=0x4eb334d5809f9e239538ea068764c27d859a162a777df69471b982a354e476b3 peers=3 txs=8 gas=584.22 Kgas gas_throughput=25.45 Mgas/second full=0.6% base_fee=0.00gwei blobs=1 excess_blobs=0 elapsed=22.954039ms
88. 2024-11-20T07:27:26.503412Z  INFO ChainHandler: poll return pending
89. 2024-11-20T07:27:26.503430Z  INFO Canonical chain committed number=45788835 hash=0x4eb334d5809f9e239538ea068764c27d859a162a777df69471b982a354e476b3 elapsed=76.487µs

okay, I think I've finally understood what the issue is.

this looks indeed pretty consistent.

if you have a way to reproduce this I'd appreciate if you could sprinkle a few more traces at various places in the poll loop

reth/crates/engine/tree/src/engine.rs

Line 82 in 0db10a1

loop { like enter, pending and

reth/crates/engine/tree/src/engine.rs

Line 116 in 0db10a1

if let Poll::Ready(outcome) = self.downloader.poll(cx) {

reth/crates/engine/tree/src/engine.rs

Line 116 in 0db10a1

if let Poll::Ready(outcome) = self.downloader.poll(cx) { logs from the net target would also help

mattsse commented 2 days ago
  1. 2024-11-20T07:27:23.222115Z INFO ChainHandler: enter the poll loop and before the downloader poll

where's this log?, could you please reference the instrumented files, otherwise this is hard to follow

yutianwu commented 2 days ago

For the block request mentioned above, we don't actually send it to the P2P peer immediately; instead, it only gets sent once the request is polled.

The issue here is that the request might not be processed promptly, as it depends on when it gets polled again. This, in turn, relies on other events triggering the ChainOrchestrator's polling mechanism.

Could this dependency on external event triggers be causing delays in processing?

okay, I think I've finally understood what the issue is.

this looks indeed pretty consistent.

if you have a way to reproduce this I'd appreciate if you could sprinkle a few more traces at various places in the poll loop

reth/crates/engine/tree/src/engine.rs

Line 82 in 0db10a1

loop { like enter, pending and

reth/crates/engine/tree/src/engine.rs

Line 116 in 0db10a1

if let Poll::Ready(outcome) = self.downloader.poll(cx) {

reth/crates/engine/tree/src/engine.rs

Line 116 in 0db10a1

if let Poll::Ready(outcome) = self.downloader.poll(cx) { logs from the net target would also help

yutianwu commented 2 days ago
  1. 2024-11-20T07:27:23.222115Z INFO ChainHandler: enter the poll loop and before the downloader poll

where's this log?, could you please reference the instrumented files, otherwise this is hard to follow

hi, you can refer to this commit:

https://github.com/bnb-chain/reth/commit/3948b4b2cd557a63a4dff71428c6fe0ec922e5e8

The logs are added to ChainOrchestrator and ChainHandler

mattsse commented 2 days ago

The issue here is that the request might not be processed promptly, as it depends on when it gets polled again.

where is that modification?

yutianwu commented 2 days ago

The issue here is that the request might not be processed promptly, as it depends on when it gets polled again.

where is that modification?

We haven’t made changes to the new engine. I’m just unsure whether we can reliably get the block promptly with the new engine. Could you clarify or share any insights on this? Thanks!

mattsse commented 2 days ago

ah yes, you're on an outdated version of the ChainHandler impl

this exact bug has been fixed here https://github.com/paradigmxyz/reth/pull/11981

https://github.com/bnb-chain/reth/blob/3948b4b2cd557a63a4dff71428c6fe0ec922e5e8/crates/engine/tree/src/engine.rs#L120

yutianwu commented 2 days ago

ah yes, you're on an outdated version of the ChainHandler impl

this exact bug has been fixed here #11981

https://github.com/bnb-chain/reth/blob/3948b4b2cd557a63a4dff71428c6fe0ec922e5e8/crates/engine/tree/src/engine.rs#L120

Great to know that, I will check this out and get you back~ Thanks for your help~

yutianwu commented 1 day ago

ah yes, you're on an outdated version of the ChainHandler impl

this exact bug has been fixed here #11981

https://github.com/bnb-chain/reth/blob/3948b4b2cd557a63a4dff71428c6fe0ec922e5e8/crates/engine/tree/src/engine.rs#L120

Hi, I think this issue has been resolved. I'm going to close it now. Thanks for your kind help!