ledgerwatch / erigon

Ethereum implementation on the efficiency frontier
GNU Lesser General Public License v3.0

v2.60.2 regression - disk usage spikes to 100% #10932

Open keithchew opened 4 days ago

keithchew commented 4 days ago

Just upgraded from v2.60.1 to v2.60.2, running on sepolia.

After the upgrade, disk usage spikes to 100%. After reverting to v2.60.1, disk usage goes back down to under 10%.

Note that after reverting, erigon downloads the snapshots (during which disk usage is at 100%), but after that it goes back to normal.

Also observed that in v2.60.1, the logs for a new payload look like this:

[INFO] [06-27|10:31:49.073] [NewPayload] Handling new payload        height=6197283 hash=0x06eec0f98fe466b1379dfac3402768bfb82cd2ddf663ba3fdd3762e3c1acab69
[INFO] [06-27|10:31:50.791] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload) 
[INFO] [06-27|10:31:51.451] RPC Daemon notified of new headers       from=6197282 to=6197283 amount=1 hash=0x06eec0f98fe466b1379dfac3402768bfb82cd2ddf663ba3fdd3762e3c1acab69 header sending=19.847µs log sending=311ns

But in v2.60.2:

[INFO] [06-27|09:37:13.166] [NewPayload] Handling new payload        height=6197010 hash=0x5ce750104e1fa938f4b3373aa4bd13964531384107385a0fb36644754ae1f9e1        
[INFO] [06-27|09:37:13.167] [EngineBlockDownloader] Downloading PoS headers... hash=0x641b98ea942205cde4b5f69ce9bd3bcf339ca1bc284aba80ef3b1ef9937a75cf requestId=0 
[INFO] [06-27|09:37:14.453] [EngineBlockDownloader] Processed        highest=6197009
[INFO] [06-27|09:37:14.454] Beginning downloaded blocks insertion
[INFO] [06-27|09:37:14.536] [EngineBlockDownloader] Finished downloading blocks from=6197008 to=6197009
[INFO] [06-27|09:37:17.601] [EngineBlockDownloader] blocks verification successful

Looks like a logic change in v2.60.2 is triggering the EngineBlockDownloader into action, but not in v2.60.1?
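
To compare the two versions over a longer log file rather than a few lines, one option is to count how often each bracketed component tag appears. The following is a minimal sketch, not part of erigon: the function name `count_tags` and the regular expression are assumptions based only on the log format shown above, and the sample lines are excerpts copied from this comment.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts above:
#   [LEVEL] [MM-DD|HH:MM:SS.mmm] [Component] message...
# Lines without a third bracketed field (e.g. "Beginning downloaded blocks
# insertion") have no component tag and are skipped.
TAG_RE = re.compile(r"^\[\w+\] \[[^\]]+\] \[(?P<tag>[^\]]+)\]")


def count_tags(lines):
    """Count occurrences of each component tag in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TAG_RE.match(line.strip())
        if match:
            counts[match.group("tag")] += 1
    return counts


# Excerpts from the v2.60.1 and v2.60.2 logs quoted in this comment.
V2_60_1_LINES = [
    "[INFO] [06-27|10:31:49.073] [NewPayload] Handling new payload        height=6197283 hash=0x06eec0f98fe466b1379dfac3402768bfb82cd2ddf663ba3fdd3762e3c1acab69",
    "[INFO] [06-27|10:31:50.791] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload) ",
]
V2_60_2_LINES = [
    "[INFO] [06-27|09:37:13.166] [NewPayload] Handling new payload        height=6197010 hash=0x5ce750104e1fa938f4b3373aa4bd13964531384107385a0fb36644754ae1f9e1        ",
    "[INFO] [06-27|09:37:13.167] [EngineBlockDownloader] Downloading PoS headers... hash=0x641b98ea942205cde4b5f69ce9bd3bcf339ca1bc284aba80ef3b1ef9937a75cf requestId=0 ",
    "[INFO] [06-27|09:37:14.453] [EngineBlockDownloader] Processed        highest=6197009",
    "[INFO] [06-27|09:37:14.454] Beginning downloaded blocks insertion",
    "[INFO] [06-27|09:37:14.536] [EngineBlockDownloader] Finished downloading blocks from=6197008 to=6197009",
    "[INFO] [06-27|09:37:17.601] [EngineBlockDownloader] blocks verification successful",
]

if __name__ == "__main__":
    print("v2.60.1:", dict(count_tags(V2_60_1_LINES)))
    print("v2.60.2:", dict(count_tags(V2_60_2_LINES)))
```

Running the same function over a full log file from each version (for example `count_tags(open(path))`, with `path` being wherever the logs are saved) would show whether EngineBlockDownloader fires on every payload in v2.60.2 or only occasionally.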

Giulio2002 commented 1 day ago

hi, can you try branch new-payload2 and see if the issue persists

keithchew commented 1 day ago

hi @Giulio2002, tried new-payload2 branch, behavior is the same as v2.60.2. CPU constantly hits and stays at 100% ceiling... v2.60.1 is more periodic and less aggressive...

v2.60.1 Screenshot 2024-06-30 121317

v2.60.2/new-payload2 Screenshot 2024-06-30 120434

In this setup, erigon is running on an Ubuntu guest under a Hyper-V host. Still a regression though...

Giulio2002 commented 1 day ago

Could you try commits e8c5632b0b14a87072924334c6359dd6fbcf57ba and 478b5b2ed969cc5daef1d7dcca00d0a19bccabe9? If you could, it would really help me; I am currently resyncing a sepolia node to reproduce.

keithchew commented 1 day ago

I have applied both commits to v2.60.1, and CPU did not spike. Let me know if you would like me to try any other commits to help track this down.

Giulio2002 commented 23 hours ago

Try 2e590ce9677cbadd5b7d1b4d8370af89aed9c7f8, aa591fd0aeaa34529fe3963f46d7e4a7701cf10e, c637e37e63b5606d489b7602b412289191f9b8a5 and cac1c2d135f675ca456e543f81900f89515dac50. My sepolia node is still syncing, so I would really appreciate it if you could try these.