Open kiss81 opened 1 year ago
Update: clearing the suggested ~/.besu/caches directory didn't work for me: I still see this error and the deposit sync starts all over again:
Jun 29 13:47:37 ethVal2 nimbus_beacon_node[184]: ERR 2023-06-29 13:47:37.333+02:00 Corrupted deposits history detected
I'd really like to use Nimbus as my beacon node since it's so light on resources, but right now it makes the system run hot. Would it be worth going back to history:archive instead of prune mode, or maybe skipping the trusted sync? Any ideas are welcome, as I have no clue.
Same problem here.
Besu version: 24.3, Nimbus version: 24.3.0
log:
NTC 2024-04-14 12:13:10.358+00:00 Attestation failed to match head topics="chaindag" epoch=276750 validator=87a3ef28
INF 2024-04-14 12:13:11.001+00:00 Slot start topics="beacnde" head=627179d7:8856063 delay=1ms94us524ns finalized=276749:ff359206 peers=158 slot=8856064 sync=synced epoch=276752
INF 2024-04-14 12:13:11.048+00:00 Prepared beacon proposers topics="rest_validatorapi" numUpdatedFeeRecipients=0 numRefreshedFeeRecipients=1
ERR 2024-04-14 12:13:15.661+00:00 Corrupted deposits history detected topics="elchain" ourDepositsCount=1434821 targetDepositsCount=1434829 ourDepositsRoot=1f385454 targetDepositsRoot=379589eb
I've tried deleting the Besu cache and disabling it with --auto-log-bloom-caching-enabled=false; no difference.
I found that the query used to fetch the deposits is the following:
{"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"0x12a7000","toBlock":"0x12a8000","address":"0x00000000219ab540356cbb839cbe05303d7705fa","topics":["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"]}], "id":1}
So I compared my Besu node's responses with Infura's: I wrote a script to compare all the responses from 0x10a0000 to the latest block, and found no difference at all. That made me think it is a Nimbus problem.
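For reference, here is a sketch of that kind of comparison script (stdlib only; the Infura project ID is a placeholder, and different providers can format log fields slightly differently, so a reported mismatch may still need a manual look):

```python
import json
import urllib.request

LOCAL = "http://localhost:8545"                          # local Besu JSON-RPC
REMOTE = "https://mainnet.infura.io/v3/YOUR_PROJECT_ID"  # placeholder remote endpoint

DEPOSIT_CONTRACT = "0x00000000219ab540356cbb839cbe05303d7705fa"
DEPOSIT_TOPIC = "0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"

def rpc(url, method, params):
    # Minimal JSON-RPC POST helper.
    body = json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": 1}).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]

def get_logs(url, from_block, to_block):
    # Same eth_getLogs query as above, parameterized over the block window.
    return rpc(url, "eth_getLogs", [{
        "fromBlock": hex(from_block),
        "toBlock": hex(to_block),
        "address": DEPOSIT_CONTRACT,
        "topics": [DEPOSIT_TOPIC],
    }])

latest = int(rpc(LOCAL, "eth_blockNumber", []), 16)
step = 0x1000
for start in range(0x10A0000, latest + 1, step):
    end = min(start + step - 1, latest)
    local_logs = get_logs(LOCAL, start, end)
    remote_logs = get_logs(REMOTE, start, end)
    if local_logs != remote_logs:
        print(f"mismatch in {hex(start)}-{hex(end)}: "
              f"{len(local_logs)} local vs {len(remote_logs)} remote logs")
print("done")
```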
It could be that the initially imported deposit snapshot was incorrect (doing a new checkpoint sync should fix it).
- Did this start happening after an update, or on a fresh sync?
- Is this on a single-EL or on a multi-EL setup?
- Was one of the EL restarted while Nimbus was running?
Thanks for your reply.
Another thing that might be interesting:
The difference between 'ourDepositsCount' and 'targetDepositsCount' does not stay fixed. It is constant for a given targetDepositsCount, but as epochs advance and targetDepositsCount changes, the gap changes as well. I have recorded the following logs:
ourDepositsCount=1432476 targetDepositsCount=1432497 ourDepositsRoot=a6630d57 targetDepositsRoot=c3c073e3
ourDepositsCount=1433043 targetDepositsCount=1433056 ourDepositsRoot=941e37dd targetDepositsRoot=a98323d9
ourDepositsCount=1434821 targetDepositsCount=1434829 ourDepositsRoot=1f385454 targetDepositsRoot=379589eb
ourDepositsCount=1456631 targetDepositsCount=1456656 ourDepositsRoot=1bf0fec3 targetDepositsRoot=6786bb2e
I'm not very familiar with the mechanism, but I think that if a fixed set of deposits were missing, the difference should stay constant.
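For reference, the gaps in those four log lines (just restating the numbers above):

```python
pairs = [
    (1432476, 1432497),
    (1433043, 1433056),
    (1434821, 1434829),
    (1456631, 1456656),
]
for ours, target in pairs:
    print(target - ours)  # prints 21, 13, 8, 25 -- the gap is not constant
```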
I re-synced the deposit snapshot. It's working for now.
Great to hear that this is working now! Do you know whether the previous deposit snapshot was correct? If there is a way to reproduce the issue, we could investigate it more thoroughly.
Luckily I have a copy of the previous deposit snapshot, in two formats.
From API:
$ curl -X GET http://localhost:5052/eth/v1/beacon/deposit_snapshot
{"data":{"finalized":["0xed851189578e43fe30657d1e6193b45bef7ee7a6e226eaebc9675d801081dd6d","0x51d304594b583192b6579834499c62376a0fca2233b96aee21e87c6fd5c28637","0xdd0ab193e510ea7ddf508fc9c2900c2098eae2fc49a10d7c25293975fa7c3ca9","0x77f61de440cb93a2c0222f8241f9a6726c75a5e919341abe02a30e0331b0c355","0xdbf9383ebffa839826fc89f993f1ef289c808a951310ff376866c46952c2c619","0xe7bb3cc87417966cf117c124f5ce0394e834d5350a3f3a699cd13271a657b01b","0xcd51fb3413c81cc4c88adca8bf399e96e3fdd3443b0c48f74bde393e92c44edd","0xcc1bc98fe5b78d66581d4f7ea317e642947806215095c738ee6263a6064b0923","0x78545fc1d24b409a8278a0f87985a792a430be3d63f2d94d4fcc17bee0e6b12e","0x102068d3bffe793aadd25cd44bfb6b840b0743c1772538dbbf79eec54c7aa756","0xe58dd9471eb26bee450525e5a4cf0f5141417a1bc10ec16f2cc91ea45188e242","0x8e3ef7a9c1604f777b9acb267b0a5b07bae59068bab7afe5158a9d2f5b4bbd0d","0xd47c6b785cafa0975c9edf49ff18f403a216c2259137c0e7d45f32a789d678f8"],"deposit_root":"0xedc61e5bb048448f184a0003560a33db848104623ea4055e287206b1d9292290","deposit_count":"780148","execution_block_hash":"0xd162f42047b00d9712b021bd8c7c80e299360126adb246237e64fd4e1a0f0d92","execution_block_height":"17403122"}}
From DB:
sqlite> select hex(value) from key_values where hex(key)='0E';
C808F4C202D162F42047B00D9712B021BD8C7C80E299360126ADB246237E64FD4E1A0F0D92D012949E8E973D86D7768D5636BD582787720E90A3E6C870F0C339E6A1EC0A27B9F77D5736CF432A404A7F6F682E06990DB70BF02636E30DB4CEA1A6FF0088E4D47C6B785CAFA0975C9EDF49FF18F403A216C2259137C0E7D45F32A789D678F8C4519D62E9133429F908DCF552C407F1A7019A2E3A14B22539E567DA6DDA35328E3EF7A9C1604F777B9ACB267B0A5B07BAE59068BAB7AFE5158A9D2F5B4BBD0DE58DD9471EB26BEE450525E5A4CF0F5141417A1BC10EC16F2CC91EA45188E242102068D3BFFE793AADD25CD44BFB6B840B0743C1772538DBBF79EEC54C7AA7569AAB14144DD209C2EDEE2DA32154BE858FE5003CFFDDA9312A80CB4BA801161078545FC1D24B409A8278A0F87985A792A430BE3D63F2D94D4FCC17BEE0E6B12ECC1BC98FE5B78D66581D4F7EA317E642947806215095C738EE6263A6064B0923CD51FB3413C81CC4C88ADCA8BF399E96E3FDD3443B0C48F74BDE393E92C44EDD87AA29529266D9F8813B779FB3DB5783627C21FB7D72DE768E79D3CD9B60C479018432F487C9B364F560DFAE337ED6E1B53A7D7F1A77C1C0EE26EAC9B0778FB8E7BB3CC87417966CF117C124F5CE0394E834D5350A3F3A699CD13271A657B01BDBF9383EBFFA839826FC89F993F1EF289C808A951310FF376866C46952C2C61977F61DE440CB93A2C0222F8241F9A6726C75A5E919341ABE02A30E0331B0C355DD0AB193E510EA7DDF508FC9C2900C2098EAE2FC49A10D7C25293975FA7C3CA951D304594B583192B6579834499C62376A0FCA2233B96AEE21E87C6FD5C28637D47152335D9460F2B6FB7ABA05CED32A52E9F46659CCD3DAA2059661D75A6308ED851189578E43FE30657D1E6193B45BEF7EE7A6E226EAEBC9675D801081DD6D0000000000000000000000000000000000000000000000000000000000000000000000FE1600FE1600FE1600FE1600FE1600E61600280BE774F28C090100000000
sqlite> select hex(value) from key_values where hex(key)='08';
C008F4C202D162F42047B00D9712B021BD8C7C80E299360126ADB246237E64FD4E1A0F0D92D012949E8E973D86D7768D5636BD582787720E90A3E6C870F0C339E6A1EC0A27B9F77D5736CF432A404A7F6F682E06990DB70BF02636E30DB4CEA1A6FF0088E4D47C6B785CAFA0975C9EDF49FF18F403A216C2259137C0E7D45F32A789D678F8C4519D62E9133429F908DCF552C407F1A7019A2E3A14B22539E567DA6DDA35328E3EF7A9C1604F777B9ACB267B0A5B07BAE59068BAB7AFE5158A9D2F5B4BBD0DE58DD9471EB26BEE450525E5A4CF0F5141417A1BC10EC16F2CC91EA45188E242102068D3BFFE793AADD25CD44BFB6B840B0743C1772538DBBF79EEC54C7AA7569AAB14144DD209C2EDEE2DA32154BE858FE5003CFFDDA9312A80CB4BA801161078545FC1D24B409A8278A0F87985A792A430BE3D63F2D94D4FCC17BEE0E6B12ECC1BC98FE5B78D66581D4F7EA317E642947806215095C738EE6263A6064B0923CD51FB3413C81CC4C88ADCA8BF399E96E3FDD3443B0C48F74BDE393E92C44EDD87AA29529266D9F8813B779FB3DB5783627C21FB7D72DE768E79D3CD9B60C479018432F487C9B364F560DFAE337ED6E1B53A7D7F1A77C1C0EE26EAC9B0778FB8E7BB3CC87417966CF117C124F5CE0394E834D5350A3F3A699CD13271A657B01BDBF9383EBFFA839826FC89F993F1EF289C808A951310FF376866C46952C2C61977F61DE440CB93A2C0222F8241F9A6726C75A5E919341ABE02A30E0331B0C355DD0AB193E510EA7DDF508FC9C2900C2098EAE2FC49A10D7C25293975FA7C3CA951D304594B583192B6579834499C62376A0FCA2233B96AEE21E87C6FD5C28637D47152335D9460F2B6FB7ABA05CED32A52E9F46659CCD3DAA2059661D75A6308ED851189578E43FE30657D1E6193B45BEF7EE7A6E226EAEBC9675D801081DD6D0000000000000000000000000000000000000000000000000000000000000000000000FE1600FE1600FE1600FE1600FE1600E61600080BE774
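In case it's useful, here is a small sketch (stdlib only) that runs the same two queries against the database and reports where the two stored blobs differ. It is a generic byte-level comparison, not based on any knowledge of the encoding, and DB_PATH is a placeholder for the sqlite file opened above:

```python
import sqlite3

DB_PATH = "/path/to/nimbus-db.sqlite3"  # placeholder: the file the sqlite> session above was opened on

def read_blob(cur, key_hex):
    # Same query as above: hex dump of the value stored under the given key.
    cur.execute("select hex(value) from key_values where hex(key)=?", (key_hex,))
    return cur.fetchone()[0]

cur = sqlite3.connect(DB_PATH).cursor()
blob_0e = read_blob(cur, "0E")
blob_08 = read_blob(cur, "08")

print(f"lengths (hex chars): 0E={len(blob_0e)}, 08={len(blob_08)}")
common = min(len(blob_0e), len(blob_08))
diffs = [i for i in range(common) if blob_0e[i] != blob_08[i]]
if diffs:
    print(f"differing hex offsets within the common prefix: first={diffs[0]}, last={diffs[-1]}")
else:
    print("identical over the common prefix")
```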
I see the snapshot was quite old: EL block 17403122 is roughly from the time I set up the node, so it seems it was never updated.
I remember that previously, every time I restarted Nimbus, it would sweep the deposits from that snapshot point and then run quietly after the sweep. Only in recent weeks did I notice that it keeps sweeping and showing the error each time.
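A quick way to check how stale the stored snapshot is, using the same Beacon API endpoint as above (port 5052; stdlib only):

```python
import json
import urllib.request

URL = "http://localhost:5052/eth/v1/beacon/deposit_snapshot"

with urllib.request.urlopen(URL) as resp:
    snapshot = json.load(resp)["data"]

# These are the fields shown in the curl output above.
print("deposit_count:         ", snapshot["deposit_count"])
print("execution_block_height:", snapshot["execution_block_height"])
print("execution_block_hash:  ", snapshot["execution_block_hash"])
```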
@arnetheduck thank you for looking at the issue, but I think it was closed a bit too soon :) This is a follow-up to issue #5136, as it got closed.
I do:
and then start the node with systemd.
After it does the backfill for a few hours, all is good for a few minutes, and then this happens:
After half an hour it's in sync again, and after a few minutes the process starts all over again in a loop.
When I look in the Besu log I see this:
Now that I've found the "Corrupted deposits history" error, it seems I am not alone: #4669 @ibhagwan (have you found a solution yet?)
edit: now testing clearing the .caches folder from Besu.