alecalve closed this issue 3 months ago
Hi @alecalve, can you share the exact command you used to run the ancient pruner?
Also, can you send the output of bor snapshot inspect-ancient-db --datadir <datadir> --datadir.ancient <ancient_dir>
so we can debug further? Thanks!
I ran:
bor snapshot prune-block --datadir=/opt/data --block-amount-reserved=16000000
The node was not caught up and was using a lot of disk space, but I didn't want to prune blocks we hadn't processed yet, hence the high value for reserved blocks.
Here's the output of the inspect command:
+--------------------------------+----------+
| FIELD | ITEMS |
+--------------------------------+----------+
| Start block number of | 22853632 |
| ancientDB (offset) | |
| End block number of ancientDB | 38853631 |
| Remaining items in ancientDB | 16000000 |
+--------------------------------+----------+
| ANCIENTSTORE INFORMATION |
+--------------------------------+----------+
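For readers following along, the three numbers in this table are consistent with each other if the ancient store is contiguous: the end block equals the offset plus the item count, minus one. A minimal Go sketch of that arithmetic follows. The ancientRange type and its End method are hypothetical names made up for this illustration and are not part of bor.

```go
package main

import "fmt"

// ancientRange is a hypothetical type for this sketch (not a bor type).
// It describes the span of blocks held by an ancient store.
type ancientRange struct {
	Offset uint64 // first block number still present (the "offset")
	Items  uint64 // number of items remaining in the store
}

// End returns the last block number in the range, assuming the store
// holds a contiguous run of blocks. ok is false for an empty store.
func (r ancientRange) End() (end uint64, ok bool) {
	if r.Items == 0 {
		return 0, false
	}
	return r.Offset + r.Items - 1, true
}

func main() {
	// Values taken from the inspect output in this thread.
	r := ancientRange{Offset: 22853632, Items: 16000000}
	end, _ := r.End()
	fmt.Println(end) // 38853631, matching "End block number of ancientDB"
}
```

So the inspect output itself looks internally consistent: 22853632 + 16000000 - 1 = 38853631.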
I see, thanks for the info. I'm afraid a wrong offset value is being set or used, which is causing this. Can you run the following script and send the results back? It will be really helpful for debugging. Thanks!
https://gist.github.com/manav2401/157a102434eaa5b28983a9a477caa78d (You might want to create a new go project and run this file - main.go)
And it'll be helpful if you can share the logs while the ancient pruner was running (full logs will be better to spot errors if any). Thanks!
Unfortunately I won't have the logs but I remember seeing no errors.
Here's the output of your script:
offsetOfCurrentAncientFreezer: 22853632
offsetOfLastAncientFreezer: 0
Ah, I do have the logs. One thing that happened is that the Docker container that ran the script was restarted in a loop once it had finished. Could that explain it? On the subsequent retries it logged:
Backup old ancientDB error err="the number of old blocks is the same to reserved blocks, ancientItems=16000000"
The first run ended with:
Backup old ancientDB done "current start blockNumber in ancientDB"=22,853,632
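The two log lines above fit a guard that refuses to prune when the ancient store already holds exactly the reserved number of blocks. The Go sketch below illustrates that kind of check. It is not bor's actual code: the newOffset function and both error values are hypothetical, and the pre-prune item count of 38853632 is inferred from the inspect output (end block 38853631, previous offset 0), not taken from the logs.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch (not bor's error values).
var (
	errNotEnough      = errors.New("fewer ancient items than reserved blocks")
	errNothingToPrune = errors.New("ancient items equal reserved blocks")
)

// newOffset returns the start block of the ancient store after pruning,
// keeping only the most recent `reserved` items. It assumes the store is
// a contiguous run of `items` blocks starting at oldOffset.
func newOffset(oldOffset, items, reserved uint64) (uint64, error) {
	switch {
	case items < reserved:
		return 0, errNotEnough
	case items == reserved:
		return 0, errNothingToPrune
	}
	return oldOffset + (items - reserved), nil
}

func main() {
	// First run (inferred): offset 0, 38,853,632 items, 16,000,000 reserved.
	off, err := newOffset(0, 38853632, 16000000)
	fmt.Println(off, err) // 22853632 <nil>

	// Retry: the store already holds exactly the reserved amount.
	_, err = newOffset(22853632, 16000000, 16000000)
	fmt.Println(err) // ancient items equal reserved blocks
}
```

Under these assumptions the first run lands on the offset reported in the log (22,853,632), and every retry is rejected before any data is touched.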
Thanks. This is fine, I guess, as it didn't prune the second time. I think I know which code path is causing the issue, but I still need to validate it first. What's your setup like? Do you run via published packages, or can you run a new bor branch?
We use Docker and run the official image but we can build an image from any source.
Alright, can you please deploy this branch (which is cut from 1.3.3) on your setup, try restarting bor, and send the logs over? It doesn't fix anything; it just adds logs that will be very helpful for debugging. I know this is not the ideal way to debug, but I don't think this issue is directly reproducible.
https://github.com/maticnetwork/bor/tree/manav/ancient-pruner-debug
Thanks!
Oh I think the issue was that I deployed 1.3.2 over the pruned data dir.
Your branch is working fine and 1.3.3 too.
Sorry for the trouble!
Phew. Closing this issue for now. Feel free to re-open if needed. Thanks!
I may have found a follow-up issue: https://github.com/maticnetwork/bor/issues/1275
System information
Bor client version: 1.3.3
OS & Version: Linux
Environment: Polygon Mainnet
Command used:
Overview of the problem
After running the new bor snapshot prune-block command, the node won't start: