Closed: eldimious closed this issue 1 month ago.
I’m also still very much stuck on this. I’ll add my experience in the hope that we find a pattern.
I initially had 2 nodes. They were getting stuck, so I started killing them automatically with a liveness probe (they run in k3s on a VM). The issue was happening more often, so I decided to kill a node more quickly when it got stuck. Apparently this made the problem worse: at some point the nodes were restarting every 5 minutes. So I did the opposite and let a node run for several hours after it gets stuck, and somehow that made the problem less bad. Now my 2 prod nodes run fine for several hours a day, but I still get at least 1 to 2 restarts per day...
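For anyone wanting a similar probe, here is a minimal sketch of a "head block is not advancing" check. It assumes bor's HTTP JSON-RPC is reachable at `http://localhost:8545` (a hypothetical default, adjust to your setup) and uses the standard Ethereum `eth_blockNumber` method. The `should_restart` helper and its `max_stalled_checks` parameter are illustrative, not part of bor; the idea is to tolerate several flat samples before restarting, which matches the observation above that restarting too eagerly made things worse.

```python
import json
import urllib.request

# Hypothetical endpoint: bor's HTTP JSON-RPC on localhost (adjust for your deployment).
RPC_URL = "http://localhost:8545"


def rpc_call(method, params=None, url=RPC_URL, timeout=5):
    """Send one JSON-RPC 2.0 request and return its "result" field."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


def head_block(url=RPC_URL):
    """Latest block number as an int (eth_blockNumber returns a hex string)."""
    return int(rpc_call("eth_blockNumber", url=url), 16)


def is_stalled(previous, current):
    """The head is considered stalled if it did not advance between two samples."""
    return current <= previous


def should_restart(samples, max_stalled_checks):
    """True only if the head stayed flat for `max_stalled_checks` consecutive
    intervals, so a short hiccup does not trigger a restart."""
    if len(samples) < max_stalled_checks + 1:
        return False
    recent = samples[-(max_stalled_checks + 1):]
    return all(is_stalled(a, b) for a, b in zip(recent, recent[1:]))
```

For example, a sidecar could append `head_block()` to a list every 10 minutes and exit non-zero when `should_restart(samples, 6)` is true, i.e. after roughly an hour with no new blocks. The interval and threshold are guesses to tune, not recommended values.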
I then tried to run a third node. I recreated it from scratch using the community-managed snapshots. Only one of them was fast enough to download a full bor mainnet snapshot in a reasonable amount of time: http://services.stakecraft.com/docs/snapshots/polygon-snapshot (it downloads at around 50 MB/s).
At first I used the automatically generated default config. That clearly didn’t work. I then tried two things:
One thing I noticed is that my new third node has more peers than my production Polygon nodes: 28 instead of 16. It also seems more stable than the others, so I can confirm that the problem looks related to peers. But I don’t know how to increase the number further, and I don’t even know what a reasonable number of peers is...
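To compare peer counts across nodes, one option is the standard Ethereum JSON-RPC method `net_peerCount`, which returns the count as a hex quantity (28 peers is `0x1c`, 16 is `0x10`). This is a minimal sketch assuming the `net` namespace is exposed on bor's HTTP RPC at `http://localhost:8545` (hypothetical). The `low_watermark` threshold in `peer_status` is an arbitrary illustration, not a recommended peer count.

```python
import json
import urllib.request

# Hypothetical endpoint; assumes the "net" namespace is enabled on bor's HTTP RPC.
RPC_URL = "http://localhost:8545"


def parse_quantity(hex_str):
    """Decode an Ethereum JSON-RPC hex quantity such as "0x1c" into an int."""
    if not hex_str.startswith("0x"):
        raise ValueError(f"not a hex quantity: {hex_str!r}")
    return int(hex_str, 16)


def peer_count(url=RPC_URL, timeout=5):
    """Ask the node how many peers it is connected to via net_peerCount."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "net_peerCount", "params": []}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_quantity(json.load(resp)["result"])


def peer_status(count, low_watermark):
    """Classify a peer count against a threshold you choose (illustrative only)."""
    return "low" if count < low_watermark else "ok"
```

Logging `peer_count()` next to the head block over time would show whether stalls on the production nodes line up with drops in peers.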
Note:
This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.
Hey folks, could you please try upgrading to v1.3.3? The release contains a number of p2p and sync fixes, which will be followed by some more patches in v1.3.4.
I upgraded to 1.3.3 and still have the same issue. I downloaded an old February snapshot, and the node got stuck approximately 4,000,000 blocks behind the latest block, so I used a restart script to bring the node back to life.
This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.
This issue was closed because it has been stalled for 28 days with no activity.
System information
Bor client version: 1.2.1
Heimdall client version: 1.0.3
OS & Version: Linux
Environment: Polygon Mainnet
Type of node: Full
Overview of the problem
I have been running a full node using bor and heimdall via Docker for the last 2 months, but it seems that bor sync got stuck 11h ago at block 0x312d050. I am getting the following logs from the bor Docker image:
Any idea how I can fix it? I tried restarting the Docker image, but the error remains.
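Block numbers from the RPC and in logs are often hex, so it can help to convert them before comparing against an explorer. The block above, `0x312d050`, is block 51,564,624 in decimal. The sketch below shows the conversion plus a simple lag calculation; `reference_head` is assumed to come from some trusted source of the current chain head (for example another node or a public endpoint), which is not specified here.

```python
def block_from_hex(hex_block):
    """Convert a hex block number such as "0x312d050" to an int."""
    return int(hex_block, 16)


def blocks_behind(local_head, reference_head):
    """How many blocks the local node trails a reference head (never negative)."""
    return max(reference_head - local_head, 0)


STUCK_AT = block_from_hex("0x312d050")
print(STUCK_AT)  # 51564624
```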