Closed martinboehm closed 2 months ago
Hi , Have you solved this problem ? I had same problem with you https://github.com/maticnetwork/bor/issues/1239#issue-2279597974
@charleswong1025 No solution so far.
having the same problem. restart helps node to got synced but after that headlag starts rising again
Same here. Fix please!
Hey folks, could you please upgrade your bor to the latest beta version v1.3.2-beta-2 and see if it helps. Also, it would be good if you can increase the peer count using the --maxpeers
flag. Maybe set it to a higher value (e.g. 200). Thanks!
I have setup a new node with the latest release version and set maxpeers to 200, the problem persists:
May 23 12:24:54 polygon bor[46683]: WARN [05-23|12:24:54.537] Failed to whitelist checkpoint err="missing blocks"
May 23 12:24:54 polygon bor[46683]: WARN [05-23|12:24:54.537] unable to handle whitelist checkpoint err="missing blocks"
May 23 12:24:55 polygon bor[46683]: INFO [05-23|12:24:55.493] Looking for peers peercount=0 tried=396 static=0
May 23 12:25:02 polygon bor[46683]: INFO [05-23|12:25:02.536] Got new milestone from heimdall start=57,296,582 end=57,296,607 hash=0x4f42dd6dc1a723ff7fae04f41a9201b4f4109cb10a0f8429e35f5f95c6c99424
May 23 12:25:02 polygon bor[46683]: WARN [05-23|12:25:02.536] unable to handle whitelist milestone err="missing blocks"
May 23 12:25:05 polygon bor[46683]: INFO [05-23|12:25:05.551] Looking for peers peercount=0 tried=398 static=0
May 23 12:25:14 polygon bor[46683]: INFO [05-23|12:25:14.535] Got new milestone from heimdall start=57,296,582 end=57,296,607 hash=0x4f42dd6dc1a723ff7fae04f41a9201b4f4109cb10a0f8429e35f5f95c6c99424
May 23 12:25:14 polygon bor[46683]: WARN [05-23|12:25:14.535] unable to handle whitelist milestone err="missing blocks"
May 23 12:25:15 polygon bor[46683]: INFO [05-23|12:25:15.655] Looking for peers peercount=0 tried=245 static=0
May 23 12:25:25 polygon bor[46683]: INFO [05-23|12:25:25.676] Looking for peers peercount=0 tried=447 static=0
My environments are:
root@polygon:~# bor version
Version: 1.3.2
GitCommit: bade7f57df5c09ae060c15fc66aed488c526149e
root@polygon:~# heimdalld version
1.0.5
root@polygon:~# uname -a
Linux polygon 5.15.0-107-generic #117-Ubuntu SMP Fri Apr 26 12:26:49 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux (Ubuntu 22.04)
My bootnodes are:
bootnodes = ["enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385ecf4a4bf160a@3.36.224.80:30303", "enode://8729e0c825f3d9cad382555f3e46dcff21af323e89025a0e6312df541f4a9e73abfa562d64906f5e59c51fe6f0501b3e61b07979606c56329c020ed739910759@54.194.245.5:30303"]
Hey folks, could you please upgrade your bor to the latest beta version v1.3.2-beta-2 and see if it helps. Also, it would be good if you can increase the peer count using the
--maxpeers
flag. Maybe set it to a higher value (e.g. 200). Thanks!
I have setup a new node with the latest release version and set maxpeers to 200, the problem persists, cloud you help? The sync speed is slow, and I have take 2 weeks and less than 40% was completed.
When we restart Bor on the non syncing server, it catches the top of the chain and then stops syncing, even when we let it run for hours.
We have the exact same behaviour, after restoring from a snapshot.
Tested v1.3.2-beta-2
the issue still exists
Hey folks, the issues we're seeing are broadly classified into 2 categories. One is those who aren't getting any healthy peers and another is those whose nodes stop syncing randomly despite having good peers.
For the former issue, they're not easily reproducible as all our internal nodes are working well and a lot of partner nodes are also working well. I'd suggest to look into the node config and add more static/trusted peers if possible. The logs of missing blocks which you're seeing in bor is not the cause of syncing issues but it's just an indicator. It's not possible to debug just using this info so please share debug/trace logs in this issue. We have added a new rpc/ipc call debug_peerStats
/debug.peerStats()
to get info about the peers your node is connected with. Please share the result of this if possible.
For the latter, please upgrade to bor v1.3.3-beta3 as it contains a potential fix.
Thanks!
Hello, we have been having this issue quite a lot, going deeper into investigation we added some static peers that we knew they were healthy and the issue is gone, so my experience leads me that problem is somewhere on p2p. I don't have enough info if its a lot of bad peers in the network or if something is triggering this from specific peers or just bad peer search, but i guess this is a good starting point for investigation.
Hello, we have been having this issue quite a lot, going deeper into investigation we added some static peers that we knew they were healthy and the issue is gone, so my experience leads me that problem is somewhere on p2p. I don't have enough info if its a lot of bad peers in the network or if something is triggering this from specific peers or just bad peer search, but i guess this is a good starting point for investigation.
I have also been troubled by this issue for a long time. Could you share your peer information to help me temporarily resolve this problem?
v1.3.3
has been released as a stable tag yesterday, in case you folks want to update.
We'll keep working on some improvements and potentially bugs fixing wrt sync problems in the upcoming v1.3.4
Thanks for your patience
This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.
This issue was closed because it has been stalled for 28 days with no activity.
@manav2401
Please re-open this. Facing the same exact issue. I've four nodes, 2 in EU and 2 in US. Only one in EU shows this behaviour.
I enabled debug log.. and got this ...
DEBUG[07-12|18:40:37.918] Recovered from EVM failure. Error:="Found dependency"
DEBUG[07-12|18:40:37.918] Recovered from EVM failure. Error:="Found dependency"
DEBUG[07-12|18:40:37.918] Recovered from EVM failure. Error:="Found dependency"
DEBUG[07-12|18:40:37.918] Recovered from EVM failure. Error:="Found dependency"
DEBUG[07-12|18:40:37.963] Dereferenced trie from memory database nodes=2382 size=1.11MiB time=16.494309ms gcnodes=1,032,523 gcsize=428.34MiB gctime=4.915930839s livenodes=1,097,
249 livesize=371.24MiB
DEBUG[07-12|18:40:37.964] Inserted new block number=59,280,236 hash=daf8a1..94a7d6 uncles=0 txs=83 gas=10,124,138 elapsed=116.228ms root=b0eccf..143a7e
DEBUG[07-12|18:40:38.016] IP exceeds bucket limit ip=51.38.72.15
DEBUG[07-12|18:40:38.036] blockstm exec summary execs=85 success=85 aborts=0 validations=85 failures=0 #tasks/#execs=100.00%
DEBUG[07-12|18:40:38.095] Dereferenced trie from memory database nodes=1075 size=461.26KiB time=7.589081ms gcnodes=1,033,598 gcsize=428.79MiB gctime=4.92351855s livenodes=1,098,
890 livesize=371.81MiB
INFO [07-12|18:40:38.096] Imported new chain segment number=59,280,237 hash=c8f9a3..9aa886 blocks=18 txs=1412 mgas=325.242 elapsed=4.357s mgasps=74.647 snapdiffs=4.
89MiB triedirty=481.21MiB
DEBUG[07-12|18:40:38.096] Inserted new block number=59,280,237 hash=c8f9a3..9aa886 uncles=0 txs=85 gas=14,977,668 elapsed=130.632ms root=ed7d84..6cb34b
DEBUG[07-12|18:40:38.096] Synchronisation terminated elapsed=4m20.862s err=nil
DEBUG[07-12|18:40:38.096] Announced block hash=c8f9a3..9aa886 recipients=33 duration=2562047h47m16.854s
DEBUG[07-12|18:40:38.109] Reinjecting stale transactions count=0
Notice the
DEBUG[07-12|18:40:38.096] Synchronisation terminated elapsed=4m20.862s err=nil
After this it stops syncing... obviously! But why? Basically it catches up to the tip and then immediately stops!
trace logs below.
Hi, we have two identical Linux Debian 12 servers running Heimdall+Bor mainnet (the latest versions, Bor 1.3.1, however the described behaviour happened with previous versions too), with the following Bor config:
One of the servers is running fine, the other is not syncing. When we restart Bor on the non syncing server, it catches the top of the chain and then stops syncing, even when we let it run for hours. This behaviour is repeated over and over again. When the syncing stops, the log from the server looks like this:
When we query the list of peers on each of the server, both servers report about 16 peers (however each one a different list) but we are not able to transfer the peers from the syncing server to non syncing, Bor is not adding the added peers to the list.
Could you please help us with this strange behaviour? Thank you in advance.