Closed: eldimious closed this issue 3 months ago.
same
I also tried debug.setHead to move the head back some blocks (1000+ behind); it synced again for some hours, but then stopped again.
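For reference, this is roughly what the rollback looks like from the bor console; the IPC path is the default from my setup and <0xTARGET> stands in for the hex-encoded block number (a placeholder, not a real value):
# check the current head, then move it back
bor attach ~/.bor/data/bor.ipc --exec 'eth.blockNumber'
bor attach ~/.bor/data/bor.ipc --exec 'debug.setHead("<0xTARGET>")'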
Any idea how to fix it?
Same issue.
Same versions of bor and heimdall here. Although stuck at a different height (51640301), I'm constantly getting WARN logs like:
Dec 28 14:30:33 203078 bor[1828860]: WARN [12-28|14:30:33.977] unable to handle whitelist milestone err="missing blocks"
Can you check your peers over IPC with the admin.peers command?
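For example (the IPC path below is an assumption, adjust it to your datadir):
# count peers, then dump the full peer list over IPC
bor attach ~/.bor/data/bor.ipc --exec 'admin.peers.length'
bor attach ~/.bor/data/bor.ipc --exec 'admin.peers'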
@0xKrishna
enode://11e0cbb03a834019b0222f54bccf32512bef4294dd722642684762d1d01c84031c1075767195d9968dcdb9e38326f08b14547d8e33b0b67a0ef1aa0b045845d0@35.171.120.130:30303?discport=30315,
enode://b0f026f7ccfd5c1450e933572ae44b262a7d084647a30d0a8d9e2c8cab8d5b1c7721f3c60bfcd50c0fede114c7e2d316649389ba2449ca85d1ddd9e2947f1c28@147.135.100.106:30303?discport=30334,
enode://2d4bd1fa38182fa868a583fc946c8d5e4043b013381cf20927c16cf8f17b4f3e793c5e9f34fc785c52d887aab07181bdb0ebae50d9e3f05e5c14aed19f81929a@65.108.127.87:30303?discport=30340,
enode://ab879b4eaacf495ec760f2806e78509da80e327ba4262d8153698f88b0a95287a692bbaf3a3cece9ad27f889246c04e2b5ca8e75bf083acbb4806eb669cc3a77@35.171.120.130:30303?discport=30334,
enode://1a69f7dae12959a358b92a395ec79de2ab4601a59a5b0b951d4e6247da2101d7d6d77a919086251e70b552a49ae74d630e19233306a189a1b627c2115ecf3cfa@34.203.27.246:30303?discport=30320,
enode://574a9195f40a7c4bd68536167ef53a7385bab8934dfc8db94d013b1a73af76eb73f148536cb8b8365e8240728f6e80af0ddb4ead3a2544de907cce561839ce61@51.81.217.117:30303?discport=30323,
enode://142cce22e125325f4895b2268e32185f5dbe90f9c818ab135f16c7face23a55b46d0b78a0286595a262d4fa58ff314e7e2553e13f528a3c3e9616184b77f5b85@65.108.127.87:30303?discport=30323,
enode://50c8f9d2849a209383edd15dfd67ba0a8d3f5e9853fd1af9c1678f4aef2dc5e3817c34ddce9390d5e8dd4891ad7f66003a3bea5af9e288df6f26ed070d9bd741@54.38.217.112:30303?discport=30335,
enode://72be2da5ba01bc2f3a7764bf1d4f18550a36df629820ea0f6d37fe1cd1355d0f1c201b2a5f382e794ee56e0f5befa504e85e96548a45a0fba44bb6bd1075e28e@54.38.155.225:30303?discport=30306,
enode://53b53f55f2a1674873f8f58ee23616db8384f278a1206cf79c8c18d4ebc32b4424128229de2ea999803c08c9262974f1fb1f2b0d87ca6ec40aea1594c0ba0ef7@65.108.1.189:30303?discport=30337,
enode://eb0ee5596ea6df526eb7e0ace41f015bcb9ee4f27996c72ea15d1cd28ec69f89b6e64247696c0150111b52ca58810f5d0f42d59ac38fdb26ba7323bcc835475a@51.81.196.100:30303?discport=30313,
enode://c4a2a7c422ddce70a39164ce53762262bd5dc8917f5613b1c92c94affb36516e63f88721763a1dcfed5f36403e0fc21894e34c2981f2f6f1f100b9f186a986a1@51.38.72.15:30303?discport=30307,
enode://2197472b27c39587e2ae2c199e91527a25d25b2c1217f14c8d8b342068209a889913c7c1eb6f60044a0d28bd59ccec157d18ebb7918293e8878d11185831cf22@54.38.75.21:30303?discport=30320,
enode://b6d9bef47ce86b94331cdcfd2a1a91f28ab48db171aa70659973b3869988e7e4806fd24406c6f57187664643dffc0edf74e7a16ac315ca7933589357ec875550@51.38.72.15:30303?discport=30311,
enode://4585b746a2ae2f74575313199bd35159e8b679608fa1bd4e3a2823c0c24f8e49f9cb1e0c312de30a8b08c16a6666101897ffff47a6c162dca6ddb87c206c4cd2@66.70.233.151:30303?discport=30313,
enode://c8ab3d6ec8d7c1c7df462f55f02acaced2949ec4542475fa25ebb104feaa78a196f0e39cfc2bf1236ead1c647b734726cb9f4f03eb933c94f318cca160e5ce16@54.38.217.112:30303?discport=30334
> Can you check your peers over IPC with the admin.peers command?
Sure. I rolled back from bor v1.2.1 to v1.1.0 because some issues suggested rolling back might be a solution, and then noticed an interesting pattern: every time I restart the bor service, it syncs for a while and then gets stuck with the above "whitelist milestone" log.
> admin.peers [{ caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://6de3bbba54699dcc11b982c7970fdd938946d3638bab27d9006698b998447cf838891a310c61d6c74d042091366ba07690ef1f09a026fab28a31a06cd387b67b@13.57.125.97:30321", enr: "enr:-KO4QIY12LW3IWDW2JzqMdtg9Pyv7PEASdnlLFAEzUEuzOgVEvW5hWe2EB_Jd6iqKnRHi_SyP1INx6iDk3a6CMoyqOqGAYvR7YGvg2V0aMfGhNwIhlyAgmlkgnY0gmlwhA05fWGJc2VjcDI1NmsxoQNt47u6VGmdzBG5gseXD92TiUbTY4urJ9kAZpi5mER8-IRzbmFwwIN0Y3CCdnGDdWRwgnZx", id: "136d74cf29e85b49f991b1d97b5800f1a45968b0542642c47c970c1502762313", name: "bor/v1.1.0/linux-amd64/go1.20.10", network: { inbound: false, localAddress: "172.18.35.78:37836", remoteAddress: "13.57.125.97:30321", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://b8187a46754cdf631d67b89e3e73d5e061ab2ce5a62cc8a79cfd754b04dc5394b381f1d99d59a8b6baeb68b4c019512b59dcbdc0cb682320f96508331cf8e8f3@54.38.217.112:30303?discport=30324", id: "1c405a70749de50ea441c6c59c07e7d4dde5e18f47102a20b88db98cddcbb6a2", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:51320", remoteAddress: "54.38.217.112:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://256fe3efb2f83e4821f4d028273757e525da48bb69a3da5c4230a410d5b96e948a79ae42e60a4914092249ee3bb928756534c67b6c3003f0d08a180373735edc@65.108.1.189:30303?discport=30395", enr: "enr:-KO4QHQlnI0aegmfJbdsiPIskZywzNjBmulaKf9scy3wuCR_XirUnjEjwSsDfjJe40LWodLNpjLDW48N4MtdFEXOXh6GAYx2yUm_g2V0aMfGhNwIhlyAgmlkgnY0gmlwhEFsAb2Jc2VjcDI1NmsxoQIlb-Pvsvg-SCH00CgnN1flJdpIu2mj2lxCMKQQ1blulIRzbmFwwIN0Y3CCdl-DdWRwgna7", id: "3e8f038a2af1414377f24cacf7e6591b4007c60b8de292b7bec24d7a27cd9c49", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:52390", remoteAddress: "65.108.1.189:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://2cd2be98b78f486171994f32ca995f4d53a783172f360a9224181c3cb1b487bd88e95658cb05405642ee2455fc31ae0919f8b2699cc02ed9ed2aef09b9fc93c2@54.38.216.84:30303?discport=30331", enr: "enr:-KO4QN1KbAC8kuy161pxm8kHqtI8VMjk9cQjVFJT4s6TH3G-LJK4QAdY7LqugQ8Yt8-hYUzFDrqoaMFR3xQVhQHoH46GAYyGmlAzg2V0aMfGhNwIhlyAgmlkgnY0gmlwhDYm2FSJc2VjcDI1NmsxoQIs0r6Yt49IYXGZTzLKmV9NU6eDFy82CpIkGBw8sbSHvYRzbmFwwIN0Y3CCdl-DdWRwgnZ7", id: "496c218828d2d1864a9e228e7ad33a481ae60acb81becfb2e565053f4e1f1a5c", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:47924", remoteAddress: "54.38.216.84:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://994252f3fbe56302ba967cab1f01fada30ef8fdb335e6f974a55dd258c2052d1c8c7f181c147d3958ca7e5c7aec76f4f316f50891b137dcbcfd811e453f9d8cc@135.125.214.37:30303?discport=30340", id: "6bcba20976d073441dfdda8631ddf8fc0db9056e00485e8fe49717dac36560df", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:44738", remoteAddress: "135.125.214.37:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: 
"enode://29e354ff99595687d321d44b72c0e458f481046edd8d18fc5db69df0d61a44068ce9c715d74651d7c635688962f54251af861b13e5b31b4da54bb2c9f05ac794@5.9.87.183:30303?discport=30495", enr: "enr:-KO4QKwM2X_BENPlgEwVZ9SQjAMLtFF1dbJe9lmJ7eW42ai2R7ZAQ6Gc4Xzy2_BJOXsA8sESHmXeLvCGIINbAqjPxDWGAYyF1OC3g2V0aMfGhNwIhlyAgmlkgnY0gmlwhAUJV7eJc2VjcDI1NmsxoQIp41T_mVlWh9Mh1EtywORY9IEEbt2NGPxdtp3w1hpEBoRzbmFwwIN0Y3CCdl-DdWRwgncf", id: "6f1be92e4e8cb5f36e2d2e988d60d492a5992524258fab93ae146a335a8f690a", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:54748", remoteAddress: "5.9.87.183:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://b9e2f920d31ea6cde2ad56fcd1904455d911ccf58201551c22d41c28f5a1b1d20a67c8db30893651d8a47bfe21a95705505c079892290a8cfad06f1b8c425628@44.221.198.244:30303?discport=30316", id: "7752490f98a21bde471c9151b7bfe28347cf83a0813a9fe6e66320ae63152f5b", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:41940", remoteAddress: "44.221.198.244:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://9f1443433c1b1b79ccc2d95f314c4e0823d0b549d1db43e5e0a2fe3a87fdaeb2d693fa4a8e75fd6a77c2917598d91782fb75b8fc6357c4f13073653894418acb@66.70.207.63:30303?discport=30309", id: "8df6a54d5bc8fcac07f8ece1d738414190fc9fe3400776abb33471b9ead46344", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:53838", remoteAddress: "66.70.207.63:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://6668bb0a2ede7963ebc196f5e2c8e4daf480a1b7510b74ad18491d733ccf32ab754b44422e4d40fb88c996a3d33fa08dc96461d77693c4a7976cadef4340ca71@148.113.163.85:30303?discport=30309", id: "8e60fc39583410b077016422c96f36ecc60f077a4910a8848917dd1e5856c4e4", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:36704", remoteAddress: "148.113.163.85:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://298ba98e471a44af8638c297d4f25060119817d20cd49870717cfef0f92d3d3d1e3039b1b5fcd34ef66e5ef97efefb9d38e68eed20d1eec5929dfc422a3731e9@3.219.138.93:30306", id: "90871a5e7b702d78f49f829b75d44728628d6a0448d2e128dee96d3e8a39383e", name: "bor/v1.1.0/linux-amd64/go1.20.10", network: { inbound: false, localAddress: "172.18.35.78:39982", remoteAddress: "3.219.138.93:30306", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://66153dd3af7f793158934d9bd121f68e1e8c5a4c15d3316f2e222e6743f8a46fb02a3b6e70181521c0f82584ebd8b690fcf7c3056d5b78293f1bbe065f038ed9@54.235.96.140:30306", id: "93c951775b564631f98affc9e4539b91daa825e350de64a3a0b760a65d0a7826", name: "bor/v1.1.0/linux-amd64/go1.20.10", network: { inbound: false, localAddress: "172.18.35.78:37624", remoteAddress: "54.235.96.140:30306", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: 
"enode://697850d0a936d1d63d047ce480e6f39f429f2c33cfeec335526fb1e97aa0a11a43065bad4b0e8223ca053f91307a0a672d79586c4efdb81f531122116e6d132f@15.204.47.194:30303?discport=30340", id: "96b764ec1ca7771bdb60b464e498824b22dfc7c7cd8d8a3c28cb9ce4241d72dc", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:33882", remoteAddress: "15.204.47.194:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://a34a45e54b28eef5cc58e66a932471ffa3d914af052346b423117972aa957d0816f79492e657ccf1f356713f5959274d5f39573acde4d64e00a656ae999f0a30@65.108.127.87:30303?discport=30376", id: "9ede61e13d949a6ff325274262cf677d16093daf8be60c441707c8ba047526d3", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:47926", remoteAddress: "65.108.127.87:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://af51799ca42c94ff9db93aa933dad4d7ae5979153658df2a38f90c38654391f8a929c8d6af7cb04ea151f009a2b163d6458a71662d512adf1d300ea49107738f@5.9.87.183:30303?discport=30432", id: "a51dc5db9ffc3dbd5b5c67ed1925a486788b5e7668ca0c624b31468b4090f000", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:40188", remoteAddress: "5.9.87.183:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68"], enode: "enode://76d2d6284ee5637113e3669e0fdff0fca83535e39ee0752b9338d9e306aad3f9b4db4c8e4e8738ad718c0f442daf96a37fc864d73954f931dd3c2b3d85663766@3.239.87.70:56304", id: "c0506599f03d41572ecbc8ea45b6eee0192c622eccd7d614d3bb9a3fb19e2548", name: "Geth/v1.1.8/linux-amd64/go1.20", network: { inbound: true, localAddress: "172.18.35.78:30303", remoteAddress: "3.239.87.70:56304", static: false, trusted: false }, protocols: { eth: { version: 68 } } }, { caps: ["eth/66", "eth/67", "eth/68"], enode: "enode://e6ddc59f7f585019b428a3a076a55a2ef1401926434f798b9fb29abb5502a6b33698bfba0420642132a959051f5e417af9abf6d67dc87d8e6f8e88acdbe1532b@54.90.91.58:34482", id: "d85b17d766b71531af5a5a57065ad2baef16f75df801e34ac3e446c9ea02470d", name: "Geth/v1.1.8/linux-amd64/go1.20", network: { inbound: true, localAddress: "172.18.35.78:30303", remoteAddress: "54.90.91.58:34482", static: false, trusted: false }, protocols: { eth: { version: 68 } } }]
Any idea how we can solve this issue?
I tried to apply the recommended [p2p.discovery] peer settings from https://forum.polygon.technology/t/recommended-peer-settings-for-mainnet-nodes/13018.
I will let you know if this resolves the issue.
The above suggestions are not fixing the issue. Any other suggestions?
No luck. I tried a new physical machine with bor 1.1.0 and Heimdall 1.0.3 using snapshot data, and it's the same all over again: stuck at random points. The original node, after weeks of manual restarts, has finally run well for half a month, not sure why, and I'm afraid it will get stuck unexpectedly someday.
@0xKrishna I think I might have hit the same problem on two nodes. The first node stopped importing blocks ~8 days ago, the other around 2 hours ago.
I have the pprof Goroutine dump for it, see pprof.geth.goroutine.polygon-mainnet-0.pb.gz. It seems to be blocked at https://github.com/maticnetwork/bor/blob/master/core/blockchain.go#L1888.
I have a pprof too, see pprof.geth.goroutine.polygon-mainnet-1.pb.gz. On this one I don't clearly see what is blocked. I don't even seem to see the blockchain import goroutine there, so I'm not sure what it was doing.
For this dump, I also have a bor attach capture of admin.nodeInfo and admin.peers; see pprof-polygon-mainnet-1-attach-nodeIndo-peer.txt.
Let me know if you need more info. I'll follow the nodes more closely to see if they get stuck again, so I can gather extra data points.
I tried to stop this node cleanly by sending a single SIGINT, then waited 4 hours for it to shut down, but it never did. I decided to force-kill it, which means that in this state the stuck node never completed its clean shutdown sequence.
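For reference, the shell-level equivalent of that sequence looks roughly like this (a sketch assuming a single bor process, not the exact commands I ran):
# request a clean shutdown, wait up to 4 hours for the process to exit,
# then force-kill it if it never does
kill -INT "$(pidof bor)"
timeout 4h tail --pid="$(pidof bor)" -f /dev/null || kill -KILL "$(pidof bor)"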
Same issue on two independent nodes; stuck at a random block with this ERROR:
heimdalld[14653]: ERROR[2024-01-20|20:45:38.152] Span proposed is not in-turn module=bor currentChildBlock=52556670 msgStartblock=52563456 msgEndBlock=52569855
Hey @eldimious @VSGic @maoueh @GeassV,
- We can ignore the unable to handle whitelist milestone logs. We are working on suppressing these logs to DEBUG.
- I can see your network is peered.
- Please downgrade your bor node to v1.1.0 and heimdall to v1.0.3.
- Try to restart the clients.
- If the issue persists, please attach a log dump (or copy the last 200 lines of logs) and the configuration used to start the nodes.

Thank you! 💜
Well, it got stuck at 52755409, then moved to 52756404 and got stuck again while I was trying to dump the log and config files. Bor version 1.1.0 and heimdall v1.0.3. Attached are the log and config: output_24_1_26.log bor_config.txt
Hello, the same problem after the downgrade; regular restarts are needed. Attached are the log and config: config_bor.txt out_bor.log
Hello,
I have the same issue. The bor node is stuck at block number 52962568.
bor v1.1.0
heimdall v1.0.3
I tried to restart the bor node, but it took a very long time trying to stop; it was finally killed by systemd after 'stop-sigterm' timed out.
After restarting, the block number rolled back to 52921882, which is far behind the stuck block number 52962568.
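One thing that may help here (an assumption on my side, not official guidance) is giving systemd a longer stop timeout, so bor is not SIGKILLed mid-shutdown, which may be why the head rolled back so far:
# add a drop-in override for the bor unit; the unit name and the
# 30-minute value are examples, not recommendations
sudo systemctl edit bor
#   [Service]
#   TimeoutStopSec=1800
sudo systemctl daemon-reload
sudo systemctl restart bor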
Same here.
@CaCaBlocker You can ignore these logs for now as your node is not completely synced.
@RyanWang0811 Is it working now?
It is working now. thx.
Hi, I still have this problem; I restart bor 3-5 times per day.
Still have this problem, too.
This issue looks like the one I posted previously, and it does not seem to have been fixed: https://github.com/maticnetwork/bor/issues/939
Is it a node bug, or an issue on the chain?
The problem is still present; two nodes with different bor versions are struggling.
Hey @RyanWang0811 @VSGic, what specific errors are you facing currently? Can you share some logs? Also, have you upgraded to bor v1.2.3?
Hello @Raneet10, I have posted logs above. I have two nodes, one of them with bor v1.2.3, and it has the same problem.
I encountered this problem using the latest version on the testnet, and there is no solution yet. heimdall: v1.0.4-beta, bor: v1.2.6-beta.
Hello!
Just wanted to mention that we are experiencing the same issues with our 2 Polygon bor nodes. I have set up a liveness probe (Kubernetes) to restart a node if it gets stuck for more than 15 minutes. It kind of works, but it's really annoying, and we still end up with small interruptions when both nodes get stuck at the same moment. It happens multiple times per day. It's really bad.
Is anything planned to fix these issues?
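For what it's worth, the probe itself is nothing fancy; it just fails when the head stops advancing, so the orchestrator restarts the pod. A minimal sketch of the idea (the IPC path and the 60-second sample window are assumptions from my setup):
#!/usr/bin/env bash
# exit non-zero if bor's head block does not advance between two samples
set -euo pipefail
IPC="${BOR_IPC:-$HOME/.bor/data/bor.ipc}"   # override with BOR_IPC if yours differs
before=$(bor attach "$IPC" --exec 'eth.blockNumber')
sleep 60
after=$(bor attach "$IPC" --exec 'eth.blockNumber')
if [ "$after" -le "$before" ]; then
  echo "bor looks stuck: head still at $after"
  exit 1
fi
echo "bor advanced from $before to $after"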
By the way, I compared the errors in the Heimdall and bor logs while one node was stuck on a block with the logs from the other node that was working, and I found exactly the same errors in both. So whatever the real issue is, it is definitely not being logged...
Hello, we still have this trouble. We cannot send transactions with such a node; they get lost when the node is out of sync. We are running our Polygon setup in manual mode.
Hi, this is still an issue and it has become worse: one node cannot even get synced after a reboot and gets stuck along the way again.
Also faced this issue when bootstrapping a node from the official snapshot. It seems that removing the nodekey file fixed the problem, and sync is now progressing.
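For anyone trying the same: the path below is from my setup and may differ on yours; stop bor first, and note that a fresh nodekey (and with it a fresh node identity) is generated on the next start.
sudo systemctl stop bor
# keep the old key around instead of deleting it outright
mv ~/.bor/data/bor/nodekey ~/.bor/data/bor/nodekey.bak
sudo systemctl start bor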
After updating to 1.2.7 the problem still exists: bor loses sync with the chain at a random moment, and only a reboot pushes it to start syncing again from the stuck block, after which it repeats. Removing the nodekey did not help.
Hey, it would be really helpful if we could get a stack trace to see where the bor process is stuck and track down the root cause. You can get it in either of the two ways below:
1) bor attach <path-to-bor.ipc> --exec "debug.stacks()" > stacktrace.txt
2) kill -QUIT <pid of bor>; the logs should then contain the stack trace.
Could you help us with that? Thanks!
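If bor runs under systemd, the SIGQUIT dump ends up in the journal, so something along these lines should capture it (the unit name "bor" is an assumption):
# assumes a single bor process managed by a systemd unit named "bor"
kill -QUIT "$(pidof bor)"
sleep 5   # give bor a moment to write the goroutine dump
journalctl -u bor --since "1 minute ago" > stacktrace.txt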
Hi @manav2401, see the attached files: stacktrace-4 is from bor 1.2.7, stacktrace-8 is from bor 1.2.3.
Hello, my bor client is also stuck at a certain block: 54875999. I rolled the bor client back by 500, 2000, and 15000 blocks and the problem is still not solved. My setup: bor version 1.1.0 and heimdall v1.0.3. Right after starting, the bor client keeps looking for peer nodes; after that the same situation always occurs and the data cannot be synchronized forward. I tried rolling back the bor client:
$ bor attach ~/.bor/data/bor.ipc
debug.setHead("0x10250E8")
I tried going back 500, 1000, and 15000 blocks, but the head always ends up stuck at block 54875999 and the data cannot be synchronized forward. @manav2401 @VAIBHAVJINDAL3012
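For anyone computing the hex argument for debug.setHead from a decimal block number, a shell one-liner does it; for example, for a block 15000 below the stuck height mentioned above:
printf '0x%x\n' 54860999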
Hey guys! bor is stuck in this state now:
Head state missing, repairing number=55,665,375 hash=3510a2..495a1a snaproot=20d014..eff4a3
Hey, I think I had the same issue, but after trying to fix it I'm pretty sure I broke my database (it no longer starts). I need to download the bor snapshot, but the last image is from February. Is there a more recent link for downloading Polygon snapshots?
I can confirm I'm facing the same issue as described here. I'm running bor v1.3.0-beta-2. Restarting bor seems to solve the problem for a little while, but then it stops receiving blocks again after some time. I've yet to rule out potential networking problems: I'm currently on a home network with port forwarding enabled for the p2p ports 30301, 30303 and 26656, and the node has a very low peer count that doesn't seem to improve.
If anyone has more recent updates, please post them here along with the things you've tried.
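If anyone wants to rule out the port-forwarding side first, this is roughly what I'm checking (the public IP is a placeholder):
# TCP check of the bor p2p port, run from outside your own network
nc -vz <your-public-ip> 30303
# and locally, confirm bor is actually listening on it
ss -tlnp | grep 30303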
Hi, the number of my peers has always been very low, and the quality of the peers that are found is very poor, so the data cannot be synchronized. Is there any solution?
I turned the verbosity on bor up to 4 and restarted. It synced to head and then stopped syncing. I saw this in the logs:
Apr 12 21:34:46 polygon bor[201293]: INFO [04-12|21:34:46.257] Imported new chain segment number=55,751,326 hash=638ef5..090cf3 blocks=4 txs=731 mgas=82.068 elapsed=505.778ms mgasps=162.260 dirty=1022.42MiB
Apr 12 21:34:46 polygon bor[201293]: DEBUG[04-12|21:34:46.257] Inserted new block number=55,751,326 hash=638ef5..090cf3 uncles=0 txs=100 gas=8,560,433 elapsed=127.348ms root=d1f976..f6b459
Apr 12 21:34:46 polygon bor[201293]: DEBUG[04-12|21:34:46.257] Synchronisation terminated elapsed=1m50.056s
Apr 12 21:34:46 polygon bor[201293]: DEBUG[04-12|21:34:46.263] Unindexed transactions blocks=4 txs=357 tail=53,401,327 elapsed=3.014ms
Apr 12 21:34:46 polygon bor[201293]: DEBUG[04-12|21:34:46.268] Reinjecting stale transactions count=0
Apr 12 21:34:47 polygon bor[201293]: DEBUG[04-12|21:34:47.847] Replaced dead node b=8 id=5758c487f99db11c ip=18.171.122.44 checks=0 r=573e27515b6173a4 rip=131.153.232.46
Apr 12 21:34:49 polygon bor[201293]: DEBUG[04-12|21:34:49.935] Deep froze chain segment blocks=384 elapsed=241.398ms number=55,661,326 hash=bfd3ef..6cc77f
Apr 12 21:34:50 polygon bor[201293]: INFO [04-12|21:34:50.660] Got new milestone from heimdall start=55,750,814 end=55,750,889 hash=0xe317b3273f7b5ba3db2435a4ae7ec3f56a93fb56c5af36d63d6d8142fbf9b736
Apr 12 21:34:53 polygon bor[201293]: DEBUG[04-12|21:34:53.082] Revalidated node b=11 id=52a251811399e9f1 checks=1
Apr 12 21:34:54 polygon bor[201293]: DEBUG[04-12|21:34:54.984] Revalidated node b=16 id=f4edb64c1c31a642 checks=1
Apr 12 21:34:56 polygon bor[201293]: DEBUG[04-12|21:34:56.409] Revalidated node b=8 id=573e27515b6173a4 checks=1
Apr 12 21:34:59 polygon bor[201293]: DEBUG[04-12|21:34:59.355] Served eth_getBlockByNumber conn=127.0.0.1:39276 reqid=357 duration="92.069µs"
Apr 12 21:35:02 polygon bor[201293]: INFO [04-12|21:35:02.660] Got new milestone from heimdall start=55,750,814 end=55,750,889 hash=0xe317b3273f7b5ba3db2435a4ae7ec3f56a93fb56c5af36d63d6d8142fbf9b736
Apr 12 21:35:03 polygon bor[201293]: DEBUG[04-12|21:35:03.367] Revalidated node b=6 id=579eab95009792f1 checks=2
Apr 12 21:35:05 polygon bor[201293]: DEBUG[04-12|21:35:05.202] Revalidated node b=5 id=57b3055bdd011323 checks=4
Apr 12 21:35:11 polygon bor[201293]: DEBUG[04-12|21:35:11.304] RPC connection read error err=EOF
After another restart I got a bunch of "IP exceeds table limit" messages:
Apr 12 21:56:01 polygon bor[201683]: DEBUG[04-12|21:56:01.585] IP exceeds table limit ip=65.21.164.117
Apr 12 21:56:01 polygon bor[201683]: DEBUG[04-12|21:56:01.645] IP exceeds table limit ip=148.251.142.58
Apr 12 21:56:01 polygon bor[201683]: DEBUG[04-12|21:56:01.645] IP exceeds table limit ip=148.251.142.59
Apr 12 21:56:01 polygon bor[201683]: DEBUG[04-12|21:56:01.645] IP exceeds table limit ip=65.21.164.126
Apr 12 21:56:01 polygon bor[201683]: DEBUG[04-12|21:56:01.680] IP exceeds table limit ip=148.251.142.68
Apr 12 21:56:01 polygon bor[201683]: DEBUG[04-12|21:56:01.680] IP exceeds table limit ip=65.21.164.113
Apr 12 21:56:01 polygon bor[201683]: DEBUG[04-12|21:56:01.680] IP exceeds table limit ip=65.21.164.117
Apr 12 21:56:02 polygon bor[201683]: DEBUG[04-12|21:56:02.403] IP exceeds table limit ip=148.251.142.58
Additionally, I found this geth issue, https://github.com/ethereum/go-ethereum/issues/1563, describing a similar situation where the peer count stays low and getting accepted by other peers doesn't happen easily.
Interestingly, even though blocks aren't coming through, I do see the occasional transaction appear, so some connectivity is clearly still happening. Where do blocks come from? I thought bor was the recipient of blocks, but is it actually heimdall?
OK, randomly last night, after yet another restart, I was able to get blocks through steadily and it hasn't stopped since. The last change I made was to increase maxpeers=200 in config.toml for bor. I don't know exactly why this would be the fix, but it appears to have worked. I heavily suspect the actual problem is related to getting a steady and useful peer count. I still only have 37 peers, which seems extremely low, and the number doesn't appear to be going up.
Thanks a lot, nice find. Mine was stuck and wouldn't sync even with manual restarts; after following your setting it seems fine now.
Hi everyone. Updating the setting to maxpeers=200 did not help for me; it still gets stuck.
I'm reasonably confident it's a problem with (a) having decent peers and (b) having connectivity to those peers plus being discoverable.
First, check a few things:
1) You have the correct ports exposed to the internet (and if you're behind a NAT, make sure to port-forward 26656 and 30303).
2) Once bor is started, run sudo bor attach bor.ipc.
3) At the prompt, run admin.peers.forEach(p => console.log(p.enode)).
4) Take all the nodes you have a connection with and edit your config.toml to include them as bootnodes and static-nodes (see the sketch after this list).
5) Restart bor: sudo service bor restart.
I found that once I had the correct ports exposed and enough peers, it seems to keep working fine.
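For step 4, the peer-related part of my config.toml ends up looking roughly like the sketch below; the section and key names are from my node and may differ between bor versions, and the enode entries are placeholders. Merge the values into your existing [p2p]/[p2p.discovery] sections rather than appending a duplicate section.
# written as a shell heredoc so it can be copy-pasted and adapted
cat > peer-settings.example.toml <<'EOF'
[p2p]
  maxpeers = 200

  [p2p.discovery]
    bootnodes    = ["enode://<node-id>@<ip>:30303"]
    static-nodes = ["enode://<node-id>@<ip>:30303"]
EOF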
Alternatively you can just utilise my list of peers:
enode://f422a5032462e3f1e77a2e5943fb5c1dfa1cf43504d7256a51a8db1fdfb1761abbf78031938b3ba0f4791772fc31107b0230cd8cd6dc9a72b181cf11b337f8e2@131.153.232.37:30329
enode://b6b12c778b02da773ab6e53ff3ed801deceee77d55359234250da276b363f59e28e24b7e1c445e0b58e07aa974e380b6666524c51e09bd45aa5da288459ef518@141.95.35.53:30303?discport=30395
enode://8a4f4600aa8bde4419a250d2137bf779d5fc445fd3e541fa340d267410109e8e70e9b620efcf8f69e0249a5644c07645d52e16e8573cb635638dbe25ccf63329@131.153.238.130:30310
enode://60bcf1e0b04bfa9dfa7223959705eca089fa62b8634c30b77de7210da0b4b3c94a83213cd0fac8db05995d0e614f89b0a768198f134e5cd2157652cba4985962@79.127.216.33:30303?discport=30416
enode://8d00e8e93bd02839850e081e97200587caa3b5b485716bc78115035aece4bad36fd7eeace9501a4a5259a03cca1137319dd772fd7e3dc8c0c7d7c79220e0b4b2@52.4.6.33:30306
enode://94af7827b09d190594db5971bb691b060d3078907be2a3dfb35db066ef921260adf1f510c0e799f2c2e1011790e23e35f77a801a0a5aef8c4ed4896b88b54725@144.76.143.137:30311
enode://854bf9f4520f42eace23d3b12a18194a52303e678f6f01773448b3fad8276dcf78ffbbee39b63a4def0141b0aea8852053536c6fb7b300ea15ed1956a6d840f6@178.63.197.235:30322
enode://b80a035437ff33fe6e0c44b6e5ff130fce595dc0065a5908cf1de392ddf6ff1689b0feb6eeed74bd1cb0f20622e22b0094c76ffaa4d2a4c9bd99abc76ca3d4d5@167.235.187.6:30306
enode://d04a33070702a98e9084f44d5188609c8b2bef61fd47a3d8bd80da3a7f256384af5298a7d4aec844598c39c4865a9938fdc2f3a6871076891791c4cc1856e425@47.128.191.46:30309
enode://41565801b34abe50e5085a7267248684148f1d4b41d9aef38edfeef3fb2703434bc5b8d44ca98263914f840fe8566806c044bec21c1c408f69fbb2a5314a692e@34.205.208.100:30304
enode://274d63a16720bec5a13bb4d5115f64ea2806104f3b77634d2b91d9298e4c3a432168b4e2270cf8bdeb4305c0fe78af91530ac2a9117e0ab3053bace53ce8c8d6@178.63.148.11:30315
enode://735a8029b30f839315d38ff01d5d02ddd5f772d59ec7a0c70b127a8d49791094b1204d01bef9f733c51568499018df647c25e620133b75dbb3bc1412e5b5255b@95.216.127.78:30303
enode://56a40d77ac767dfe9613034530879f60c13b20925ca6b1e384ac7166cb6b7a5def5960b854439875f5f14267530811bcb51b5ee93b38e38f9537642bc9e85cfe@44.207.75.52:30303
enode://a09fb28e22e7d9132359662c9e340dc98682eb98a7658e3aac65be901bea966ca4435d05167d9b065bf8db79a9b220160ef9f241a1e4a0a709853c268082cba4@44.242.14.73:30313
enode://0ff046742976c2afe0aa8bf4c1956eeccb8cb220bda687aab7393a634ee4681318d4ac23a376520cf62b29cfdd3f8de3e8ed274637b645f111ce8a5dd21bb78d@52.195.35.130:30310
enode://2ad3521be3c7528080e614827eb46ffc75d25e1ead07f40539c58bf534a195c9f3d3b493c29316d9b6efdfc4fa3a1d424ba54e3e89df9ae3c3f2f56626ef026c@131.153.238.118:30504
enode://376a0c86dd56c012199402d8636174fe9f565567fb6cb45c55dd50f9af147e3c77cfadc41325b940094a583aede5578cd91bd7fbb2d9516bb1e47f33a709f3fa@88.198.99.100:30304
enode://ac1ad52dca196112693ff95ff1c550a4402e6a0d0117661ef0dc2c49e15e4046b9a3c5388a5e506bbfbbcd49cbb384de7018f6d982f01b3cc1511353d77340bf@18.219.134.46:30318
enode://bedbd7199f27fa184f9b791c6800725069d21ec76e95760877810c440c0252a1f0234345feb6d9fccbe9c61da3dd92e143d72ace0c4fff931a29bfdb85f68037@52.49.125.50:30303
enode://49c9f3a6f4a1c0cf526b1cb95e61cb389f6dd63a39f70cdb53e1072d290e9aedfda1370b69a13f0266ce2ea9c76f93e66d2e504b30c72f1432ee7d3ee07fc67f@3.120.215.154:30318
enode://15421dfe508c8bcaf4df0d5cb7b70424c283aa66875b778d482c938efd796314cb018246cf5cfe473f745f49b985afc5dcafcdaad333c788ba4b97e357c51853@135.181.236.61:30303?discport=30498
enode://e83278b3f8c8f0ed6b7c0d6311549e29e0b674b87cbf60544fba244ec3248360b3f3b0e087ad41527151fd255b75a16ef3e2444145d0e9e4377796149029c50c@141.95.31.164:30303
enode://50219d6315852398feaf9762d292daa9367d281ef6b48fe0b9ecc7075466a23cc36dadb58765fbd127d04b5bc2325a41437934db22110c14b557e77a0094a4a7@52.195.48.20:30319
enode://2dd198c54ed9bb191fed772a391e5b777aaa7145ced7e6dd0e2de816bd3527ba711bbb4d02fd43515e652209ef1dbb7c8cd577a5f486adda09aadbc1291b6cb3@3.9.224.239:30324
enode://455186fa23714d4a40977784f044f146d8062dd9c65849c9bda0040fdd700a7655ddaf892ca81e8602dd1a08e19f7e3c14dbe80af16823f867577a937591de96@148.251.142.70:30314
enode://b75bce4d1a3e3460790e13a46253674314a3a5d9161ebd1cf3ef5212a5d4cf332bd6973e5d7856c24e15e17aa7adafafb89c303c7f8b7813b37718471f03a099@3.8.45.159:30317
enode://d1e05cd722d5c5734a55049065d67c81545e0e5b46f2338734737d3817c5a9b1c0c0baac401ef169398ea50e68c798b7438bc622eb2d2c58b8f2398342e8be07@35.169.31.203:30303?discport=30371
enode://1825021e6229a8acddaaa28c16b5a33dcadd6e3153c53ce8dc1f0d7dc8abc5422e98ffe56bfcae6d1e2ce8a0325ce1666ec743aebd8aff3b20799a87e6c467eb@54.38.76.225:30303?discport=30313
enode://185771dc3a43b298b2da6d9ac26801d80d754111e63689b549b002dc43b096455ff6a7ccb7483ffd112bdbe746b37d49716c2787919bda1bfe111aef36bb9a96@148.251.194.252:30303?discport=30548
enode://1c3a5773431e82a1f5a2325b5475ee2ddff84a489480c977280898ffe3a6fdee85b480e5f7cb0a4bf88881cddd73a2abac1542c191ca51fd2bbd271a69eedf28@87.249.137.89:30303?discport=30330
enode://f7e003951d9eee96bf460065ef0018a9e758e1dd9a14e13add07b0fb6a22369779e112e6c2b4ebd30f5f4a37a2013976b2bd75a54ad6176269011d2dd9589785@84.17.38.164:30303?discport=30523
enode://ad0e380cb6d23f6126350c7c03f3a35177d6e428436f605e633b96f5437f910b1ea93e4462e612a5c9af116d09eda1ceae1a3429f48f269173262d9d828ed924@107.6.141.6:30309
enode://9d79af7012353cf5c0f1f48549327ee3e9548ecc658beccb79ec595e1e0dde2d2540de11ca19236a047813aaebd74181646c3039e086348492f0e46801562489@51.21.125.132:30313
enode://188c1273f9f25cad8dc26040e252c2e8b92c1ddf79f7f659fcb59b65f1b222d1c14a116853c30fc66cc894b7125c4915d002209067d1a7d86d64f495b265e293@13.125.0.203:30307
enode://a341923aa8d22f2018106cabcdb22b14a271701ea80b06cd7737c1bc299ac42e06387f5381eeebe67f2517b4f11a1b464adb2465bf9eebc1abf52d0e3eb8b9dd@13.57.125.97:30320
Put these in your bootnodes/static-nodes and hopefully you will get running. Periodically I check the set of nodes I'm connected to and log them, then use those as bootnodes in the hope of finding more.
@james-turner Thank you for the explanation. I followed your instructions; it did not help in the end, but the node's behaviour has changed: it no longer gets completely stuck, instead the lag behind the chain head irregularly rises and falls.
Well, it survived for about two weeks and then got stuck again: it syncs for a while and then gets stuck, repeatedly. I tried the above solution of listing peers (plus some from https://polygonscan.com/nodetracker/nodes) in the config file; it did not work. I also tried re-downloading the bor snapshot data, which is quite far behind (February), and it still gets stuck. So now my two Polygon full nodes are dying and there's nothing I can do.
Hey all,
We are no longer providing snapshots for the community. Instead, we have transitioned to a community-driven model where snapshots are provided by some of the most active members. These include the following validators: Vault Staking (Mainnet/Mumbai), Stakepool (Mainnet/Amoy), StakeCraft (Mainnet/Mumbai/Erigon Archive) and Girnaar Nodes (Amoy).
Also, StakeCraft has introduced a new service for the Polygon community - All4nodes.io aggregator service where snapshots from all different providers can be found. More details here: https://forum.polygon.technology/t/stakecraft-introduces-a-new-service-for-polygon-community-all4nodes-io-aggregator-service/13694/1
This decision has been made to foster greater community involvement and to distribute responsibilities more equitably among our dedicated community members. Empowering the community to generate snapshots will not only ensure their timely availability but also promote collaboration and engagement within our community.
For inquiries, contact community services directly.
Regards, Team Polygon Labs
So the problem of downloading a recent snapshot for bor is secondary now; the more serious issue is the regular bor stalls, because it is not possible to serve transactions with such nodes and that affects the business.
This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.
There are other issues open about this problem, so I think it is still relevant.
System information
Bor client version: 1.2.1
Heimdall client version: 1.0.3
OS & Version: Linux
Environment: Polygon Mainnet
Type of node: Full
Overview of the problem
I have been running a full node using bor and heimdall via Docker for the last 2 months, but the bor sync got stuck 11 hours ago at block 0x312d050. I am getting the following logs from the bor Docker image:
Any idea how I can fix it? I tried to restart the Docker container, but the error remains.