maticnetwork / bor

Official repository for the Polygon Blockchain
https://polygon.technology/
GNU Lesser General Public License v3.0

Bor sync issue #1239

Closed charleswong1025 closed 1 month ago

charleswong1025 commented 4 months ago

These are the bor and heimdalld versions:

```
[root@polygon-mainnet-0 bor]# bor version
Version: 1.3.1
GitCommit:
[root@polygon-mainnet-0 bor]# heimdalld version
1.0.5
```

These are the bor and heimdalld config files.

Bor:

```
[root@polygon-mainnet-0 node]# cat bor/config/config.toml | grep -v "#" | grep -v ^$
chain = "mainnet"
datadir = "/polygon/node/bor/data"
syncmode = "full"
[p2p]
maxpeers = 50
port = 30303
nodiscover = true
[p2p.discovery]
bootnodes = ["enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385ecf4a4bf160a@3.36.224.80:30303", "enode://8729e0c825f3d9cad382555f3e46dcff21af323e89025a0e6312df541f4a9e73abfa562d64906f5e59c51fe6f0501b3e61b07979606c56329c020ed739910759@54.194.245.5:30303"]
[txpool]
nolocals = true
pricelimit = 30000000000
accountslots = 16
globalslots = 32768
accountqueue = 16
globalqueue = 32768
lifetime = "1h30m0s"
[miner]
gaslimit = 30000000
gasprice = "30000000000"
[jsonrpc]
ipcpath = "/polygon/node/bor/bor.ipc"
[jsonrpc.http]
enabled = true
port = 8545
host = "0.0.0.0"
api = ["eth", "net", "web3", "txpool", "bor"]
vhosts = [""]
corsdomain = [""]
[gpo]
ignoreprice = "30000000000"
[telemetry]
metrics = true
[cache]
cache = 4096
[accounts]
allow-insecure-unlock = true
```

Heimdall:

```
[root@polygon-mainnet-0 node]# cat heimdall/config/config.toml | grep -v "#" | grep -v ^$
proxy_app = "tcp://127.0.0.1:26658"
moniker = "polygon-mainnet-planx-0"
fast_sync = true
db_backend = "goleveldb"
db_dir = "data"
log_level = "main:info,state:info,*:error"
log_format = "plain"
genesis_file = "config/genesis.json"
priv_validator_key_file = "config/priv_validator_key.json"
priv_validator_state_file = "data/priv_validator_state.json"
priv_validator_laddr = ""
node_key_file = "config/node_key.json"
abci = "socket"
prof_laddr = "localhost:6060"
filter_peers = false
[rpc]
laddr = "tcp://127.0.0.1:26657"
cors_allowed_origins = []
cors_allowed_methods = ["HEAD", "GET", "POST", ]
cors_allowed_headers = ["Origin", "Accept", "Content-Type", "X-Requested-With", "X-Server-Time", ]
grpc_laddr = ""
grpc_max_open_connections = 900
unsafe = false
max_open_connections = 900
max_subscription_clients = 100
max_subscriptions_per_client = 5
timeout_broadcast_tx_commit = "10s"
max_body_bytes = 1000000
max_header_bytes = 1048576
tls_cert_file = ""
tls_key_file = ""
[p2p]
laddr = "tcp://0.0.0.0:26656"
external_address = ""
seeds = "1500161dd491b67fb1ac81868952be49e2509c9f@52.78.36.216:26656,dd4a3f1750af5765266231b9d8ac764599921736@3.36.224.80:26656,8ea4f592ad6cc38d7532aff418d1fb97052463af@34.240.245.39:26656,e772e1fb8c3492a9570a377a5eafdb1dc53cd778@54.194.245.5:26656,6726b826df45ac8e9afb4bdb2469c7771bd797f1@52.209.21.164:26656"
persistent_peers = ""
upnp = false
addr_book_file = "config/addrbook.json"
addr_book_strict = true
max_num_inbound_peers = 100
max_num_outbound_peers = 100
flush_throttle_timeout = "100ms"
max_packet_msg_payload_size = 1024
send_rate = 5120000
recv_rate = 5120000
pex = true
seed_mode = false
private_peer_ids = ""
allow_duplicate_ip = false
handshake_timeout = "20s"
dial_timeout = "3s"
[mempool]
recheck = true
broadcast = true
wal_dir = ""
size = 5000
max_txs_bytes = 1073741824
cache_size = 10000
max_tx_bytes = 1048576
[fastsync]
version = "v0"
[consensus]
wal_file = "data/cs.wal/wal"
timeout_propose = "3s"
timeout_propose_delta = "500ms"
timeout_prevote = "1s"
timeout_prevote_delta = "500ms"
timeout_precommit = "1s"
timeout_precommit_delta = "500ms"
timeout_commit = "5s"
skip_timeout_commit = false
create_empty_blocks = true
create_empty_blocks_interval = "0s"
peer_gossip_sleep_duration = "100ms"
peer_query_maj23_sleep_duration = "2s"
[tx_index]
indexer = "kv"
index_tags = ""
index_all_tags = true
[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"
max_open_connections = 0
namespace = "tendermint"
```

Issue: heimdall syncs normally; its log looks like this:

```
[root@polygon-mainnet-0 node]# journalctl -f -u heimdalld
-- Logs begin at Sun 2024-05-05 13:29:52 UTC. --
May 05 15:51:13 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:13.574] Executed block module=state height=18651561 validTxs=0 invalidTxs=0
May 05 15:51:13 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:13.586] Committed state module=state height=18651561 txs=0 appHash=F573B8921DF819BBCE6F05F36638A0EE1A896F230FBB92B6501608A8D4577545
May 05 15:51:17 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:17.602] Served RPC HTTP response module=rest-server method=GET url=/milestone/lastNoAck status=200 duration=1 remoteAddr=[::1]:35548
May 05 15:51:19 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:19.962] Executed block module=state height=18651562 validTxs=0 invalidTxs=0
May 05 15:51:19 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:19.974] Committed state module=state height=18651562 txs=0 appHash=F573B8921DF819BBCE6F05F36638A0EE1A896F230FBB92B6501608A8D4577545
May 05 15:51:23 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:23.602] Served RPC HTTP response module=rest-server method=GET url=/milestone/lastNoAck status=200 duration=1 remoteAddr=[::1]:35548
May 05 15:51:23 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:23.602] Served RPC HTTP response module=rest-server method=GET url=/milestone/latest status=200 duration=0 remoteAddr=[::1]:35554
May 05 15:51:25 polygon-mainnet-0 heimdalld[2460]: ERROR[2024-05-05|15:51:25.529] enterPrevote: ProposalBlock is invalid module=consensus height=18651563 round=0 err="Invalid evidence: Evidence from height 17549734 is too old. Min height is 18551562. Evidence: VoteA: Vote{65:9E9758A9D38E 17549734/00/2(Precommit) 000000000000 F43A24E9BA52 @ 2024-02-13T00:52:25.232425984Z [no-proposals]}; VoteB: Vote{65:9E9758A9D38E 17549734/00/2(Precommit) 2B54019ECD53 F65C1AA06DED @ 2024-02-13T00:52:41.709266477Z [no-proposals]}"
May 05 15:51:28 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:28.659] Executed block module=state height=18651563 validTxs=1 invalidTxs=0
May 05 15:51:28 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:28.673] Committed state module=state height=18651563 txs=1 appHash=5269AF9EDD80E0687729A0FA5973C6E98DF30BE901E208CF27BB0EDD26928B89
May 05 15:51:29 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:29.602] Served RPC HTTP response module=rest-server method=GET url=/milestone/lastNoAck status=200 duration=0 remoteAddr=[::1]:35554
May 05 15:51:30 polygon-mainnet-0 heimdalld[2460]: ERROR[2024-05-05|15:51:30.262] Connection failed @ recvRoutine (reading byte) module=p2p peer=eb9a9a6cbdf9d548e41c09fd41c0fc242c148647@188.40.21.73:26656 conn=MConn{188.40.21.73:26656} err="read tcp 192.168.0.235:47404->188.40.21.73:26656: read: connection reset by peer"
May 05 15:51:34 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:34.774] Executed block module=state height=18651564 validTxs=0 invalidTxs=0
May 05 15:51:34 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:34.786] Committed state module=state height=18651564 txs=0 appHash=5269AF9EDD80E0687729A0FA5973C6E98DF30BE901E208CF27BB0EDD26928B89
May 05 15:51:35 polygon-mainnet-0 heimdalld[2460]: ERROR[2024-05-05|15:51:35.456] Connection failed @ recvRoutine (reading byte) module=p2p peer=8542cd7e6bf9d260fef543bc49e59be5a3fa9074@139.59.207.97:27656 conn=MConn{139.59.207.97:27656} err="read tcp 192.168.0.235:46946->139.59.207.97:27656: read: connection reset by peer"
May 05 15:51:35 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:35.602] Served RPC HTTP response module=rest-server method=GET url=/milestone/latest status=200 duration=0 remoteAddr=[::1]:35554
May 05 15:51:35 polygon-mainnet-0 heimdalld[2460]: INFO [2024-05-05|15:51:35.602] Served RPC HTTP response module=rest-server method=GET url=/milestone/lastNoAck status=200 duration=0 remoteAddr=[::1]:35678
May 05 15:51:35 polygon-mainnet-0 heimdalld[2460]: ERROR[2024-05-05|15:51:35.983] Connection failed @ recvRoutine (reading byte) module=p2p peer=e339c7e9d435016e6aac41154c717d74f0348748@213.168.227.55:26656 conn=MConn{213.168.227.55:26656} err="read tcp 192.168.0.235:56166->213.168.227.55:26656: read: connection reset by peer"
```
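The enterPrevote error above is Tendermint's evidence-age check rejecting duplicate-vote evidence from a peer's proposal because it falls outside the allowed window. A minimal sketch of that check, where the 100,000-block window is an assumption inferred from the two heights in the log (min height 18551562 = committed height 18651562 − 100,000), not taken from heimdall's source:

```python
# Sketch of an evidence-age check in the style of Tendermint's.
# The window below is inferred from the log above, not from heimdall's code.
EVIDENCE_MAX_AGE_BLOCKS = 100_000  # assumed value

def evidence_too_old(evidence_height: int, current_height: int,
                     max_age: int = EVIDENCE_MAX_AGE_BLOCKS) -> bool:
    """Evidence is rejected when it is below the minimum allowed height."""
    min_height = current_height - max_age
    return evidence_height < min_height

# Heights from the log: evidence from 17549734, state at 18651562.
print(evidence_too_old(17549734, 18651562))  # → True
```

Note that this rejection concerns a block proposed by a peer; heimdall itself keeps executing and committing blocks afterwards, as the log shows.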

```
[root@polygon-mainnet-0 polygon]# ./check_heimdalld_status.sh
{
  "jsonrpc": "2.0",
  "id": "",
  "result": {
    "node_info": {
      "protocol_version": { "p2p": "7", "block": "10", "app": "0" },
      "id": "1c4062321a01d7bdf2283ffc06739d612659ac62",
      "listen_addr": "tcp://0.0.0.0:26656",
      "network": "heimdall-137",
      "version": "0.32.7",
      "channels": "4020212223303800",
      "moniker": "polygon-mainnet-planx-0",
      "other": { "tx_index": "on", "rpc_address": "tcp://127.0.0.1:26657" }
    },
    "sync_info": {
      "latest_block_hash": "2CE5601ED7D088EDF6169AEDF71800183F2F82B38D16B5F04C5ED13BF76DC174",
      "latest_app_hash": "6E527FC3D842D0F9637D4ED845F167AC25ADF66E9F717AEC07A22B75D1BAF7FA",
      "latest_block_height": "18651567",
      "latest_block_time": "2024-05-05T15:51:46.690845394Z",
      "catching_up": false
    },
    "validator_info": {
      "address": "241E61AF9E04892D42A65CB6BDD83CDB24EB1227",
      "pub_key": {
        "type": "tendermint/PubKeySecp256k1",
        "value": "BC+FwvyAxdrkNHCgYBx8E1/Rp6cZwihiaBkDiGtAbMKjQf1P7RNJSYKxhFGXny2v471lMjWPJX2nye5YTP0m8+U="
      },
      "voting_power": "0"
    }
  }
}
```
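The key field in this status output is `sync_info.catching_up`: `false` means heimdall considers itself at the chain tip. A small sketch that extracts it from a status response of this shape (field names as in the output above, with a trimmed inline sample):

```python
import json

# Trimmed sample in the same shape as the Tendermint-style /status output above.
status = json.loads("""
{"result": {"sync_info": {"latest_block_height": "18651567",
                          "catching_up": false}}}
""")

sync = status["result"]["sync_info"]
height = int(sync["latest_block_height"])  # heights are JSON strings
print(sync["catching_up"], height)  # → False 18651567
```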

But bor:

```
[root@polygon-mainnet-0 polygon]# journalctl -f -u bor
-- Logs begin at Sun 2024-05-05 13:29:52 UTC. --
May 05 15:51:23 polygon-mainnet-0 bor[4768]: WARN [05-05|15:51:23.602] unable to handle whitelist milestone err="missing blocks"
May 05 15:51:35 polygon-mainnet-0 bor[4768]: INFO [05-05|15:51:35.602] Got new milestone from heimdall start=56,611,446 end=56,611,464 hash=0xba93a42ccefd429dfe9445ed4cde701c0cfb6335c6a126d5e5531e54f2a87599
May 05 15:51:35 polygon-mainnet-0 bor[4768]: WARN [05-05|15:51:35.602] unable to handle whitelist milestone err="missing blocks"
May 05 15:51:47 polygon-mainnet-0 bor[4768]: INFO [05-05|15:51:47.602] Got new milestone from heimdall start=56,611,465 end=56,611,489 hash=0xeb728d8d6cd6cb1082f9492ec17f1966cea7823ca09240650b7e7ccdb18491c2
May 05 15:51:47 polygon-mainnet-0 bor[4768]: WARN [05-05|15:51:47.602] unable to handle whitelist milestone err="missing blocks"
May 05 15:51:51 polygon-mainnet-0 bor[4768]: INFO [05-05|15:51:51.617] Got new checkpoint from heimdall start=56,608,678 end=56,609,957 rootHash=0x915ec7ff9bea60306f59a53d6ca42ae33b384c0e44ff731e533e6db57a163618
May 05 15:51:59 polygon-mainnet-0 bor[4768]: INFO [05-05|15:51:59.602] Got new milestone from heimdall start=56,611,465 end=56,611,489 hash=0xeb728d8d6cd6cb1082f9492ec17f1966cea7823ca09240650b7e7ccdb18491c2
May 05 15:51:59 polygon-mainnet-0 bor[4768]: WARN [05-05|15:51:59.602] unable to handle whitelist milestone err="missing blocks"
May 05 15:52:11 polygon-mainnet-0 bor[4768]: INFO [05-05|15:52:11.603] Got new milestone from heimdall start=56,611,465 end=56,611,489 hash=0xeb728d8d6cd6cb1082f9492ec17f1966cea7823ca09240650b7e7ccdb18491c2
May 05 15:52:11 polygon-mainnet-0 bor[4768]: WARN [05-05|15:52:11.603] unable to handle whitelist milestone err="missing blocks"
```

This log keeps cycling. I've tried rebooting a couple of times; it syncs a little for a while, but then stops again.

This is the Bor start log:

```
[root@polygon-mainnet-0 polygon]# journalctl -f -u bor
-- Logs begin at Sun 2024-05-05 13:29:52 UTC. --
May 05 15:54:09 polygon-mainnet-0 systemd[1]: [/usr/lib/systemd/system/bor.service:3] Unknown lvalue 'StartLimitIntervalSec' in section 'Unit'
May 05 15:54:09 polygon-mainnet-0 systemd[1]: [/usr/lib/systemd/system/bor.service:4] Unknown lvalue 'StartLimitBurst' in section 'Unit'
May 05 15:54:09 polygon-mainnet-0 systemd[1]: Started bor.
May 05 15:54:09 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:09.817] Reading config file path=/polygon/node/bor/config/config.toml
May 05 15:54:09 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:09.817] Config set via config file will be overridden by cli flags
May 05 15:54:09 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:09.818] GRPC Server started addr=[::]:3131
May 05 15:54:09 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:09.820] Set global gas cap cap=50,000,000
May 05 15:54:09 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:09.820] Allocated trie memory caches clean=614.00MiB dirty=1024.00MiB
May 05 15:54:11 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:11.425] Using leveldb as the backing database
May 05 15:54:11 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:11.426] Allocated cache and file handles database=/polygon/node/bor/data/bor/chaindata cache=2.00GiB handles=2048 compactionTableSize=0 compactionTableSizeMultiplier=0.000 compactionTotalSize=0 compactionTotalSizeMultiplier=0.000
May 05 15:54:24 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:24.467] Using LevelDB as the backing database
May 05 15:54:24 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:24.467] Found legacy ancient chain path location=/polygon/node/bor/data/bor/chaindata/ancient
May 05 15:54:24 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:24.514] Opened ancient database database=/polygon/node/bor/data/bor/chaindata/ancient readonly=false
May 05 15:54:24 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:24.526] State scheme set by user scheme=hash
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.543] Initialising Ethereum protocol network=137 dbversion=8
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545]
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] ---------------------------------------------------------------------------------------------------------------------------------------------------------
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] Chain ID: 137 (bor)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] Consensus: Bor (proof-of-stake), merged from Ethash (proof-of-work)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545]
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] Pre-Merge hard forks (block based):
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Homestead: #0 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/homestead.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Tangerine Whistle (EIP 150): #0 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/tangerine-whistle.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Spurious Dragon/1 (EIP 155): #0 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Spurious Dragon/2 (EIP 158): #0 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Byzantium: #0 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/byzantium.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Constantinople: #0 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/constantinople.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Petersburg: #0 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/petersburg.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Istanbul: #3395000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/istanbul.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Muir Glacier: #3395000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/muir-glacier.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Berlin: #14750000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/berlin.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - London: #23850000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/london.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545]
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] The Merge is not available for this network
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Hard-fork specification: https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/paris.md
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545]
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] Post-Merge hard forks (block based):
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Shanghai: #50523000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/shanghai.md)
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] - Cancun: #54876000
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545]
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545] ---------------------------------------------------------------------------------------------------------------------------------------------------------
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.545]
May 05 15:54:42 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:42.546] Loaded most recent local block number=56,610,030 hash=89fc85..0f6d4b td=1,037,670,453 age=57m42s
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.068] Initialized transaction indexer limit=2,350,000
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.070] Gasprice oracle is ignoring threshold set threshold=30,000,000,000
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Old unclean shutdowns found count=162
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-09-21T00:26:41+0000 age=7mo2w3d
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-09-24T00:24:19+0000 age=7mo2w15h
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-10-01T02:43:45+0000 age=7mo1w13h
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-10-11T21:49:28+0000 age=6mo3w5d
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-10-13T15:46:58+0000 age=6mo3w4d
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-11-03T04:05:08+0000 age=6mo4d11h
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-11-14T22:08:02+0000 age=5mo3w1d
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-11-27T15:25:30+0000 age=5mo1w3d
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-12-22T18:53:33+0000 age=4mo2w21h
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.114] Unclean shutdown detected booted=2023-12-28T16:28:54+0000 age=4mo1w1d
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.114] Enabling metrics collection
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.114] Enabling metrics export to prometheus path=http://127.0.0.1:7071/debug/metrics/prometheus
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.114] Starting peer-to-peer node instance=bor/v1.3.1/linux-amd64/go1.22.1
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.124] IPC endpoint opened url=/polygon/node/bor/bor.ipc
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.125] Sanitizing invalid HTTP read header timeout provided=0s updated=30s
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.125] HTTP server started endpoint=[::]:8545 auth=false prefix= cors= vhosts=
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.125] TxFetcher txArrivalWait=500ms
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.125] New local node record seq=1,713,516,493,993 id=1ce24cf60b522fde ip=127.0.0.1 udp=0 tcp=30303
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.125] Started P2P networking self="enode://dd6d3b00f86f7b85a60aef7f40f07520f0680f7b3c0719ab1d262057748de0f169141c8d66dfcf1970fe453552d85a3b717f23dc8e914ed50df91c71046c1c22@127.0.0.1:30303?discport=0"
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.130] Got new milestone from heimdall start=56,611,544 end=56,611,560 hash=0xbfb321e6cbda51dc43b383773697d2721f361e9fcf1009be32383ddd2e7b1190
May 05 15:54:43 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:43.130] unable to start the whitelist milestone service - first run err="missing blocks"
May 05 15:54:43 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:43.131] Got new checkpoint from heimdall start=56,608,678 end=56,609,957 rootHash=0x915ec7ff9bea60306f59a53d6ca42ae33b384c0e44ff731e533e6db57a163618
May 05 15:54:55 polygon-mainnet-0 bor[5674]: INFO [05-05|15:54:55.132] Got new milestone from heimdall start=56,611,544 end=56,611,560 hash=0xbfb321e6cbda51dc43b383773697d2721f361e9fcf1009be32383ddd2e7b1190
May 05 15:54:55 polygon-mainnet-0 bor[5674]: WARN [05-05|15:54:55.132] unable to handle whitelist milestone err="missing blocks"
```
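The "missing blocks" warning is consistent with the numbers in this start log: the most recent local block is 56,610,030, while the milestone fetched from heimdall covers 56,611,544–56,611,560, so bor cannot verify a milestone range it has not imported yet. A quick sketch of that arithmetic (a reading of the logs, not bor's actual whitelist code):

```python
def milestone_lag(local_head: int, milestone_end: int) -> int:
    """Blocks bor must still import before the milestone range is local.
    Returns 0 when the range is already covered by the local chain."""
    return max(0, milestone_end - local_head)

# Numbers taken from the start log above.
print(milestone_lag(56_610_030, 56_611_560))  # → 1530
```

So the warning itself is expected while the node is behind; the real problem reported here is that the gap never closes.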

git-ljm commented 4 months ago

me too

avinashbo commented 4 months ago

This problem is spreading to more nodes and happens on all versions from 1.2.8 through 1.3.1.

manav2401 commented 4 months ago

Hey folks, could you please upgrade bor to the latest beta version, v1.3.2-beta-2, and see if it helps? It would also be good to increase the peer count using the --maxpeers flag; maybe set it to a higher value (e.g. 200). Could you also set the nodiscover flag to false? I think that might help as well. Thanks!
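Against the config.toml posted at the top of this issue, those suggestions would translate to something like the following (a sketch; the key names match the poster's config, the values are the ones suggested here):

```toml
[p2p]
maxpeers = 200      # was 50
nodiscover = false  # was true; re-enables peer discovery
```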

agrevtsev commented 4 months ago

@avinashbo 1.3.2 is also affected. It looks like the node stops syncing after the first failed query to heimdall. Reproduced with both a self-hosted and the public heimdall. For me it looks like:

```
INFO [05-12|02:23:07.294] Got new milestone from heimdall          start=56,864,736 end=56,864,751 hash=0xe4edf6949e3b5175ff40285e1707da26823e22f53ae38a1159e0f44a1197a846
INFO [05-12|02:23:07.631] Got new checkpoint from heimdall         start=56,862,886 end=56,863,397 rootHash=0x8b7498882c197291ee77bd95a01d7e6249586c514f80c28877f0289996a12dd6
WARN [05-12|02:23:17.197] an error while trying fetching from Heimdall path=/milestone/lastNoAck attempt=1 error="Get \"https://heimdall-api.polygon.technology/milestone/lastNoAck\": context deadline exceeded"
INFO [05-12|02:23:17.197] Retrying again in 5 seconds to fetch data from Heimdall path=/milestone/lastNoAck attempt=1
ERROR[05-12|02:23:17.197] Failed to fetch latest no-ack milestone  err="context deadline exceeded"
WARN [05-12|02:23:17.197] unable to handle no-ack-milestone service err="failed to fetch latest milestone"
INFO [05-12|02:23:19.119] Got new milestone from heimdall          start=56,864,752 end=56,864,779 hash=0x48d31cf79630b831ccd5f0a8fabb3edda3bcc07c75e95480ae6596e4158d5224
WARN [05-12|02:23:19.119] unable to handle whitelist milestone     err="missing blocks"
INFO [05-12|02:23:31.323] Got new milestone from heimdall          start=56,864,752 end=56,864,779 hash=0x48d31cf79630b831ccd5f0a8fabb3edda3bcc07c75e95480ae6596e4158d5224
WARN [05-12|02:23:31.323] unable to handle whitelist milestone     err="missing blocks"
INFO [05-12|02:23:43.308] Got new milestone from heimdall          start=56,864,752 end=56,864,779 hash=0x48d31cf79630b831ccd5f0a8fabb3edda3bcc07c75e95480ae6596e4158d5224
WARN [05-12|02:23:43.308] unable to handle whitelist milestone     err="missing blocks"
```

and it stays in that state until the node is rebooted. @manav2401 I tried:

  1. upgrade to v1.3.2
  2. use https://heimdall-api.polygon.technology/ instead of our homebrew
  3. bump --maxpeers to 200

Still no luck. Br, Alex
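The "Retrying again in 5 seconds" line suggests bor retries failed heimdall fetches with a fixed delay, and the complaint is that once those retries are exhausted the milestone services never recover without a restart. A generic sketch of that fetch-with-retry pattern (not bor's actual code; all names here are illustrative):

```python
import time
from typing import Callable, Optional

def fetch_with_retry(fetch: Callable[[], dict],
                     attempts: int = 5,
                     delay_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call fetch() up to `attempts` times, sleeping delay_s between tries.
    Returns None when every attempt fails, mirroring the
    'Failed to fetch latest no-ack milestone' error in the log above."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TimeoutError:  # stands in for "context deadline exceeded"
            if attempt < attempts:
                sleep(delay_s)
    return None

def always_timeout() -> dict:
    raise TimeoutError("context deadline exceeded")

# Demo: 3 failing attempts, with sleep stubbed out and recorded.
calls = []
result = fetch_with_retry(always_timeout, attempts=3,
                          sleep=lambda s: calls.append(s))
print(result, len(calls))  # → None 2
```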

charleswong1025 commented 4 months ago

Yes, same for me. No luck.

timweri commented 4 months ago

Same here: ~30% CPU utilization for weeks while the node stays synced to date. Then once in a while (about every month), the node falls behind with sustained 90% CPU utilization and the same bor logs as above.

The node's AWS instance type is c7g.8xlarge, so it's basically double the recommended specs.

mateusz992 commented 4 months ago

Resetting the node key leads to the same issue. Maybe there is something wrong with the bootnodes? There is no way to set up and sync a new node.

Btw, I found that adding some peers as static peers fixes things a bit.

zyx0355 commented 4 months ago

> Resetting the node key leads to the same issue. Maybe there is something wrong with the bootnodes? There is no way to set up and sync a new node.
>
> Btw, I found that adding some peers as static peers fixes things a bit.

@mateusz992 Hi, can you share the peers you added?

zhangxf55 commented 4 months ago

I use bor 1.3.2 and have the same problem.

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.

sammy007 commented 3 months ago

> This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.

The issue is stale because everybody gave up on it. Abandonware.

manav2401 commented 3 months ago

Hey folks, the issues we're seeing fall broadly into two categories: nodes that aren't getting any healthy peers, and nodes that stop syncing randomly despite having good peers.

The former isn't easily reproducible, as all our internal nodes and many partner nodes are working well. I'd suggest looking into the node config and adding more static/trusted peers if possible. The "missing blocks" log you're seeing in bor is not the cause of the syncing issues, just an indicator. It's not possible to debug with this info alone, so please share debug/trace logs in this issue. We have also added a new rpc/ipc call, debug_peerStats / debug.peerStats(), to get info about the peers your node is connected to. Please share its result if possible.

For the latter, please upgrade to bor v1.3.3-beta3 as it contains a potential fix.

Thanks!

marcello33 commented 3 months ago

v1.3.3 was released as a stable tag yesterday, in case you folks want to update. We'll keep working on improvements and potential bug fixes w.r.t. sync problems in the upcoming v1.3.4. Thanks for your patience.

insider89 commented 3 months ago

Upgraded to 1.3.3, but I still see unable to handle whitelist milestone err="missing blocks". The lag increases constantly until I restart the pod; after a restart it syncs to the latest block. Another pod with the same configuration works without issue; I don't understand how that's possible. Peers for the pod with the issue:

```
curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"debug_peerStats","params":[],"id":1}' http://polygon-1-bor-rpc:8545
{"jsonrpc":"2.0","id":1,"result":[{"enode":"enode://b55c330d72773a9ae024b7d286a0f29032b40a3aa54298f14f2dbc80543728323b2bb10ce2b7ccca926fcb2cdbdbc655d5d54952d79b6b24f32d455c5c300206@148.251.142.58:30309","id":"f7eff654f3b4f94b35206d6ace9eb82dafe49a1994451193d28f28bd27a857e1","name":"bor/v1.3.2/linux-amd...","hash":"0x00e0ebe3540c6167655ac61d95ff5d62a49db93b29b52eb453b9c36d29feea4c","number":58111790,"td":1073132818},{"enode":"enode://bcd6d6edb933d481206e73a4a9d3eefcc3bb73afe177b157463078b8a6801592be8b730998a2aca68d23736b5f395b8d5e5705263baf347f97940a4bcca405e7@148.251.254.139:30319","id":"e1f693473a338271f8be0f56443578e60c832fe7735b7c0c9bfb8603c19e0e96","name":"bor/v1.3.2/linux-amd...","hash":"0x3631a9a5ad379ee464506d09de0acb0b27a043261f3f3637c0f88cfb7fc5c7a2","number":58111841,"td":1073134093},{"enode":"enode://cd5e65d94ebc59cb55f3aba40deef6c25df3bf2fe7a552648fd2011f18380262425901de47b5d1b96fccb165f6775aad1247d479738ea210db30c863f2560af7@178.63.132.44:30308","id":"c549ec4258fdee0e1f590e7fbe2f1969bd98da3aab188dfb7622f5a45a0185fb","name":"bor/v1.3.2/linux-amd...","hash":"0xbb624b4eefcad0e6c9fb81ce30ef01edc90ecd818e753999bd12f54744d17782","number":58111892,"td":1073135368}]}
```

Peers for the pod without issue:

```
/ $ curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"debug_peerStats","params":[],"id":1}' http://polygon-2-bor-rpc:8545
{"jsonrpc":"2.0","id":1,"result":[{"enode":"enode://caebfa8cdb2d80c9e9699c3067781b61c5e12c0c9e670cbfc36a61e60c3980e3bb86a693a979f76a5e7fc336edeab9d0dcede18e3e8465ad689c9fbbf42c1a91@89.187.163.132:30303?discport=30331","id":"563b8afb124f5956a12a12899373c34bc6dcadd80d56a8ac15f2b54523fed753","name":"bor/v1.2.7/linux-amd...","hash":"0x037b8ab4c8710ec31555a89dfbc0a5243f02274d86aa35266a605fe517345c90","number":58111809,"td":1073133293},{"enode":"enode://02911edbbc7a40168a4e13f0390e87224666c0912f875869df5d1e656f331512f34e58f0c27cb3e79be5f3bf975b169b387757163b1ac508343ece164a41f39c@35.171.120.130:30303?discport=30413","id":"55215b613e6cea6f5089d717a122afae0d76b71496bb10f35b826dcfd8b97c76","name":"bor/v1.2.7/linux-amd...","hash":"0x037b8ab4c8710ec31555a89dfbc0a5243f02274d86aa35266a605fe517345c90","number":58111809,"td":1073133293},{"enode":"enode://d429657f24c2eb7a919e7b746ae0719f9532a7a5b9d1b520335f5d6ecebe7789dea1b47cdc6b1fe45d8d2bd493dfb333f4f3de920666c7d2e155f8567f6b3d43@65.21.165.49:30316","id":"032da85af6f11f3a37c2255ea28a6852ae32447b0fd3e2c27abbfb7ab3670773","name":"bor/v1.3.2/linux-amd...","hash":"0x3c248e480fac4b4b86fd227532fd0acec7a4568d0671663c1d2436a010a127a0","number":58111800,"td":1073133068},{"enode":"enode://6bd2864d167c5559191b4fa2e021d4530683807184657e4c67cc35763ca5803634a798c35dd6cb3d4e3bd975424f66d188c36ab37c8e68a1c4580ee34223b6b6@11.12.9.1:7874","id":"a267873ad5d8a0f80436af5c85d473188119cb889bf122ff856ab0c760a91156","name":"bor/v1.2.7/linux-amd...","hash":"0x43d11c2d3cbeac44afce5f738c84dc74e6a47e0bb33bd202b276ee04e130717a","number":58111991,"td":1073137843},{"enode":"enode://7128fed556655acbc26c228b908a40dcb05595e14f930e97432c086ebc504bf704113ba2891931613c5be973b891d23e2c5006e911f7edea883a06f0c05ce46c@135.181.236.61:30303?discport=30402","id":"5bd7727903ce88412e78043d85d39130061434b077eea8557e22d90a3c061617","name":"bor/v1.2.7/linux-amd...","hash":"0x037b8ab4c8710ec31555a89dfbc0a5243f02274d86aa35266a605fe517345c90","number":58111809,"td":1073133293},{"enode":"enode://be6d6cc9093ea9479d682637ce55b863ea242cbab57471f2811e727780f1241b73e5640187da4298b77a9c738c3addcafa9b216597d107a927f3c63bfbe4702b@54.38.76.225:30303?discport=30325","id":"51e7d844f97cf858c3308dea34c26dd7626425803b16ef0c82c66aa040ec13ac","name":"bor/v1.2.7/linux-amd...","hash":"0x037b8ab4c8710ec31555a89dfbc0a5243f02274d86aa35266a605fe517345c90","number":58111809,"td":1073133293},{"enode":"enode://11ebcc0e0bb7a78629317d04ff8cbe68b01efe6e0e0877a2a9266e4373d20999476eaa1ca8763049cddc65bd372b2927074d546bfbe11ea6a7b9ae6d4a426899@148.251.194.252:30303?discport=30766","id":"66f32b1188ddb47d624ac928bbcc6c06502f8f93b27f7cd422a488591f92ed73","name":"bor/v1.2.7/linux-amd...","hash":"0x037b8ab4c8710ec31555a89dfbc0a5243f02274d86aa35266a605fe517345c90","number":58111809,"td":1073133293},{"enode":"enode://eeea5df536e885517fd6460297a7dc7b892ff3ad127d4f96449673388a89976e64411104a6ac10307969d6e3f41b548e651bb230705e79445c8139478e283693@79.127.226.30:30303?discport=30352","id":"9d2024b6a1de85851a1ae92528aa981f6cbd18ed3d7dc83edca6c668baa5ed1b","name":"bor/v1.2.7/linux-amd...","hash":"0x037b8ab4c8710ec31555a89dfbc0a5243f02274d86aa35266a605fe517345c90","number":58111809,"td":1073133293},{"enode":"enode://be28b2a08e777f9066c85a207702cff04a875483ad16403cef40e33262d757948a3d9d74200e4d9ce76982baf5303746ad9761276543076019968a457f476c73@141.95.35.53:30303?discport=30365","id":"57900e95a5c904c55ae381d43abaadb5f4eb484c2afb03c2f531c8af6f152750","name":"bor/v1.2.7/linux-amd...","hash":"0x037b8ab4c8710ec31555a89dfbc0a5243f02274d86aa35266a605fe517345c90","number":58111809,"td":1073133293},{"enode":"enode://aa4c00c22db9a0024da22e3992e83d9e4bafe9414244f0f2bd9300ebfe99c66d7d2d4285a239e0f46a2daefca19e1a7b71d5741c72720a822642af1dc90f9207@87.249.137.89:30303?discport=30472","id":"57d5e80262fac6d24b16ed32a6506b26653f22a768bea5faf8a1a05f3894267e","name":"bor/v1.2.7/linux-amd...","hash":"0x037b8ab4c8710ec31555a89dfbc0a5243f02274d86aa35266a605fe517345c90","number":58111809,"td":1073133293}]}
```

Both have the same configuration:

server                                                                                                                                                                                                    
      --identity=polygon-2                                                                                                                                                                                      
      --datadir=/bor-home                                                                                                                                                                                       
      --nat=extip:IP                                                                                                                                                                                
      --port=30875                                                                                                                                                                                              
      --bor.heimdall=http://polygon-heimdall-rpc:1317                                                                                                                                                           
      --http                                                                                                                                                                                                    
      --http.addr=0.0.0.0                                                                                                                                                                                       
      --http.vhosts=*                                                                                                                                                                                           
      --http.corsdomain=*                                                                                                                                                                                       
      --http.port=8545                                                                                                                                                                                          
      --ipcpath=/bor-home/bor/bor.ipc                                                                                                                                                                           
      --http.api=eth,net,web3,txpool,bor,admin,debug                                                                                                                                                            
      --ws                                                                                                                                                                                                      
      --ws.addr=0.0.0.0                                                                                                                                                                                         
      --ws.api=eth,net,web3,txpool,bor,admin,debug                                                                                                                                                              
      --ws.origins=*                                                                                                                                                                                            
      --graphql                                                                                                                                                                                                 
      --graphql.vhosts=*                                                                                                                                                                                        
      --syncmode=full                                                                                                                                                                                           
      --chain=mainnet                                                                                                                                                                                           
      --miner.gasprice=30000000000                                                                                                                                                                              
      --miner.gaslimit=30000000                                                                                                                                                                                 
      --txpool.nolocals                                                                                                                                                                                         
      --txpool.accountslots=16                                                                                                                                                                                  
      --txpool.globalslots=32768                                                                                                                                                                                
      --txpool.accountqueue=16                                                                                                                                                                                  
      --txpool.globalqueue=32768                                                                                                                                                                                
      --txpool.pricelimit=30000000000                                                                                                                                                                           
      --gpo.ignoreprice=30000000000                                                                                                                                                                             
      --txpool.lifetime=1h30m0s                                                                                                                                                                                 
      --maxpeers=200                                                                                                                                                                                            
      --metrics                                                                                                                                                                                                 
      --metrics.expensive                                                                                                                                                                                       
      --metrics.prometheus-addr=0.0.0.0:6060                                                                                                                                                                    
      --bootnodes=enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385ecf4a4bf160a@3.36.224.80:30303,enode://8729e0c825f3d9cad382555f3e46dcff21af323e89025a0e6312df541f4a9e73abfa562d64906f5e59c51fe6f0501b3e61b07979606c56329c020ed739910759@54.194.245.5:30303,enode://0cb82b395094ee4a2915e9714894627de9ed8498fb881cec6db7c65e8b9a5bd7f2f25cc84e71e89d0947e51c76e85d0847de848c7782b13c0255247a6758178c@44.232.55.71:30303,enode://88116f4295f5a31538ae409e4d44ad40d22e44ee9342869e7d68bdec55b0f83c1530355ce8b41fbec0928a7d75a5745d528450d30aec92066ab6ba1ee351d710@159.203.9.164:30303,enode://a1d2af06659b080df1537490c04ef139f7cf71d3f1652011b722134b8f10361c69a445000809fadd6c1ad34f4a0ed58d72b5c1346d62ab536fae563f27fe2bba@142.132.136.31:30833,enode://ea3c4032b95d57b96dd482cf4fa986f491cf587244e81ebd6bf37eda116ccaf37233414529a6a86115e42b24b69a07d98036e4f991de6df48e88bc86e86f9069@142.132.136.31:30843,enode://fd10175c237537b11b359bbcd06d93a8595c0e77de05019bd2dfe22999d3aba1383cd99d1ebe81a0cc17111b911a3639869d68407b105e806017c395c4e45125@157.90.90.89:30803,enode://bb9fb6a0da0dcf52af4d89046ba257c8bdce40ff792f1eed55b363f72f9ff12fefe04180608e19ed9c2f5ee5f5c3385eb37bb76d3831bf23302cf522ebed6c92@168.119.70.250:30865,enode://667a3a764c33b7919b92fdb77db3a4736845d953b27c7384d15a60aeaa7b33b5d64ea4e17c38be62e4af52e82db43beffc9e8f2992085e673cb2cf2891c9964f@168.119.70.250:30875,enode://2e6fa77c5f66c0313a62177e0077bf1a3178adb41e4fa60352ba295e8aa9e26cf0074ba2d55f17cca8e5c7abfad766b6fc9e1eeb6586a762f43cfe63d3d6ddf7@67.235.115.91:30885,enode://e1b0767d1756a950f5fdab659d1292cabd303c5c92e8cf8865937d42ef61a0b5df4df88974db01ff21317bbeea88b9a3c299238e4e3ee6f42ed3fa3e730d9d79@65.108.228.152:30865
manav2401 commented 3 months ago

@insider89 the log unable to handle whitelist milestone err="missing blocks" is not the cause of the sync issue but rather the result of your node not being in sync. We've refactored this to avoid any confusion, but that's for a later release.

Can you try this -- take the peers from the pod which is working well and add them as static peers on the pod which is not working well?

Also, it'd be great if you could send some debug/trace logs along with the response of a debug_peerStats / debug.peerStats() call.
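The suggestion above can be scripted: dump the healthy node's peers via the admin_peers RPC and feed each enode to admin_addPeer on the stuck node (both are standard geth/bor admin APIs, and admin must be enabled in the http.api list). This is only a sketch — the JSON below is a fabricated sample, and the pod hostnames are assumptions:

```shell
# Fabricated admin_peers response; fetch the real one from the healthy pod:
#   curl -s http://healthy-pod:8545 -H 'Content-Type: application/json' \
#     -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}'
RESP='{"jsonrpc":"2.0","id":1,"result":[{"enode":"enode://aa@10.0.0.1:30303"},{"enode":"enode://bb@10.0.0.2:30303"}]}'

# Extract each enode URL, dropping any ?discport suffix.
printf '%s\n' "$RESP" | grep -o 'enode://[^"?]*' | while read -r ENODE; do
  echo "would add: $ENODE"
  # On the stuck pod, register it at runtime:
  # curl -s http://stuck-pod:8545 -H 'Content-Type: application/json' \
  #   -d "{\"jsonrpc\":\"2.0\",\"method\":\"admin_addPeer\",\"params\":[\"$ENODE\"],\"id\":1}"
done
```

admin_addPeer only takes effect for the running process; to make peers survive restarts, put them in the config as static/boot nodes instead.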

insider89 commented 3 months ago

@manav2401 Now my second nodes stop syncing. I've added the peers from the sync nodes, but it didn't help. Could you please elaborate how to add trace/debug logs to my startup script?

manav2401 commented 3 months ago

@insider89 you'll have to add --verbosity=4 for enabling debug logs and --verbosity=5 for trace logs.
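For example, appended to a server invocation like the ones shown earlier in this thread (the chain and datadir values are taken from the configs quoted above; this is a CLI fragment, not a complete startup script):

```shell
bor server \
  --chain=mainnet \
  --datadir=/polygon/node/bor/data \
  --verbosity=4   # 3 = info (default), 4 = debug, 5 = trace
```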

Moreover, for the RPC call I mentioned, you can use the query below.

curl http://localhost:8545 -X POST --data '{"jsonrpc":"2.0","method":"debug_peerStats","params":[],"id":1}' -H "Content-Type: application/json"

insider89 commented 3 months ago

I see that I had a big lag at night, but it fixed itself. Now the sync works. I'll monitor it and update here if the IP errors appear again.

[Screenshot: 2024-06-14 at 09:54:33]
insider89 commented 3 months ago

Hey @manav2401 . Here are the debug logs from the node which failed to sync:

DEBUG[06-17|08:54:58.294] IP exceeds table limit                   ip=131.153.232.232
DEBUG[06-17|08:54:58.431] Rejected inbound connection              addr=11.12.2.0:62460       err="too many attempts"
DEBUG[06-17|08:54:58.798] IP exceeds bucket limit                  ip=131.153.239.93
DEBUG[06-17|08:54:59.091] IP exceeds table limit                   ip=131.153.232.227
DEBUG[06-17|08:54:59.091] IP exceeds table limit                   ip=148.251.142.61
DEBUG[06-17|08:54:59.091] IP exceeds table limit                   ip=131.153.232.235
DEBUG[06-17|08:54:59.094] IP exceeds bucket limit                  ip=131.153.239.91
DEBUG[06-17|08:54:59.094] IP exceeds bucket limit                  ip=131.153.239.93
DEBUG[06-17|08:54:59.425] Rejected inbound connection              addr=11.12.3.0:11999       err="too many attempts"
DEBUG[06-17|08:54:59.595] IP exceeds bucket limit                  ip=131.153.239.93
DEBUG[06-17|08:54:59.601] IP exceeds bucket limit                  ip=131.153.239.93
DEBUG[06-17|08:54:59.601] IP exceeds table limit                   ip=131.153.232.202
DEBUG[06-17|08:54:59.911] Rejected inbound connection              addr=11.12.2.0:39487       err="too many attempts"
DEBUG[06-17|08:54:59.920] Rejected inbound connection              addr=11.12.2.0:43005       err="too many attempts"
DEBUG[06-17|08:54:59.985] Rejected inbound connection              addr=11.12.3.0:64097       err="too many attempts"
DEBUG[06-17|08:54:59.993] Rejected inbound connection              addr=11.12.3.0:61075       err="too many attempts"
DEBUG[06-17|08:55:00.102] IP exceeds table limit                   ip=131.153.232.232
DEBUG[06-17|08:55:00.203] IP exceeds table limit                   ip=131.153.232.204

....

DEBUG[06-17|08:56:12.279] IP exceeds table limit                   ip=131.153.232.238
DEBUG[06-17|08:56:12.317] Fetching batch of headers                id=9ac3da926c332f6c conn=dyndial           count=192 fromnum=58,183,103 skip=0   reverse=false
DEBUG[06-17|08:56:12.474] Fetching batch of headers                id=9ac3da926c332f6c conn=dyndial           count=192 fromnum=58,183,295 skip=0   reverse=false
DEBUG[06-17|08:56:12.635] Fetching batch of headers                id=9ac3da926c332f6c conn=dyndial           count=192 fromnum=58,183,487 skip=0   reverse=false
DEBUG[06-17|08:56:12.682] IP exceeds table limit                   ip=131.153.232.206

....

DEBUG[06-17|08:56:08.183] Failed to deliver retrieved headers      peer=9ac3da92 err="delivery not accepted"
DEBUG[06-17|08:56:08.183] Fetching batch of headers                id=9ac3da926c332f6c conn=dyndial           count=192 fromnum=58,173,311 skip=0   reverse=false
DEBUG[06-17|08:56:08.184] Failed to deliver retrieved headers      peer=9ac3da92 err="delivery not accepted"
DEBUG[06-17|08:56:08.184] Fetching batch of headers                id=9ac3da926c332f6c conn=dyndial           count=192 fromnum=58,173,503 skip=0   reverse=false
DEBUG[06-17|08:56:08.185] Failed to deliver retrieved headers      peer=9ac3da92 err="delivery not accepted"
DEBUG[06-17|08:56:08.185] Fetching batch of headers                id=9ac3da926c332f6c conn=dyndial           count=192 fromnum=58,173,695 skip=0   reverse=false
DEBUG[06-17|08:56:08.188] Failed to deliver retrieved headers      peer=9ac3da92 err="delivery not accepted"
DEBUG[06-17|08:56:08.188] Fetching batch of headers                id=9ac3da926c332f6c conn=dyndial           count=192 fromnum=58,173,887 skip=0   reverse=false
DEBUG[06-17|08:56:08.192] Failed to deliver retrieved headers      peer=9ac3da92 err="delivery not accepted"
AusIV commented 3 months ago

I'm seeing this issue too. I had three servers running bor v1.2.7. I restarted one of them, and it has only found 16 peers, with only a couple of inbound peers.

Hoping v1.3.3 would resolve the issue, I stood up two new servers running v1.3.3, and they have also had very few peers. I did find that increasing maxpeers increases the number of peers I can establish, but not to anything near the maxpeers setting. (When maxpeers=50, I top out at 16 peers; when maxpeers=160 I seem to top out around 25.)

sammy007 commented 3 months ago

If there are such recurring connectivity problems, why not add DNS seeds to both bor and heimdall, just like in Bitcoin, and maintain that list at least once a month?
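For reference, upstream go-ethereum already implements this idea as EIP-1459 DNS-based node discovery: the client walks a signed node list published in DNS, selected with the --discovery.dns flag and an enrtree:// URL. Whether and how bor exposes this is not confirmed in this thread, and the URL below is purely a placeholder, so treat this as a sketch of the mechanism rather than a working command:

```shell
# Hypothetical — the flag follows upstream geth; the enrtree URL and key
# are placeholders, not a real Polygon-published node tree.
bor server --discovery.dns "enrtree://EXAMPLEPUBKEY@bor.nodes.example.org"
```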

EronTo commented 3 months ago

@avinashbo 1.3.2 is also affected. It looks like the node stops syncing after the first failed query to Heimdall. Reproduced with both a self-hosted and the public Heimdall. For me it looks like this:

INFO [05-12|02:23:07.294] Got new milestone from heimdall          start=56,864,736 end=56,864,751 hash=0xe4edf6949e3b5175ff40285e1707da26823e22f53ae38a1159e0f44a1197a846
INFO [05-12|02:23:07.631] Got new checkpoint from heimdall         start=56,862,886 end=56,863,397 rootHash=0x8b7498882c197291ee77bd95a01d7e6249586c514f80c28877f0289996a12dd6
WARN [05-12|02:23:17.197] an error while trying fetching from Heimdall path=/milestone/lastNoAck                    attempt=1 error="Get \"https://heimdall-api.polygon.technology/milestone/lastNoAck\": context deadline exceeded"
INFO [05-12|02:23:17.197] Retrying again in 5 seconds to fetch data from Heimdall path=/milestone/lastNoAck                    attempt=1
ERROR[05-12|02:23:17.197] Failed to fetch latest no-ack milestone  err="context deadline exceeded"
WARN [05-12|02:23:17.197] unable to handle no-ack-milestone service err="failed to fetch latest milestone"
INFO [05-12|02:23:19.119] Got new milestone from heimdall          start=56,864,752 end=56,864,779 hash=0x48d31cf79630b831ccd5f0a8fabb3edda3bcc07c75e95480ae6596e4158d5224
WARN [05-12|02:23:19.119] unable to handle whitelist milestone     err="missing blocks"
INFO [05-12|02:23:31.323] Got new milestone from heimdall          start=56,864,752 end=56,864,779 hash=0x48d31cf79630b831ccd5f0a8fabb3edda3bcc07c75e95480ae6596e4158d5224
WARN [05-12|02:23:31.323] unable to handle whitelist milestone     err="missing blocks"
INFO [05-12|02:23:43.308] Got new milestone from heimdall          start=56,864,752 end=56,864,779 hash=0x48d31cf79630b831ccd5f0a8fabb3edda3bcc07c75e95480ae6596e4158d5224
WARN [05-12|02:23:43.308] unable to handle whitelist milestone     err="missing blocks"

and it would stay in that state until the node was rebooted. @manav2401 I tried:

  1. upgrade to v1.3.2
  2. use https://heimdall-api.polygon.technology/ instead of our homebrew
  3. bump --maxpeers to 200

Still no luck. Br, Alex

Do you have any solution now? I have the same problem.
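Several comments in this thread note that a stalled node stays stuck until it is rebooted, so an external liveness check that restarts bor when the head stops advancing is a pragmatic stopgap. The core of such a check is polling eth_blockNumber and comparing successive values; below is a minimal sketch of the parsing step only — the JSON response is a fabricated sample, and the localhost URL in the comment is an assumption:

```shell
# Fabricated eth_blockNumber response; a real check would fetch it with:
#   curl -s http://localhost:8545 -H 'Content-Type: application/json' \
#     -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
RESP='{"jsonrpc":"2.0","id":1,"result":"0x377e3c1"}'

# Pull the hex head out of the JSON and convert it to decimal.
HEX=$(printf '%s' "$RESP" | grep -o '"result":"[^"]*"' | cut -d'"' -f4)
printf '%d\n' "$HEX"   # prints 58188737
```

A supervisor (systemd with a watchdog unit, or a Kubernetes liveness probe) would sample this twice some interval apart and restart bor when the value stops increasing.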

sammy007 commented 3 months ago

I had a tiny network outage lately and bor got stuck with missing blocks even though networking returned to normal almost immediately. It was on 1.3.4-beta2; I've switched to 1.3.3 for now.

cshintov commented 2 months ago

Were you guys able to fix your nodes?

rushsinging commented 2 months ago

Met this too.

rushsinging commented 2 months ago

An admin.peerStats() output taken when hitting unable to handle whitelist milestone err="missing blocks":

[{
    enode: "enode://a0c47fd67da1a35b7b5602b28384e38fffc7d6b9468267f6d137bdd07421f38852555bbe1d8d1addb47b20298a01af36b5df5f59cd9940104af81d3826fd4a51@89.187.163.132:30303?discport=30340",
    hash: "0x7add8232d8709dfd55858974ea03ba19c10524a31a2dfc7d94db6c878bda3f03",
    id: "a20cfc97bc10c26aa0af24accf7a4e476532178904803c11eab9f9729c66a9c9",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466696,
    td: 1104718063
}, {
    enode: "enode://574882f52624edc06d9e033bf65ac65cc278d56d5734813186059a3997bb29e3a328f30c61ca799955e634bf666b0a2863d059dc7dec810175a24deee86b2b1b@131.153.232.202:30314",
    hash: "0x58a42ac2b28c325e8889980aa371bd90ab0121e705b9bf1fcd75bfe1af7bee49",
    id: "2603977d54874db44377fea3a3090ab108638fc00eccd18ddeac0b3434037f44",
    name: "bor/v1.3.3/linux-amd...",
    number: 59466780,
    td: 1104719659
}, {
    enode: "enode://63262746d304a7c26f39bbf6ec920112ad0656f71a6c756a116256888d4f07dfcfcea7965ef77512520c8ff6233de440bf0c070a5e7ab384dd4ba1fcd018ea8f@35.169.31.203:30303?discport=30343",
    hash: "0x24192bd2ffea4f844127af9db047b6b1e706b887a42f558b6c36bbc6ccd42fa3",
    id: "39d857ca5cf43073e5a7457b956b5d4af1304d2aff469e28065d55fe8e2afe64",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466693,
    td: 1104718006
}, {
    enode: "enode://5c2ad585a7eb22445eab19781db9e52a331480feede4291254379b06606ad7493f2cdce9165445cbad585b90c6000d236b1a49b530fd2d2c4ace0a654a6a8848@18.204.172.171:30309",
    hash: "0x11d32e41e3f24b563030bc2728fa7e9a173e0f583abe64a96159f081c190370a",
    id: "3704f220923586d85b52fc25f5e28a4239092421449775fa52e2709d92add4a0",
    name: "bor/v1.3.3/linux-amd...",
    number: 59466726,
    td: 1104718633
}, {
    enode: "enode://07bfffa85e22851f072887bf4c49ffd8595ad070153919c0d9066475309a8f63a703abaed41695b1846b6fd110701123192d987ab0bd0c9c46475ef1f9deaf51@141.95.35.53:30303?discport=30740",
    hash: "0x24192bd2ffea4f844127af9db047b6b1e706b887a42f558b6c36bbc6ccd42fa3",
    id: "3678b7f03aad0de96624385e72b47dc6cc9bc08451afc1c1af302386897a213d",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466693,
    td: 1104718006
}, {
    enode: "enode://0f27a92ee098267cc9852b02edb6e2e1cf88cacfa895c32db1aee0d401b589fac32f288943fbb50d81a3d9a7265b3807be7e37e3348737d8b6ee6ad560535e09@87.249.137.89:30303?discport=30375",
    hash: "0x24192bd2ffea4f844127af9db047b6b1e706b887a42f558b6c36bbc6ccd42fa3",
    id: "70c78130a39902c3b5158adbaebc4ac8395358e32fb31ddca27fbf436767aee1",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466693,
    td: 1104718006
}, {
    enode: "enode://ac102f8e027bec9dff9d6f0e90982cd718f361df9785d89cae86678ae15e40eaa6aeb0019c336124d8a78485a00bf294b446e9cc09f44b182636d602cf2714b2@131.153.238.195:30310",
    hash: "0xf237f6c97257cb0811410aba54d1737a6e77e6334a8f8742ba8579546053ec9e",
    id: "3bb9292e99ee4f0f9202226e4a950711be57e12b76374c8269ba22c51f9dd218",
    name: "bor/v1.3.3/linux-amd...",
    number: 59466777,
    td: 1104719602
}, {
    enode: "enode://485a6a2cf6016e5d595b0a04324aa9d6d6e3ed972ae2f6c844d5cc1faa428a312ae479c72b250e6dfa3960d86b8b4b01fb4a58b6c88f3c14e6abaf5bf0a80c54@44.214.60.61:30308",
    hash: "0x5557f873df4d253f00b9593ff362425f06fa28af16c1f35ef93ddf1ef358d334",
    id: "3a4645b97cfe5e50dae7fe9beb66af9038ebfcc20089ef60b4b9aa186b8b16a4",
    name: "bor/v1.3.3/linux-amd...",
    number: 59466778,
    td: 1104719621
}, {
    enode: "enode://5822cb264f7a785bf0bb7e884bcf3b845811e5401200afec7051ed8a3dafb10e418b48693f7e41ae8379241c10e73cbb8faf2236306e7309637ff3879ab835d8@131.153.239.93:30384",
    hash: "0x60228b3005bba8e79f160c27fd2035c715f99d5665b075fa2872eab6aa10d38d",
    id: "264cca90ed2eabe4f962c77b9d45edc28b1b6fd7b2bd739ce81b294198484264",
    name: "bor/v1.3.3/linux-amd...",
    number: 59466779,
    td: 1104719640
}, {
    enode: "enode://8e891d303d900c941f05ba75eabcb7e75109b539792bbaf51b00ff54cc3251b01f90daabcdea93773a7a8de02f21eb38a630585519dfb14c7f7108cc75510716@131.153.232.234:30303",
    hash: "0x11d32e41e3f24b563030bc2728fa7e9a173e0f583abe64a96159f081c190370a",
    id: "341feb199921bc048786745ff570318e7ebcfaf7aacf14ff029131057107bcdd",
    name: "bor/v1.3.3/linux-amd...",
    number: 59466726,
    td: 1104718633
}, {
    enode: "enode://fdc3755978ddbfc4a75609f9679a2adbe7dd491c70a225252657086ce19f9156ce993959a53ae1aafdc1d9c09cdeef3f763061655a385a4f5a59ea2d67c2a003@148.251.194.252:30303?discport=30536",
    hash: "0x24192bd2ffea4f844127af9db047b6b1e706b887a42f558b6c36bbc6ccd42fa3",
    id: "2741e385a4b67dc0a8848d846f1fe07f6992477024f9edf0aa3eede4e2dbfb60",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466693,
    td: 1104718006
}, {
    enode: "enode://03b63c85c8752a35859e9625c50f56f51af5a49f7e8810074bb241cc8325b7873cd34c1fd34cddd2947000405d953abe85128c4209b4cecc45d84abe72aafef5@135.181.236.61:30303?discport=30333",
    hash: "0x24192bd2ffea4f844127af9db047b6b1e706b887a42f558b6c36bbc6ccd42fa3",
    id: "18659083d259e1e34fc969ff74f70f3147340840d58922021837e7a1b2187d02",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466693,
    td: 1104718006
}, {
    enode: "enode://dcfb6d3d7fc9af4bb6d720bd30e898f9edb288dae8c0b7dc2c4c72ffbd30bd241e651b6ef0cce81d566aaa03ccc676add704e2bf572f4f1f1d1fac3c9947756e@65.108.1.189:30303?discport=30463",
    hash: "0xb449d9863971663a5b389ac971e0f2499c4e55bdf62ed680c2e4e41f4f1e4ee6",
    id: "fccd4c5f457ff718efdc15e4174776a799602d7a05c73226d7326c465f6a7aea",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466715,
    td: 1104718424
}, {
    enode: "enode://881d69655f3835b2afaabf6002c8249abb8202791faff16d2288f745cccc0a2d5f1d5a33ef22f9395d63b75f22835dbb097b958df97e72e26326c7deccfff751@44.199.131.214:30314",
    hash: "0x5557f873df4d253f00b9593ff362425f06fa28af16c1f35ef93ddf1ef358d334",
    id: "3a06447c66fae7e299806029de7f223ea69a05e8d1585ab7ee73f3a6f159098c",
    name: "bor/v1.3.3/linux-amd...",
    number: 59466778,
    td: 1104719621
}, {
    enode: "enode://75e364c103a6cee77f80ccab1cbc6d5fcb74f2dfa38e109584857d70f7c0862b3a6ffaaaf0f1b48b05eda0d8a4c6fa1fcc6fcf224ce3c793aafd6f855679419a@54.38.76.225:30303?discport=30329",
    hash: "0x24192bd2ffea4f844127af9db047b6b1e706b887a42f558b6c36bbc6ccd42fa3",
    id: "0706a8ea93bd941ec48dd1d16ab14ccd2100256a2f6ce16ee0fb1d85c5ab339e",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466693,
    td: 1104718006
}, {
    enode: "enode://bda1b01cb6c6de48e30e17cc415cbe855e1e1729c04718acf46887d83845cfcd9053db581588275f7fac8d8e649fabf755c8cb32562bbb7ffd0a84cba4ab32dc@79.127.216.33:30303?discport=30329",
    hash: "0x24192bd2ffea4f844127af9db047b6b1e706b887a42f558b6c36bbc6ccd42fa3",
    id: "78de8d63c5f59df6af211516f6da5b11571afcaba0b23c84486841ff26649ef3",
    name: "bor/v1.2.7/linux-amd...",
    number: 59466693,
    td: 1104718006
}, {
    enode: "enode://3a46b05649b4b73fc3bee6fdca95a8f41132d53ed9e02c4c3906c5055fbeb9538e658431cb73c8d5928693dca8d5445939b9fc7077d45a05dcc474dd3fcbd936@3.89.169.103:30307",
    hash: "0x60228b3005bba8e79f160c27fd2035c715f99d5665b075fa2872eab6aa10d38d",
    id: "267443751f7ada81c25d24b7e14256b232d329baf54b6964ecdaac296cc73311",
    name: "bor/v1.3.3/linux-amd...",
    number: 59466779,
    td: 1104719640
}]
rushsinging commented 2 months ago

Finally, I added two bootnodes and it seems to work well; I still don't know why, or when it will break again.

"enode://a7f59b918fe19b5448b6dfbb35d5096eaffcdf20050f41f442c4a72377f7634da10fe23b1a405f72f6146b3c78738160c98c2c47d60a81e489fa47a50fe8700c@77.172.185.210:30303","enode://dcafedac347b24aea85a4b396511dc7ba4da37cd2ce13de9511aba50879fe04e6e259ce6314ae3376c1e3cc2f1909273781a74e9c386bcec29f5b42d1db0fe14@47.76.152.221:30303"
cshintov commented 2 months ago

I was able to fix it!

I deleted the nodes directory and the nodekey file and restarted bor, as stated here by @sammy007 .

Okay, I have deleted /bor/nodes and especially /bor/nodekey. The /bor directory was a carbon copy of another live node, and it seems peers do not like that. It seems to have finally finished catching up and is running fine. As for the wrong reporting of eth_syncing, it sucks to see that. The only point of using it is to check whether the node is in sync, and if the node knows it's not in sync, the response must not be false.

But it was not able to sync at all after that.

So I got the peers from another working node and added them as bootnodes, and then it started syncing without any issues.

curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"debug_peerStats","params":[],"id":1}' http://polygon-1-bor-rpc:8545
OldBorrow1488 commented 2 months ago

Bor version 1.3.3. Fixed by removing the nodes directory and node key and adding these bootnodes to the config: "enode://a7f59b918fe19b5448b6dfbb35d5096eaffcdf20050f41f442c4a72377f7634da10fe23b1a405f72f6146b3c78738160c98c2c47d60a81e489fa47a50fe8700c@77.172.185.210:30303" , "enode://dcafedac347b24aea85a4b396511dc7ba4da37cd2ce13de9511aba50879fe04e6e259ce6314ae3376c1e3cc2f1909273781a74e9c386bcec29f5b42d1db0fe14@47.76.152.221:30303"

Everything worked out: when it reached the head block, it did not stall and continued synchronizing.

malingzhao commented 2 months ago

So good, works great in v1.3.4.