Closed ibhagwan closed 1 year ago
@fab-10 maybe we are too aggressive in triggering the new BWS format.
Thank you for the detailed report! Also drop that max peer count ;)
Ty!
It's currently set to max-peers = 100, what's the recommended value?
On a post-Merge network, you really only need a handful of peers and having that many will eat up CPU to maintain the connections. I'd recommend somewhere between ~10-25, especially for an in-sync node.
That could be very helpful for my node indeed, ty!
can you provide the nimbus logs as well? I am trying to determine if nimbus is directing besu to resync or if this is something with our configuration of backwards sync.
Unfortunately I don't have the mainnet logs anymore as I moved back to Nethermind due to this issue (I run my services inside a tmux session so the logs aren't written to disk).
However, this is easily replicable on my stronger machine (64GB RAM) running Goerli validators. Here even blocks that take just over 1s to import seem to trigger the backward sync. Below are a few examples of the Besu log with its corresponding Nimbus logs (nothing suspicious AFAIK); lmk if you need any other data and I'll be happy to provide it.
My goerli machine:
####################################################################################################
# #
# Besu 23.1.2 #
# #
# Configuration: #
# Network: Goerli #
# Network Id: 5 #
# Data storage: Bonsai #
# Sync mode: Snap #
# RPC HTTP APIs: ADMIN,ETH,NET,DEBUG,TXPOOL,WEB3 #
# RPC HTTP port: 8545 #
# Engine APIs: ENGINE,ETH #
# Engine port: 8551 #
# High spec configuration enabled #
# #
# Host: #
# Java: openjdk-java-17 #
# Maximum heap size: 8.00 GB #
# OS: linux-x86_64 #
# glibc: 2.36 #
# Total memory: 62.78 GB #
# CPU cores: 16 #
# #
####################################################################################################
(1): Besu, block 8,806,471 (other logs around these timestamps look totally normal)
2023-04-10 06:02:37.935-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,806,471 / 129 tx / 16 ws / base fee 181.73 gwei / 23,372,823 (77.9%) gas / (0x57323bb8ce92bdaf76a0e01ed67cd649d37a3df2513f0cc717a1aaec576e4277) in 1.072s. Peers: 25
2023-04-10 06:04:01.233-07:00 | vert.x-worker-thread-0 | INFO | BackwardSyncContext | Starting a new backward sync session
2023-04-10 06:04:01.564-07:00 | nioEventLoopGroup-3-7 | INFO | BackwardSyncStep | Backward sync phase 1 of 2 completed, downloaded a total of 200 headers. Peers: 25
2023-04-10 06:04:03.039-07:00 | nioEventLoopGroup-3-7 | INFO | BackwardSyncContext | Backward sync phase 2 of 2, 20.00% completed, imported 1 blocks of at least 5 (current head 8806472, target head 8806476). Peers: 25
2023-04-10 06:04:06.643-07:00 | ForkJoinPool.commonPool-worker-20 | INFO | BackwardSyncContext | Backward sync phase 2 of 2 completed, imported a total of 5 blocks. Peers: 25
2023-04-10 06:04:06.643-07:00 | ForkJoinPool.commonPool-worker-20 | INFO | BackwardSyncAlgorithm | Current backward sync session is done
2023-04-10 06:04:13.187-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,806,477 / 70 tx / 16 ws / base fee 247.67 gwei / 6,261,496 (20.9%) gas / (0x27591f26468e4a9c9998e9e04245de0fe896cf90fb43df0daf6882b3c86bd2b8) in 0.268s. Peers: 25
(1): Nimbus slot 5385313 (same block as Besu):
INF 2023-04-10 06:02:12.001-07:00 Slot start topics="beacnde" slot=5385311 epoch=168290 sync=synced peers=160 head=f6caa4d3:5385310 finalized=168288:f050bc47 delay=781us908ns
INF 2023-04-10 06:02:14.194-07:00 State replayed topics="chaindag" blocks=0 slots=1 current=a25f47ab:5385280@5385312 ancestor=96a67889:5385311 target=96a67889:5385311@5385312 ancestorStateRoot=f85c844b targetStateRoot=136742bb found=true assignDur=33ms390us661ns replayDur=400ms355us121ns
INF 2023-04-10 06:02:20.097-07:00 Slot end topics="beacnde" slot=5385311 nextActionWait=5m3s902ms144us508ns nextAttestationSlot=5385337 nextProposalSlot=-1 syncCommitteeDuties=none head=96a67889:5385311
NOT 2023-04-10 06:02:23.911-07:00 Attestation failed to match head topics="chaindag" epoch=168289 validator=<redacted>
INF 2023-04-10 06:02:24.002-07:00 Slot start topics="beacnde" slot=5385312 epoch=168291 sync=synced peers=160 head=96a67889:5385311 finalized=168288:f050bc47 delay=2ms155us789ns
INF 2023-04-10 06:02:33.871-07:00 State replayed topics="chaindag" blocks=0 slots=32 current=96a67889:5385311@5385312 ancestor=96a67889:5385311@5385312 target=96a67889:5385311@5385344 ancestorStateRoot=136742bb targetStateRoot=76229e0d found=true assignDur=2us863ns replayDur=1s596ms398us898ns
INF 2023-04-10 06:02:34.382-07:00 Slot end topics="beacnde" slot=5385312 nextActionWait=4m49s617ms153us890ns nextAttestationSlot=5385337 nextProposalSlot=-1 syncCommitteeDuties=none head=96a67889:5385311
INF 2023-04-10 06:02:34.393-07:00 Missed multiple heartbeats topics="libp2p gossipsub" heartbeat=GossipSub delay=1s232ms439us436ns hinterval=700ms
INF 2023-04-10 06:02:36.000-07:00 Slot start topics="beacnde" slot=5385313 epoch=168291 sync=synced peers=160 head=96a67889:5385311 finalized=168288:f050bc47 delay=128us10ns
WRN 2023-04-10 06:02:37.809-07:00 Failed to exchange transition configuration topics="elmon" url=http://127.0.0.1:8551 err=Timeout
INF 2023-04-10 06:02:44.114-07:00 Slot end topics="beacnde" slot=5385313 nextActionWait=4m39s885ms465us357ns nextAttestationSlot=5385337 nextProposalSlot=-1 syncCommitteeDuties=none head=30266594:5385313
INF 2023-04-10 06:02:48.000-07:00 Slot start topics="beacnde" slot=5385314 epoch=168291 sync=synced peers=160 head=30266594:5385313 finalized=168289:86cbefc2 delay=310us471ns
INF 2023-04-10 06:02:56.101-07:00 Slot end topics="beacnde" slot=5385314 nextActionWait=4m27s898ms464us549ns nextAttestationSlot=5385337 nextProposalSlot=-1 syncCommitteeDuties=none head=4bd58568:5385314
INF 2023-04-10 06:03:00.001-07:00 Slot start topics="beacnde" slot=5385315 epoch=168291 sync=synced/opt peers=160 head=4bd58568:5385314 finalized=168289:86cbefc2 delay=1ms12us760ns
INF 2023-04-10 06:03:00.028-07:00 Execution client not in sync; skipping validator duties for now topics="beacval" slot=5385315 headSlot=5385314
INF 2023-04-10 06:03:00.091-07:00 Slot end topics="beacnde" slot=5385315 nextActionWait=4m23s908ms42us557ns nextAttestationSlot=5385337 nextProposalSlot=-1 syncCommitteeDuties=none head=4bd58568:5385314
(2): Besu block 8,802,737
2023-04-09 14:02:14.341-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineForkchoiceUpdated | VALID for fork-choice-update: head: 0xa68c9fccae334588bce4db5229b8a04b5bc5d4c4ede1c52966e9bd83aa9e202b, finalized: 0xbf6fab88a694ec8ea73cb75cf8433c15605a913287b3bfd4a076cc8ffc1a0457, safeBlockHash: 0x626e9c11eafb9107dfe842af338263a3f21ba9982fb4f64c7b7824246f90f913
2023-04-09 14:02:27.672-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,802,737 / 285 tx / 16 ws / base fee 5.04 gwei / 29,973,156 (99.9%) gas / (0x7a56aed200372621f63c9b38d29702a898bef1c5c3c5da21dbfb32b4e7418bb7) in 1.385s. Peers: 25
2023-04-09 14:03:36.688-07:00 | vert.x-worker-thread-0 | INFO | BackwardSyncContext | Starting a new backward sync session
2023-04-09 14:03:36.988-07:00 | nioEventLoopGroup-3-1 | INFO | BackwardSyncStep | Backward sync phase 1 of 2 completed, downloaded a total of 200 headers. Peers: 25
2023-04-09 14:03:40.369-07:00 | nioEventLoopGroup-3-2 | INFO | BackwardSyncContext | Backward sync phase 2 of 2, 20.00% completed, imported 1 blocks of at least 5 (current head 8802738, target head 8802742). Peers: 25
2023-04-09 14:03:42.694-07:00 | nioEventLoopGroup-3-2 | INFO | BackwardSyncContext | Backward sync phase 2 of 2 completed, imported a total of 5 blocks. Peers: 25
2023-04-09 14:03:42.696-07:00 | ForkJoinPool.commonPool-worker-24 | INFO | BackwardSyncAlgorithm | Current backward sync session is done
2023-04-09 14:03:51.060-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,802,743 / 211 tx / 16 ws / base fee 5.11 gwei / 29,858,880 (99.5%) gas / (0x6e2f01a824d7db8cd946c419e6618af438df6b5fcb39c4702f7d9bcb9e5875b0) in 1.249s. Peers: 25
2023-04-09 14:04:26.437-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,802,744 / 188 tx / 16 ws / base fee 5.74 gwei / 27,804,062 (92.7%) gas / (0x4647c8b12117e9d92cc1fb87ccf5410450ca9387942c8e198ce7b50b77c739c9) in 1.204s. Peers: 25
(2): Nimbus slot 5380512
INF 2023-04-09 14:08:36.000-07:00 Slot start topics="beacnde" slot=5380543 epoch=168141 sync=synced peers=160 head=640ba7a9:5380542 finalized=168139:8e390e2a delay=140us448ns
INF 2023-04-09 14:08:44.075-07:00 Slot end topics="beacnde" slot=5380543 nextActionWait=39s924ms968us963ns nextAttestationSlot=5380547 nextProposalSlot=-1 syncCommitteeDuties=none head=640ba7a9:5380542
INF 2023-04-09 14:08:48.001-07:00 Slot start topics="beacnde" slot=5380544 epoch=168142 sync=synced peers=160 head=640ba7a9:5380542 finalized=168139:8e390e2a delay=1ms26us161ns
INF 2023-04-09 14:08:48.516-07:00 State replayed topics="chaindag" blocks=0 slots=2 current=294895ca:5380512@5380544 ancestor=640ba7a9:5380542 target=640ba7a9:5380542@5380544 ancestorStateRoot=8ed5c7bd targetStateRoot=053e6b01 found=true assignDur=33ms555us218ns replayDur=437ms709us426ns
INF 2023-04-09 14:08:57.713-07:00 State replayed topics="chaindag" blocks=0 slots=32 current=640ba7a9:5380542@5380544 ancestor=640ba7a9:5380542@5380544 target=640ba7a9:5380542@5380576 ancestorStateRoot=053e6b01 targetStateRoot=90daed42 found=true assignDur=2us893ns replayDur=1s640ms752us963ns
INF 2023-04-09 14:08:58.228-07:00 Slot end topics="beacnde" slot=5380544 nextActionWait=25s771ms79us297ns nextAttestationSlot=5380547 nextProposalSlot=-1 syncCommitteeDuties=none head=640ba7a9:5380542
INF 2023-04-09 14:08:58.240-07:00 Missed multiple heartbeats topics="libp2p gossipsub" heartbeat=GossipSub delay=1s277ms511us46ns hinterval=700ms
INF 2023-04-09 14:09:00.007-07:00 Slot start topics="beacnde" slot=5380545 epoch=168142 sync=synced peers=160 head=640ba7a9:5380542 finalized=168139:8e390e2a delay=7ms408us720ns
NOT 2023-04-09 14:09:03.482-07:00 Attestation failed to match head topics="chaindag" epoch=168140 validator=<redacted>
INF 2023-04-09 14:09:03.524-07:00 State replayed topics="chaindag" blocks=0 slots=2 current=640ba7a9:5380542@5380543 ancestor=640ba7a9:5380542@5380543 target=640ba7a9:5380542@5380545 ancestorStateRoot=ac7356de targetStateRoot=2e016ed6 found=true assignDur=3us154ns replayDur=949ms8us648ns
INF 2023-04-09 14:09:08.260-07:00 Slot end topics="beacnde" slot=5380545 nextActionWait=15s739ms310us276ns nextAttestationSlot=5380547 nextProposalSlot=-1 syncCommitteeDuties=none head=8054a3dd:5380545
INF 2023-04-09 14:09:12.001-07:00 Slot start topics="beacnde" slot=5380546 epoch=168142 sync=synced peers=160 head=8054a3dd:5380545 finalized=168140:ad2fed16 delay=991us809ns
INF 2023-04-09 14:09:20.106-07:00 Slot end topics="beacnde" slot=5380546 nextActionWait=3s893ms243us122ns nextAttestationSlot=5380547 nextProposalSlot=-1 syncCommitteeDuties=none head=200a7e6b:5380546
(3): Besu block 8,798,617
2023-04-08 20:19:37.818-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,798,616 / 133 tx / 16 ws / base fee 100.20 gwei / 8,857,883 (29.5%) gas / (0x8b63a3a8c508a0e556e419e168cae94664180b2e9a3bc482d5918aa5f2a28012) in 0.417s. Peers: 25
2023-04-08 20:19:51.794-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,798,617 / 161 tx / 16 ws / base fee 95.07 gwei / 29,996,593 (100.0%) gas / (0x777fc03d53f438a318716557122ad7b927986f268f5eb3e6ef59ad29d037f07d) in 1.454s. Peers: 25
2023-04-08 20:19:53.364-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineForkchoiceUpdated | VALID for fork-choice-update: head: 0x777fc03d53f438a318716557122ad7b927986f268f5eb3e6ef59ad29d037f07d, finalized: 0xd202d7ce53811b02ee2f6323a10611fb1c63feb656db46692945a7b8ad0311c6, safeBlockHash: 0x5938a234202a46bc467ac6c7eb878d1270ea33785caf94382ea3352a7f65e108
2023-04-08 20:21:01.603-07:00 | vert.x-worker-thread-0 | INFO | BackwardSyncContext | Starting a new backward sync session
2023-04-08 20:21:02.238-07:00 | nioEventLoopGroup-3-8 | INFO | BackwardSyncStep | Backward sync phase 1 of 2 completed, downloaded a total of 192 headers. Peers: 25
2023-04-08 20:21:03.432-07:00 | nioEventLoopGroup-3-8 | INFO | BackwardSyncContext | Backward sync phase 2 of 2, 20.00% completed, imported 1 blocks of at least 5 (current head 8798618, target head 8798622). Peers: 25
2023-04-08 20:21:06.826-07:00 | nioEventLoopGroup-3-8 | INFO | BackwardSyncContext | Backward sync phase 2 of 2 completed, imported a total of 5 blocks. Peers: 25
2023-04-08 20:21:06.829-07:00 | ForkJoinPool.commonPool-worker-5 | INFO | BackwardSyncAlgorithm | Current backward sync session is done
2023-04-08 20:21:38.597-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,798,623 / 247 tx / 16 ws / base fee 103.18 gwei / 29,998,288 (100.0%) gas / (0xa1ec4480243faaa12f8f30858aa81d6121fb37747020e3d343e5b5c73dd9769f) in 1.410s. Peers: 25
2023-04-08 20:21:50.176-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,798,624 / 108 tx / 16 ws / base fee 116.08 gwei / 16,688,362 (55.6%) gas / (0x47dab1c059e324caaae6bf17b355d12a72632e94b17fb89206e2e52771026795) in 0.628s. Peers: 25
2023-04-08 20:22:02.319-07:00 | vert.x-worker-thread-0 | INFO | AbstractEngineNewPayload | Imported #8,798,625 / 78 tx / 16 ws / base fee 117.71 gwei / 11,499,324 (38.3%) gas / (0xfcc3bd7f4d549bc62190aee3f75ad835f671215ea7ad2cec90bcc551ed248036) in 0.482s. Peers: 25
(3): Nimbus slot 5375199
INF 2023-04-08 20:19:36.000-07:00 Slot start topics="beacnde" slot=5375198 epoch=167974 sync=synced peers=159 head=f947b406:5375197 finalized=167972:5ee2a2bd delay=283us502ns
INF 2023-04-08 20:19:44.106-07:00 Slot end topics="beacnde" slot=5375198 nextActionWait=27s893ms671us768ns nextAttestationSlot=5375201 nextProposalSlot=-1 syncCommitteeDuties=none head=5da2ff83:5375198
INF 2023-04-08 20:19:48.000-07:00 Slot start topics="beacnde" slot=5375199 epoch=167974 sync=synced peers=158 head=5da2ff83:5375198 finalized=167972:5ee2a2bd delay=894us991ns
WRN 2023-04-08 20:19:51.405-07:00 Failed to exchange transition configuration topics="elmon" url=http://127.0.0.1:8551 err=Timeout
INF 2023-04-08 20:19:52.655-07:00 State replayed topics="chaindag" blocks=0 slots=1 current=0bb13188:5375168@5375200 ancestor=7397aa62:5375199 target=7397aa62:5375199@5375200 ancestorStateRoot=2aa3ecf9 targetStateRoot=8bc483fd found=true assignDur=33ms528us672ns replayDur=407ms898us623ns
INF 2023-04-08 20:19:56.102-07:00 Slot end topics="beacnde" slot=5375199 nextActionWait=15s897ms66us236ns nextAttestationSlot=5375201 nextProposalSlot=-1 syncCommitteeDuties=none head=7397aa62:5375199
INF 2023-04-08 20:20:00.000-07:00 Slot start topics="beacnde" slot=5375200 epoch=167975 sync=synced peers=159 head=7397aa62:5375199 finalized=167972:5ee2a2bd delay=267us296ns
INF 2023-04-08 20:20:08.322-07:00 Slot end topics="beacnde" slot=5375200 nextActionWait=3s677ms92us22ns nextAttestationSlot=5375201 nextProposalSlot=-1 syncCommitteeDuties=none head=abf39360:5375200
INF 2023-04-08 20:20:12.000-07:00 Slot start topics="beacnde" slot=5375201 epoch=167975 sync=synced/opt peers=158 head=abf39360:5375200 finalized=167973:e2ce79bc delay=188us16ns
INF 2023-04-08 20:20:12.027-07:00 Execution client not in sync; skipping validator duties for now topics="beacval" slot=5375201 headSlot=5375200
INF 2023-04-08 20:20:12.091-07:00 Slot end topics="beacnde" slot=5375201 nextActionWait=2m59s908ms810us448ns nextAttestationSlot=5375216 nextProposalSlot=-1 syncCommitteeDuties=none head=abf39360:5375200
INF 2023-04-08 20:20:24.001-07:00 Slot start topics="beacnde" slot=5375202 epoch=167975 sync=synced/opt peers=159 head=abf39360:5375200 finalized=167973:e2ce79bc delay=1ms23us910ns
INF 2023-04-08 20:20:24.028-07:00 Execution client not in sync; skipping validator duties for now topics="beacval" slot=5375202 headSlot=5375200
INF 2023-04-08 20:20:24.091-07:00 Slot end topics="beacnde" slot=5375202 nextActionWait=2m47s908ms946us775ns nextAttestationSlot=5375216 nextProposalSlot=-1 syncCommitteeDuties=none head=abf39360:5375200
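For anyone who wants to pull the same pairs out of their own Besu log, here is a minimal Python sketch. It assumes the log line layout shown in the excerpts above; the regular expressions and the function name `sessions_with_last_import` are illustrative and not part of Besu. It finds every "Starting a new backward sync session" line and reports the last imported block before it, together with the gap between the two.

```python
import re
from datetime import datetime

# Timestamp layout used by the Besu log lines quoted in this thread.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2})"

# "Imported #<block> / ... (<gas>%) gas / (<hash>) in <seconds>s."
IMPORTED = re.compile(
    TS + r" \| [^|]+ \| INFO \| AbstractEngineNewPayload \| "
    r"Imported #([\d,]+) / \d+ tx / .*?\(([\d.]+)%\) gas / "
    r"\(0x[0-9a-f]+\) in ([\d.]+)s\."
)
SYNC_START = re.compile(
    TS + r" \| [^|]+ \| INFO \| BackwardSyncContext \| "
    r"Starting a new backward sync session"
)


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f%z")


def sessions_with_last_import(log_text):
    """Return one record per backward sync start, with the preceding import."""
    events = []
    for m in IMPORTED.finditer(log_text):
        info = {
            "block": int(m.group(2).replace(",", "")),
            "gas_pct": float(m.group(3)),
            "import_s": float(m.group(4)),
        }
        events.append((parse_ts(m.group(1)), "import", info))
    for m in SYNC_START.finditer(log_text):
        events.append((parse_ts(m.group(1)), "sync", None))
    events.sort(key=lambda e: e[0])

    last_import = None
    records = []
    for ts, kind, info in events:
        if kind == "import":
            last_import = (ts, info)
        elif last_import is not None:
            gap = (ts - last_import[0]).total_seconds()
            records.append({**last_import[1], "gap_s": gap})
    return records


# Two lines copied from example (1) above.
SAMPLE = (
    "2023-04-10 06:02:37.935-07:00 | vert.x-worker-thread-0 | INFO | "
    "AbstractEngineNewPayload | Imported #8,806,471 / 129 tx / 16 ws / "
    "base fee 181.73 gwei / 23,372,823 (77.9%) gas / "
    "(0x57323bb8ce92bdaf76a0e01ed67cd649d37a3df2513f0cc717a1aaec576e4277) "
    "in 1.072s. Peers: 25\n"
    "2023-04-10 06:04:01.233-07:00 | vert.x-worker-thread-0 | INFO | "
    "BackwardSyncContext | Starting a new backward sync session\n"
)

for record in sessions_with_last_import(SAMPLE):
    print(record)
```

The patterns are matched against the whole text rather than line by line, so the sketch also works when several log entries end up pasted onto one line.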
Thanks for this report - we have noticed this on our own nodes on sepolia and will work to debug (or blame on nimbus ;0)
@non-fungible-nelson, since I upgraded my Goerli node to v23.4.0 (just about when it came out) I haven't had a single "Starting a new backward sync session" message aside from the initial sync when starting the service. Was anything changed in the new version that should've affected this issue?
As of now, this issue seems to be solved.
this is a problem with the way sync is handled between besu and certain CLs - I am going to keep this open so we can track against our new post-Merge sync strategy.
Another data point for you @non-fungible-nelson, after seeing the new version works great on my Goerli node I migrated back my mainnet node from nethermind to besu and I'm experiencing the same great performance, hope I'm not jinxing it but it's been over 2 days so far and not a single "new backward sync" message :-)
From my end, this issue is definitely solved, attestation eff is also hovering between 97-99% which is great!
Description
When a larger-than-usual block is imported (> 90% gas utilization), the block import time is sometimes unusually long (unsure why, as this doesn't happen with all > 90% gas blocks). When this happens, Besu triggers a new backward sync session, leaving Besu out of sync for a short period (6-8 blocks) and resulting in missed attestations.
Expected behavior: Since blocks are roughly every ~12s I'm not sure why Besu needs a new backward sync session after being delayed for what seems like 3-5s tops.
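Because slots are 12s apart, a Besu timestamp can be lined up with a Nimbus slot from any single "Slot start" line. The sketch below is only an illustration of that arithmetic: the anchor is the `slot=5385313` "Slot start" line from the Goerli Nimbus log in this thread, the two Besu timestamps come from the matching Besu excerpt, and the helper name `slot_at` is made up for the example.

```python
from datetime import datetime

SLOT_SECONDS = 12  # slot length on mainnet and Goerli


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f%z")


def slot_at(timestamp, anchor_slot, anchor_start):
    """Slot that contains `timestamp`, given one known slot start time."""
    offset = (parse_ts(timestamp) - parse_ts(anchor_start)).total_seconds()
    # Floor division also handles timestamps before the anchor.
    return anchor_slot + int(offset // SLOT_SECONDS)


# Anchor taken from the Nimbus line "Slot start ... slot=5385313".
ANCHOR_SLOT = 5385313
ANCHOR_START = "2023-04-10 06:02:36.000-07:00"

# Besu: slow import of block 8,806,471 and the backward sync start after it.
import_slot = slot_at("2023-04-10 06:02:37.935-07:00", ANCHOR_SLOT, ANCHOR_START)
sync_slot = slot_at("2023-04-10 06:04:01.233-07:00", ANCHOR_SLOT, ANCHOR_START)

print(import_slot, sync_slot, sync_slot - import_slot)
```

With these inputs the slow import falls in slot 5385313 and the backward sync session starts seven slots later, which is in line with the 6-8 block gap seen in the logs.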
Few examples (logs are otherwise clean of warnings/errors):
Important to note that block import times are usually < 1-2s even when blocks are > 90% utilization, so I'm not sure why some blocks are so expensive, for example:
Actual behavior: Besu triggers a backward sync session resulting in being out-of-sync for 6-8 blocks.
Frequency: 3-4 times roughly every 24h.
Versions (Add all that apply)
besu --version
java -version
uname -a
eth2 specification v1.3.0-rc.5
Nim Compiler Version 1.6.12 [Linux: amd64]
####################################################################################################
Besu 23.1.2
Configuration:
Network: Mainnet
Network Id: 1
Data storage: Bonsai
Sync mode: Checkpoint
RPC HTTP APIs: ADMIN,ETH,NET,DEBUG,TXPOOL,WEB3
RPC HTTP port: 8545
Engine APIs: ENGINE,ETH
Engine port: 8551
High spec configuration enabled
Host:
Java: openjdk-java-17
Maximum heap size: 5.00 GB
OS: linux-x86_64
glibc: 2.36
Total memory: 15.63 GB
CPU cores: 4
####################################################################################################