ledgerwatch / erigon

Ethereum implementation on the efficiency frontier
GNU Lesser General Public License v3.0
3.03k stars 1.06k forks source link

Very long cycle times (stage loop is busy/Prysm timeouts) #5821

Open msirinotis opened 1 year ago

msirinotis commented 1 year ago

System information

Erigon version: 2.28.1-dev-da354bc0, though the same issues for 2.28.1

OS & Version: Ubuntu 20.04

Commit hash : da354bc01a127d26e7c97737dc34806f7e327560

Expected behaviour

Erigon + Prysm syncs & stays in sync.

Actual behaviour

Erigon + Prysm falls behind in sync and has very long commit cycles and general instability.

Steps to reproduce the behaviour

Restart Erigon + Prysm from slightly behind head and watch it attempt to sync.

Backtrace

High level errors: Prysm = ERROR blockchain: received an undefined ee error error=timeout from http.Client on average a few times per minute when close to head. Erigon = Very long commit cycles / struggle to get to and keep up to head of the chain. A lot of [NewPayload] stage loop is busy and [ForkChoiceUpdated] stage loop is busy in DEBUG but not sure if that is a problem or standard output.

An example of a close to head cycle taking 14 seconds for 3 blocks below, have others like 128 blocks taking 6minutes etc. CPU/RAM/Disk all remain quite low throughout too.

Note in the example & in other commit cycles the things that stand out to me are;

This is a 3 block attempt with stage loop near the end of a cycle slowing it down.

DBUG[10-21|20:39:02.195] Received ForkchoiceUpdated               head=0x3309a063dc0600b37cc4801513375225eb050cf11c2f64df0922796f3fd89a68 safe=0x3309a063dc0600b37cc4801513375225eb050cf11c2f64df0922796f3fd89a68 finalized=0x5539746ce9202fc1532fd2eb771ce410144b7eda848037daf390dd664c5c0cce build=false
DBUG[10-21|20:39:02.299] [1/16 Snapshots] DONE                    in=17.64µs
INFO[10-21|20:39:02.299] [2/16 Headers] Waiting for Consensus Layer... 
DBUG[10-21|20:39:02.303] [ForkChoiceUpdated] sending forkChoiceMessage head=0x3309a063dc0600b37cc4801513375225eb050cf11c2f64df0922796f3fd89a68
DBUG[10-21|20:39:02.303] [2/16 Headers] Handling fork choice      headerHash=0x3309a063dc0600b37cc4801513375225eb050cf11c2f64df0922796f3fd89a68
INFO[10-21|20:39:02.303] [2/16 Headers] Fork choice: chain extension from=15795636 to=15795639
DBUG[10-21|20:39:02.303] [2/16 Headers] DONE                      in=4.43339ms
DBUG[10-21|20:39:02.303] [3/16 CumulativeIndex] DONE              in=138.613µs
DBUG[10-21|20:39:02.304] [4/16 BlockHashes] DONE                  in=132.423µs
INFO[10-21|20:39:03.091] [5/16 Bodies] Writing block bodies       block_num=15795639 remaining=0 alloc=806.4MB sys=8.1GB
DBUG[10-21|20:39:03.091] [5/16 Bodies] DONE                       in=787.344872ms
DBUG[10-21|20:39:03.105] [6/16 Senders] DONE                      in=14.012734ms
INFO[10-21|20:39:04.257] [7/16 Execution] Completed on            block=15795639
DBUG[10-21|20:39:04.257] [7/16 Execution] DONE                    in=1.152353518s
DBUG[10-21|20:39:04.381] [8/16 HashState] DONE                    in=123.804914ms
DBUG[10-21|20:39:04.852] [9/16 IntermediateHashes] DONE           in=471.081836ms
INFO[10-21|20:39:04.939] [p2p] GoodPeers                          eth66=33
DBUG[10-21|20:39:05.143] [10/16 CallTraces] DONE                  in=290.471798ms
DBUG[10-21|20:39:05.237] [11/16 AccountHistoryIndex] DONE         in=93.961452ms
DBUG[10-21|20:39:05.318] [12/16 StorageHistoryIndex] DONE         in=80.693406ms
INFO[10-21|20:39:05.330] [txpool] stat                            block=15795636 pending=1186 baseFee=30000 queued=29014 alloc=871.3MB sys=8.1GB
DBUG[10-21|20:39:05.354] [13/16 LogIndex] DONE                    in=36.208358ms
DBUG[10-21|20:39:05.665] [14/16 TxLookup] DONE                    in=310.726129ms
DBUG[10-21|20:39:05.665] [15/16 WatchTheBurn] DONE                in=5.672µs
DBUG[10-21|20:39:05.665] [16/16 Finish] DONE                      in=15.179µs
DBUG[10-21|20:39:12.308] [txpool] Commit                          written_kb=11260 in=6.977710254s
DBUG[10-21|20:39:12.857] Received NewPayload                      height=15795640 hash=0x56a6e30611597429f4267d07cdf164de766506ae28c395ea90a18e6fde965b40
DBUG[10-21|20:39:12.876] slow peer or too many requests, dropping its old requests name=Geth/v1.10.25-stable...
DBUG[10-21|20:39:12.979] [NewPayload] stage loop is busy 
DBUG[10-21|20:39:13.357] slow peer or too many requests, dropping its old requests name=Geth/v1.10.25-stable...
INFO[10-21|20:39:20.152] Commit cycle                             in=14.487573076s
INFO[10-21|20:39:20.152] Timings (slower than 50ms)               Bodies=787ms Execution=1.152s HashState=123ms IntermediateHashes=471ms CallTraces=290ms AccountHistoryIndex=93ms StorageHistoryIndex=80ms TxLookup=310ms

A second example of Erigon trying to sync back up ~300 blocks but getting itself into a stage loop is busy/prysm timeout loop midway (see timestamps)

DBUG[10-21|23:08:10.284] [1/16 Snapshots] DONE                    in=4.696µs
INFO[10-21|23:08:10.284] [2/16 Headers] Waiting for Consensus Layer... 
DBUG[10-21|23:08:11.212] Received ForkchoiceUpdated               head=0x218a864a5d303331ae5b05153e39cc19e5606eadfffac2821326bdc687a2f909 safe=0x218a864a5d303331ae5b05153e39cc19e5606eadfffac2821326bdc687a2f909 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:11.212] [ForkChoiceUpdated] sending forkChoiceMessage head=0x218a864a5d303331ae5b05153e39cc19e5606eadfffac2821326bdc687a2f909
DBUG[10-21|23:08:11.212] [2/16 Headers] Handling fork choice      headerHash=0x218a864a5d303331ae5b05153e39cc19e5606eadfffac2821326bdc687a2f909
INFO[10-21|23:08:11.212] [2/16 Headers] Fork choice: chain extension from=15796105 to=15796373
DBUG[10-21|23:08:11.213] [2/16 Headers] DONE                      in=928.416418ms
DBUG[10-21|23:08:11.214] [3/16 CumulativeIndex] DONE              in=1.01014ms
DBUG[10-21|23:08:11.214] [4/16 BlockHashes] DONE                  in=739.331µs
INFO[10-21|23:08:11.214] [5/16 Bodies] Processing bodies...       from=15796105 to=15796373
INFO[10-21|23:08:12.155] [5/16 Bodies] Writing block bodies       block_num=15796140 remaining=233 alloc=146.6MB sys=248.2MB
INFO[10-21|23:08:12.601] [5/16 Bodies] Writing block bodies       block_num=15796170 remaining=203 alloc=159.5MB sys=256.2MB
INFO[10-21|23:08:13.351] [5/16 Bodies] Writing block bodies       block_num=15796202 remaining=171 alloc=182.8MB sys=268.4MB
INFO[10-21|23:08:14.574] [5/16 Bodies] Writing block bodies       block_num=15796230 remaining=143 alloc=147.8MB sys=282.7MB
INFO[10-21|23:08:15.090] [5/16 Bodies] Writing block bodies       block_num=15796323 remaining=50 alloc=145.0MB sys=282.7MB
INFO[10-21|23:08:15.430] [5/16 Bodies] Writing block bodies       block_num=15796353 remaining=20 alloc=155.1MB sys=282.7MB
INFO[10-21|23:08:16.672] [5/16 Bodies] Writing block bodies       block_num=15796373 remaining=0 alloc=159.0MB sys=282.7MB
INFO[10-21|23:08:16.672] [5/16 Bodies] Processed                  highest=15796373
DBUG[10-21|23:08:16.672] [5/16 Bodies] DONE                       in=5.457571209s
INFO[10-21|23:08:16.672] [6/16 Senders] Started                   from=15796105 to=15796373
DBUG[10-21|23:08:16.885] [6/16 Senders] DONE                      in=213.15326ms
INFO[10-21|23:08:16.885] [7/16 Execution] Blocks execution        from=15796105 to=15796373
DBUG[10-21|23:08:18.057] Received NewPayload                      height=15796374 hash=0xbaa674730b7760548ac4e6df7ececb40913ac7be03238a9067692ee061318716
DBUG[10-21|23:08:18.180] [NewPayload] stage loop is busy 
DBUG[10-21|23:08:20.726] Received ForkchoiceUpdated               head=0xbaa674730b7760548ac4e6df7ececb40913ac7be03238a9067692ee061318716 safe=0xbaa674730b7760548ac4e6df7ececb40913ac7be03238a9067692ee061318716 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:20.832] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:08:27.761] Received NewPayload                      height=15796375 hash=0x68c910827f2cf29337a94d9dd3de93f6d307cd372f989ae716ae61bfd66abc40
DBUG[10-21|23:08:27.868] [NewPayload] stage loop is busy 
DBUG[10-21|23:08:27.981] Received ForkchoiceUpdated               head=0x68c910827f2cf29337a94d9dd3de93f6d307cd372f989ae716ae61bfd66abc40 safe=0x68c910827f2cf29337a94d9dd3de93f6d307cd372f989ae716ae61bfd66abc40 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:28.088] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:08:35.747] Received NewPayload                      height=15796376 hash=0xa31006978edc434149a407cbb952b01efe400fecd85ef6d71bb7a1ae81e9f9c8
DBUG[10-21|23:08:35.875] [NewPayload] stage loop is busy 
DBUG[10-21|23:08:36.159] Received ForkchoiceUpdated               head=0xa31006978edc434149a407cbb952b01efe400fecd85ef6d71bb7a1ae81e9f9c8 safe=0xa31006978edc434149a407cbb952b01efe400fecd85ef6d71bb7a1ae81e9f9c8 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:36.266] [ForkChoiceUpdated] stage loop is busy 
INFO[10-21|23:08:37.426] [7/16 Execution] Executed blocks         number=15796220 blk/s=5.6 tx/s=998.3 Mgas/s=84.8 gasState=0.00 batch=2.6MB alloc=261.5MB sys=321.8MB
DBUG[10-21|23:08:42.273] Received NewPayload                      height=15796377 hash=0x2e55d830671da0824e29b16262862c40b5621d0bbf566c21e6bdf8cb6325317f
DBUG[10-21|23:08:42.382] [NewPayload] stage loop is busy 
DBUG[10-21|23:08:42.616] Received ForkchoiceUpdated               head=0x2e55d830671da0824e29b16262862c40b5621d0bbf566c21e6bdf8cb6325317f safe=0x2e55d830671da0824e29b16262862c40b5621d0bbf566c21e6bdf8cb6325317f finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:42.722] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:08:47.005] Received ForkchoiceUpdated               head=0x2e55d830671da0824e29b16262862c40b5621d0bbf566c21e6bdf8cb6325317f safe=0x2e55d830671da0824e29b16262862c40b5621d0bbf566c21e6bdf8cb6325317f finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:47.111] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:08:49.744] Received NewPayload                      height=15796378 hash=0xd7bb0424f1b40e3c8916da58e06d874e65f8cb0653e34f02cbba86d398c35aeb
DBUG[10-21|23:08:49.944] [NewPayload] stage loop is busy 
DBUG[10-21|23:08:51.478] Received ForkchoiceUpdated               head=0xd7bb0424f1b40e3c8916da58e06d874e65f8cb0653e34f02cbba86d398c35aeb safe=0xd7bb0424f1b40e3c8916da58e06d874e65f8cb0653e34f02cbba86d398c35aeb finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:51.586] [ForkChoiceUpdated] stage loop is busy 
INFO[10-21|23:08:57.232] [7/16 Execution] Executed blocks         number=15796281 blk/s=3.1 tx/s=559.8 Mgas/s=45.3 gasState=0.00 batch=4.1MB alloc=292.9MB sys=353.1MB
DBUG[10-21|23:08:58.093] Received NewPayload                      height=15796379 hash=0x4f7e00a77901c915c0524b9c06fdabd894a1600a98bafe4c1b0adf7c314370b9
DBUG[10-21|23:08:58.273] [NewPayload] stage loop is busy 
DBUG[10-21|23:08:58.405] Received ForkchoiceUpdated               head=0x4f7e00a77901c915c0524b9c06fdabd894a1600a98bafe4c1b0adf7c314370b9 safe=0x4f7e00a77901c915c0524b9c06fdabd894a1600a98bafe4c1b0adf7c314370b9 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:58.511] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:08:59.007] Received ForkchoiceUpdated               head=0x4f7e00a77901c915c0524b9c06fdabd894a1600a98bafe4c1b0adf7c314370b9 safe=0x4f7e00a77901c915c0524b9c06fdabd894a1600a98bafe4c1b0adf7c314370b9 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:08:59.113] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:09:04.879] Received NewPayload                      height=15796380 hash=0xab8f2367ec5317cab626b2894602d70723e899e0d724889228398eb4050b7f7c
DBUG[10-21|23:09:05.160] [NewPayload] stage loop is busy 
DBUG[10-21|23:09:05.682] Received ForkchoiceUpdated               head=0xab8f2367ec5317cab626b2894602d70723e899e0d724889228398eb4050b7f7c safe=0xab8f2367ec5317cab626b2894602d70723e899e0d724889228398eb4050b7f7c finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:09:05.787] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:09:11.013] Received ForkchoiceUpdated               head=0xab8f2367ec5317cab626b2894602d70723e899e0d724889228398eb4050b7f7c safe=0xab8f2367ec5317cab626b2894602d70723e899e0d724889228398eb4050b7f7c finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:09:11.121] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:09:12.040] Received NewPayload                      height=15796381 hash=0xe2fc272e1477b7a91202ba3f40e24bb16b6a624d84aba9de4af2d3439f22687a
DBUG[10-21|23:09:12.147] [NewPayload] stage loop is busy 
DBUG[10-21|23:09:15.454] Received ForkchoiceUpdated               head=0xe2fc272e1477b7a91202ba3f40e24bb16b6a624d84aba9de4af2d3439f22687a safe=0xe2fc272e1477b7a91202ba3f40e24bb16b6a624d84aba9de4af2d3439f22687a finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:09:15.560] [ForkChoiceUpdated] stage loop is busy 
INFO[10-21|23:09:20.570] [7/16 Execution] Executed blocks         number=15796357 blk/s=3.3 tx/s=585.7 Mgas/s=50.6 gasState=0.01 batch=6.1MB alloc=317.1MB sys=406.3MB
INFO[10-21|23:09:21.863] [p2p] GoodPeers                          eth66=4
DBUG[10-21|23:09:22.119] sendMessageById                          msgcode=3 err="set tcp 192.168.20.16:53792: use of closed network connection"
DBUG[10-21|23:09:22.119] Sentry peer did Connect                  peer=99da03c47d637b20eee24eec3c356c9a2e6148d6fe25ca195c7949ab8ec2c03e3556126b0d7ed644675e78c4318b08691b7b57de10e5f0d40d05b09238fa0a00
DBUG[10-21|23:09:22.119] Sentry peer did Disconnect               peer=99da03c47d637b20eee24eec3c356c9a2e6148d6fe25ca195c7949ab8ec2c03e3556126b0d7ed644675e78c4318b08691b7b57de10e5f0d40d05b09238fa0a00
DBUG[10-21|23:09:23.094] Received NewPayload                      height=15796382 hash=0xd0af3f9c44e3fd09e6f495d8c18a86c8584945ce8ddb47b8eb42e67d4f97b3c1
DBUG[10-21|23:09:23.192] Sentry peer did Connect                  peer=6d7cd20d6da4bb83a1d28cadb5d409b64edf314c0335df658c1a54e32c7c4a7ab7823d57c39b6a757556e68ff1df17c748b698544a55cb488b52479a92b60f00
DBUG[10-21|23:09:23.196] Sentry peer did Disconnect               peer=6d7cd20d6da4bb83a1d28cadb5d409b64edf314c0335df658c1a54e32c7c4a7ab7823d57c39b6a757556e68ff1df17c748b698544a55cb488b52479a92b60f00
DBUG[10-21|23:09:23.256] [NewPayload] stage loop is busy 
DBUG[10-21|23:09:23.756] Received ForkchoiceUpdated               head=0xd0af3f9c44e3fd09e6f495d8c18a86c8584945ce8ddb47b8eb42e67d4f97b3c1 safe=0xd0af3f9c44e3fd09e6f495d8c18a86c8584945ce8ddb47b8eb42e67d4f97b3c1 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:09:23.862] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:09:27.035] Sentry peer did Connect                  peer=41de28ba40406a32aaac6c990ee535577fb4028bec5a723602bb65c59dcfcf1daf3b94325887ded034bb8fe96ee2e82cc1eb6d350dc06da2ada5078f2245ff00
DBUG[10-21|23:09:27.035] Sentry peer did Disconnect               peer=41de28ba40406a32aaac6c990ee535577fb4028bec5a723602bb65c59dcfcf1daf3b94325887ded034bb8fe96ee2e82cc1eb6d350dc06da2ada5078f2245ff00
DBUG[10-21|23:09:30.235] Received NewPayload                      height=15796383 hash=0xa8b624137847667b56cbff5172d9ccd9b488074cf7e2020aca88ea79c7ebfe16
DBUG[10-21|23:09:30.464] [NewPayload] stage loop is busy 
DBUG[10-21|23:09:30.825] Received ForkchoiceUpdated               head=0xa8b624137847667b56cbff5172d9ccd9b488074cf7e2020aca88ea79c7ebfe16 safe=0xa8b624137847667b56cbff5172d9ccd9b488074cf7e2020aca88ea79c7ebfe16 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:09:30.932] [ForkChoiceUpdated] stage loop is busy 
INFO[10-21|23:09:32.849] [7/16 Execution] Completed on            block=15796373
INFO[10-21|23:09:32.849] [7/16 Execution] DONE                    in=1m15.963868043s
INFO[10-21|23:09:32.849] [8/16 HashState] Promoting plain state   from=15796105 to=15796373
INFO[10-21|23:09:32.849] [8/16 HashState] Incremental promotion   from=15796105 to=15796373 codes=true csbucket=AccountChangeSet
INFO[10-21|23:09:33.578] [8/16 HashState] Incremental promotion   from=15796105 to=15796373 codes=false csbucket=AccountChangeSet
DBUG[10-21|23:09:37.011] Received NewPayload                      height=15796384 hash=0x373bd15859a3c97527cafaef13ff45e8e934d9aced26cda3a007d3562a174e68
DBUG[10-21|23:09:37.117] [NewPayload] stage loop is busy 
DBUG[10-21|23:09:37.743] Received ForkchoiceUpdated               head=0x373bd15859a3c97527cafaef13ff45e8e934d9aced26cda3a007d3562a174e68 safe=0x373bd15859a3c97527cafaef13ff45e8e934d9aced26cda3a007d3562a174e68 finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:09:37.850] [ForkChoiceUpdated] stage loop is busy 
DBUG[10-21|23:09:46.497] Received NewPayload                      height=15796385 hash=0x45d73e3997b709c98cf9d1dd457ac125e821a72a1b34cbca9a691d6628aa5def
DBUG[10-21|23:09:46.689] [NewPayload] stage loop is busy 
DBUG[10-21|23:09:48.181] Received ForkchoiceUpdated               head=0x45d73e3997b709c98cf9d1dd457ac125e821a72a1b34cbca9a691d6628aa5def safe=0x45d73e3997b709c98cf9d1dd457ac125e821a72a1b34cbca9a691d6628aa5def finalized=0x193dab6180eb72329aa70d366f9a304e5402e09dd7e66a95817dfa8693e26845 build=false
DBUG[10-21|23:09:48.288] [ForkChoiceUpdated] stage loop is busy 

Dumping CL logs for the same period just in case that helps:

[2022-10-21 23:08:09]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0x89b2ae853897 slot=4960829
[2022-10-21 23:08:09]  INFO blockchain: Called fork choice updated with optimistic block finalizedPayloadBlockHash=0x193dab6180eb headPayloadBlockHash=0x89b2ae853897 headSlot=4960829
[2022-10-21 23:08:10]  INFO blockchain: Synced new block block=0x48f8b90c... epoch=155025 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960829
[2022-10-21 23:08:10]  INFO blockchain: Finished applying state transition attestations=82 payloadHash=0x89b2ae853897 slot=4960829 syncBitsCount=509 txCount=257
[2022-10-21 23:08:10]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0x218a864a5d30 slot=4960830
[2022-10-21 23:08:10]  WARN powchain: Execution client is not syncing
[2022-10-21 23:08:17] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:08:18]  INFO blockchain: Synced new block block=0xaeefd568... epoch=155025 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960830
[2022-10-21 23:08:18]  INFO blockchain: Finished applying state transition attestations=128 payloadHash=0x218a864a5d30 slot=4960830 syncBitsCount=508 txCount=224
[2022-10-21 23:08:18]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0xbaa674730b77 slot=4960831
[2022-10-21 23:08:24]  WARN powchain: Execution client is not syncing
[2022-10-21 23:08:26] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:08:27]  INFO blockchain: Synced new block block=0xbc4e350b... epoch=155025 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960831
[2022-10-21 23:08:27]  INFO blockchain: Finished applying state transition attestations=128 payloadHash=0xbaa674730b77 slot=4960831 syncBitsCount=506 txCount=13
[2022-10-21 23:08:27]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0x68c910827f2c slot=4960832
[2022-10-21 23:08:33]  INFO p2p: Peer summary activePeers=45 inbound=0 outbound=44
[2022-10-21 23:08:33] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:08:35]  INFO blockchain: Synced new block block=0x07b3878f... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960832
[2022-10-21 23:08:35]  INFO blockchain: Finished applying state transition attestations=128 payloadHash=0x68c910827f2c slot=4960832 syncBitsCount=509 txCount=161
[2022-10-21 23:08:35]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0xa31006978edc slot=4960833
[2022-10-21 23:08:38]  WARN powchain: Execution client is not syncing
[2022-10-21 23:08:42] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:08:42]  INFO blockchain: Synced new block block=0xc73c76f5... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960833
[2022-10-21 23:08:42]  INFO blockchain: Finished applying state transition attestations=98 payloadHash=0xa31006978edc slot=4960833 syncBitsCount=491 txCount=211
[2022-10-21 23:08:42]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0x2e55d830671d slot=4960834
[2022-10-21 23:08:48] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:08:49]  INFO blockchain: Synced new block block=0xb3768acb... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960834
[2022-10-21 23:08:49]  INFO blockchain: Finished applying state transition attestations=104 payloadHash=0x2e55d830671d slot=4960834 syncBitsCount=507 txCount=22
[2022-10-21 23:08:49]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0xd7bb0424f1b4 slot=4960835
[2022-10-21 23:08:52]  WARN powchain: Execution client is not syncing
[2022-10-21 23:08:53] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:08:57] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:08:58]  INFO blockchain: Synced new block block=0xc80a9fcc... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960835
[2022-10-21 23:08:58]  INFO blockchain: Finished applying state transition attestations=128 payloadHash=0xd7bb0424f1b4 slot=4960835 syncBitsCount=507 txCount=206
[2022-10-21 23:08:58]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0x4f7e00a77901 slot=4960836
[2022-10-21 23:09:04] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:09:04]  INFO blockchain: Synced new block block=0x291b8b3a... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960836
[2022-10-21 23:09:04]  INFO blockchain: Finished applying state transition attestations=128 payloadHash=0x4f7e00a77901 slot=4960836 syncBitsCount=508 txCount=229
[2022-10-21 23:09:05] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:09:05]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0xab8f2367ec53 slot=4960837
[2022-10-21 23:09:06]  WARN powchain: Execution client is not syncing
[2022-10-21 23:09:11] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:09:11]  INFO blockchain: Synced new block block=0xaa769276... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960837
[2022-10-21 23:09:11]  INFO blockchain: Finished applying state transition attestations=128 payloadHash=0xab8f2367ec53 slot=4960837 syncBitsCount=508 txCount=113
[2022-10-21 23:09:12]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0xe2fc272e1477 slot=4960838
[2022-10-21 23:09:17] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:09:20]  WARN powchain: Execution client is not syncing
[2022-10-21 23:09:21] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:09:23]  INFO blockchain: Synced new block block=0x399b0a03... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960838
[2022-10-21 23:09:23]  INFO blockchain: Finished applying state transition attestations=73 payloadHash=0xe2fc272e1477 slot=4960838 syncBitsCount=508 txCount=15
[2022-10-21 23:09:23]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0xd0af3f9c44e3 slot=4960839
[2022-10-21 23:09:29] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:09:30]  INFO blockchain: Synced new block block=0x2cc900ed... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960839
[2022-10-21 23:09:30]  INFO blockchain: Finished applying state transition attestations=128 payloadHash=0xd0af3f9c44e3 slot=4960839 syncBitsCount=508 txCount=304
[2022-10-21 23:09:30]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0xa8b624137847 slot=4960840
[2022-10-21 23:09:33]  INFO p2p: Peer summary activePeers=43 inbound=0 outbound=43
[2022-10-21 23:09:34]  WARN powchain: Execution client is not syncing
[2022-10-21 23:09:36] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:09:36]  INFO blockchain: Synced new block block=0x88347abb... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960840
[2022-10-21 23:09:36]  INFO blockchain: Finished applying state transition attestations=121 payloadHash=0xa8b624137847 slot=4960840 syncBitsCount=506 txCount=136
[2022-10-21 23:09:37]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0x373bd15859a3 slot=4960841
[2022-10-21 23:09:43] ERROR blockchain: received an undefined ee error error=timeout from http.Client
[2022-10-21 23:09:46]  INFO blockchain: Synced new block block=0xdb36c06d... epoch=155026 finalizedEpoch=155024 finalizedRoot=0x68fbcb06... slot=4960841
[2022-10-21 23:09:46]  INFO blockchain: Finished applying state transition attestations=128 payloadHash=0x373bd15859a3 slot=4960841 syncBitsCount=508 txCount=28
[2022-10-21 23:09:46]  INFO blockchain: Called new payload with optimistic block payloadBlockHash=0x45d73e3997b7 slot=4960842
[2022-10-21 23:09:48]  WARN powchain: Execution client is not syncing

Also I have seen this node go into a No block bodies to write in this log period block error after some time in this stage loop pattern - though a restart resolves that particular issue (not sure if related).

Thanks

revitteth commented 1 year ago

@msirinotis thanks for this report. I'll pick it up.

One thing to note - 'no block bodies to write in this log period' isn't an error as such - just a notification that there has been no download activity. The log runs every 20s to update of progress, if there is none, this message will be shown.

iFA88 commented 1 year ago

I have the same "issue" on one my server who i have sata SSD's, but my second server has M2 SSD's which the commit delay is excellent. I think this "issue" caused just on low IO performance disks. What disk do you have and which filesystem do you use (even any raid?) @msirinotis ?

from=15852703 to=15852996
[INFO] [10-29|12:42:01.427] Commit cycle                             in=17m34.292586926s
msirinotis commented 1 year ago

I have the same "issue" on one my server who i have sata SSD's, but my second server has M2 SSD's which the commit delay is excellent. I think this "issue" caused just on low IO performance disks. What disk do you have and which filesystem do you use (even any raid?) @msirinotis ?

from=15852703 to=15852996
[INFO] [10-29|12:42:01.427] Commit cycle                             in=17m34.292586926s

Thanks @iFA88.

This setup is a Samsung 870 QVO 4TB SSD SATA drive with ext4. It has ~560MB/s read/write IIRC.

Think that's too low? Had fast sync without issues for >6 months until recently.

iFA88 commented 1 year ago

@msirinotis Sadly i can not predict how much iops are needed, but you can check iotop or iostat -x -d 1 whats happening.

msirinotis commented 1 year ago

@iFA88 Thanks, disk read & writes peak at 20 MB/s :(

Plenty of CPU room left too (i9-12900K). I'll test a few more things then might go for a full reinstall/resync or failing that might have no option but to switch clients but really don't want to!

AskAlexSharov commented 10 months ago

Try restart with WRITE_MAP=true env variable