Open msirinotis opened 1 year ago
@msirinotis thanks for this report. I'll pick it up.
One thing to note: 'no block bodies to write in this log period' isn't an error as such, just a notification that there has been no download activity. The log runs every 20s to report progress; if there is none, this message is shown.
I have the same "issue" on one of my servers, which has SATA SSDs, but my second server has M.2 SSDs, and there the commit delay is excellent. I think this "issue" is caused only by disks with low IO performance. Which disk and which filesystem do you use (any RAID?), @msirinotis?
from=15852703 to=15852996
[INFO] [10-29|12:42:01.427] Commit cycle in=17m34.292586926s
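For anyone who wants to compare commit cycles across machines, here is a minimal Python sketch that turns the two log fragments above into a seconds-per-block figure. It assumes the from=/to= range and the Commit cycle line describe the same cycle, which the log excerpt does not confirm, and the function names are made up for illustration.

```python
import re

# Seconds per unit, for the units Go prints in duration strings.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
_PART = r"(\d+(?:\.\d+)?)(ns|us|µs|ms|h|m|s)"


def parse_go_duration(text):
    """Convert a Go-style duration such as '17m34.292586926s' to seconds."""
    if not re.fullmatch(r"(?:%s)+" % _PART, text):
        raise ValueError("not a Go duration: %r" % text)
    return sum(float(value) * _UNIT_SECONDS[unit]
               for value, unit in re.findall(_PART, text))


def commit_cycle_seconds(line):
    """Extract the 'in=' duration from a 'Commit cycle' log line."""
    match = re.search(r"Commit cycle\s+in=(\S+)", line)
    if match is None:
        raise ValueError("no commit cycle duration in line")
    return parse_go_duration(match.group(1))


def seconds_per_block(first_block, last_block, cycle_seconds):
    """Average time spent per block, assuming the range maps to one cycle."""
    blocks = last_block - first_block
    if blocks <= 0:
        raise ValueError("block range must be positive")
    return cycle_seconds / blocks


line = "[INFO] [10-29|12:42:01.427] Commit cycle in=17m34.292586926s"
print(seconds_per_block(15852703, 15852996, commit_cycle_seconds(line)))
```

Under that assumption, a result that is not comfortably below the 12 second slot time means the node can barely catch up to head.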
Thanks @iFA88.
This setup is a Samsung 870 QVO 4TB SSD SATA drive with ext4. It has ~560MB/s read/write IIRC.
Think that's too low? Had fast sync without issues for >6 months until recently.
@msirinotis Sadly I cannot predict how many IOPS are needed, but you can check what's happening with iotop or iostat -x -d 1.
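If iotop or iostat are not installed, roughly the same throughput number can be derived from two snapshots of /proc/diskstats on Linux. The following is a small sketch, not a replacement for those tools; the helper names and the device name sda in the sample data are illustrative assumptions.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written) from diskstats text."""
    stats = {}
    for row in text.splitlines():
        fields = row.split()
        if len(fields) < 10:
            continue  # skip blank or truncated rows
        # fields: major, minor, name, reads, reads merged, sectors read,
        #         ms reading, writes, writes merged, sectors written, ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, device, interval_s):
    """Return (read MB/s, write MB/s) for one device between two snapshots."""
    r0, w0 = parse_diskstats(before)[device]
    r1, w1 = parse_diskstats(after)[device]
    scale = SECTOR_BYTES / 1e6 / interval_s
    return ((r1 - r0) * scale, (w1 - w0) * scale)


# Made-up snapshots taken one second apart, for illustration only.
sample_before = "8 0 sda 100 0 1000 50 200 0 2000 80 0 100 130"
sample_after = "8 0 sda 150 0 40063 60 260 0 41063 95 0 120 155"
print(throughput_mb_s(sample_before, sample_after, "sda", 1.0))
```

On a real machine the two snapshots would come from reading /proc/diskstats, sleeping for the interval, and reading it again. Throughput alone does not show latency or queue depth, so iostat -x is still the better view when it is available.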
@iFA88 Thanks, disk read & writes peak at 20 MB/s :(
Plenty of CPU room left too (i9-12900K). I'll test a few more things then might go for a full reinstall/resync or failing that might have no option but to switch clients but really don't want to!
Try restarting with the WRITE_MAP=true env variable.
System information
Erigon version: 2.28.1-dev-da354bc0, though the same issues for 2.28.1
OS & Version: Ubuntu 20.04
Commit hash : da354bc01a127d26e7c97737dc34806f7e327560
Expected behaviour
Erigon + Prysm syncs & stays in sync.
Actual behaviour
Erigon + Prysm falls behind in sync and has very long commit cycles and general instability.
Steps to reproduce the behaviour
Restart Erigon + Prysm from slightly behind head and watch it attempt to sync.
Backtrace
High level errors:
Prysm = ERROR blockchain: received an undefined ee error error=timeout from http.Client, on average a few times per minute when close to head.
Erigon = Very long commit cycles / struggle to get to and keep up with the head of the chain. A lot of [NewPayload] stage loop is busy and [ForkChoiceUpdated] stage loop is busy in DEBUG, but not sure if that is a problem or standard output.
An example of a close-to-head cycle taking 14 seconds for 3 blocks is below; I have others like 128 blocks taking 6 minutes, etc. CPU/RAM/disk usage all remain quite low throughout too.
Note in the example & in other commit cycles the things that stand out to me are:
- [NewPayload] stage loop is busy
- [ForkChoiceUpdated] stage loop is busy
- [txpool] Commit taking significant time (at least according to the log output)
- ERROR blockchain: received an undefined ee error error=timeout from http.Client error a few times every minute.
This is a 3 block attempt with stage loop near the end of a cycle slowing it down.
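To put a number on how often the busy messages appear, a rough Python sketch like the one below can bucket them per minute. It assumes DEBUG lines use the same [LEVEL] [MM-DD|HH:MM:SS.mmm] prefix as the INFO line quoted earlier in this thread; the sample lines are invented for illustration and are not taken from the real logs.

```python
import re
from collections import Counter

# Assumed prefix shape: "[LEVEL] [MM-DD|HH:MM:SS.mmm] message"
_LINE = re.compile(
    r"\[(?P<level>[A-Z]+)\]\s+"
    r"\[(?P<date>\d{2}-\d{2})\|(?P<minute>\d{2}:\d{2}):\d{2}\.\d{3}\]\s+"
    r"(?P<msg>.*)"
)


def count_per_minute(lines, needle="stage loop is busy"):
    """Count log lines containing `needle`, keyed by 'MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = _LINE.search(line)
        if match and needle in match.group("msg"):
            counts[match.group("date") + " " + match.group("minute")] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample_log = [
    "[DBUG] [10-29|12:42:01.427] [NewPayload] stage loop is busy",
    "[DBUG] [10-29|12:42:13.020] [ForkChoiceUpdated] stage loop is busy",
    "[INFO] [10-29|12:42:30.000] Commit cycle in=14.1s",
    "[DBUG] [10-29|12:43:05.500] [NewPayload] stage loop is busy",
]
for minute, hits in sorted(count_per_minute(sample_log).items()):
    print(minute, hits)
```

Lining these per-minute counts up against the timestamps of the Prysm timeout errors would show whether the two actually coincide.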
A second example of Erigon trying to sync back up ~300 blocks but getting itself into a stage loop is busy/prysm timeout loop midway (see timestamps)
Dumping CL logs for the same period just in case that helps:
Also, I have seen this node go into a 'No block bodies to write in this log period block' error after some time in this stage loop pattern, though a restart resolves that particular issue (not sure if related).
Thanks