Closed tryphe closed 2 months ago
Syncing works fine until about 95%, or near block 2450000.
What exactly happens then? Does it stay stuck on the same block and stop syncing?
Does it start to ban other peers?
Syncing works fine until about 95%, or near block 2450000.
What exactly happens then? Does it start to ban other peers?
The errors and warnings stated above happen constantly (see gist for details) until I give the hangup signal around 03:16:52.049
. No blocks get added to the chain. It doesn't matter if I wait an hour, or 5 minutes, the log looks the same, so I gave a short log.
Indeed, peers are banned.
One thing I didn't mention, if I continually restart the daemon, it seems to sync a few chunks of 20 blocks upon start, but it only works for a minute or two until getting stuck like in the log.
It's obvious to me that there are some malicious nodes out there just handing me junk blocks. But could it really be an eclipse attack? It seems to me like adding more peers would reduce the chance quite a bit, but I'm not sure the problem here is just that. It seems like maybe a combination of a bug and malicious peers, but that's just speculation.
Which log level is the log you posted?
It's obvious to me that there are some malicious nodes out there just handing me junk blocks. But could it really be an eclipse attack?
You appear to have this issue for a month and no one else reported something similar so I don't think so, otherwise there would be more reports about similar issues.
It's probably either an issue with your setup or a bug that gets triggered in your setup, but looking at the logs I don't see anything obvious yet.
Which log level is the log you posted?
It's obvious to me that there are some malicious nodes out there just handing me junk blocks. But could it really be an eclipse attack?
You appear to have this issue for a month and no one else reported something similar so I don't think so, otherwise there would be more reports about similar issues.
It's probably either an issue with your setup or a bug that gets triggered in your setup, but looking at the logs I don't see anything obvious yet.
Log level 1. Let me know if you want a different log level.
Can you post log level 2?
Thanks for your quick feedback. I use the same VM image for several coin daemons, and only Monero has syncing problems, but it's the only cryptonote coin I run. I can't really imagine a more generic setup, but after searching around, you're definitely right that no one seems to have this problem.
I'll make a clean VM and see how that does.
I did a quick dist-upgrade to Debian 11 without rebuilding Monero and the problem seems to have resolved itself. Really strange. No problems now. Why it happens at 95% synced is beyond me. It doesn't happen at earlier blocks.
It seems that running Debian 10 and other non-new distros should be advised against. Debian 10 is "oldstable" which is still supported, so that's mildly concerning that people are probably using it still.
I'll post some new logs so they can be compared against the ones that failed.
Log level 2 after the problem was resolved by upgrading to Debian 11: https://gist.githubusercontent.com/tryphe/fac92713a0fab5aa6a75c6ae40905573/raw/c3262979e8e0f299b290bde46ac096c48cd6e144/gistfile1.txt
Thanks again for your feedback @selsta!
I can't spot anything in the log that's different. The warnings and errors look the same. It must be related to some buggy lower level library running on some Linux distributions.
Hopefully this might help someone that has a similar setup but this does not necessary qualify as a Monero issue. Maybe a check for some older library could be enforced, although frankly I have no idea which library it could be :stuck_out_tongue:
Does everything still work? It's unlikely that your OS version is related here.
@selsta yes. Ran dist-upgrade, did nothing else, and it's been synced ever since. I upgraded from Buster (Debian 10) which was up to date via apt, so there's a problem there.
It started doing it again today and now the node is behind. Will try and make a new machine. And again, none of my other coin daemons have this problem, so it's a monero problem. But it doesn't really make sense that i was able to sync ~100k blocks fine at a point where it was stuck, and then nothing.
19:02 <+moneromooo> The first few blocks that are needed are requested from a peer that doesn't send them.
19:02 <+moneromooo> The timeout seems to be 25 seconds here.
19:04 <+moneromooo> The blocks then get requested from another peer, but that peer gets dropped after it sends a reply to another message. Possible state machine bug here.
19:31 <+moneromooo> There are two instances of possible large delays in networking. So, who knows.
That's what moneromooo saw from the logs. Now it still doesn't explain why there aren't more reports about this. I guess testing a different machine would be helpful.
23:13 <hocuspocusoutoff> so... possibly not related at all since user's details dont match mine, but i had similar symptoms when i tried my first sync. i am hosting the blockchain on a NAS while daemon runs on my server. the NAS had some goofy-ass software that "indexed" things every time a change was made on disk. this basically caused the sync to grind to a halt somewhere around the same block height user is mentioning. once i was able to turn that
23:13 <hocuspocusoutoff> nonsense off, sync finished cleanly
Not sure if this is helpful for your setup but I thought I'd add it too.
Hi @selsta , I had a similar issue when trying to host the blockchain on my NAS. Turned out the problem was the NAS had some hidden service that "reindexed" its files every time ANY change was made on disk. This would lead monerod and this index service to compete for access to the file, which would grind the whole thing to a halt. Once I removed this NAS service, I was able to full sync from 0 in about 1.5 days total, with zero issues.
Could you possibly also have a similar service interfering with your blockchain file?
I was able to reproduce the bug with a new KVM instance with Debian 11 amd64 (vm software is up to date, host is up to date). Host is running other coin daemons with a similar setup. I tried to get a setup that is most similar to the one producing the bug.
Steps I took:
/etc/network/interfaces
apt remove network-manager network-manager-gnome light-locker xiccd pipewire colord system-config-printer gvfs gpg-agent tumbler at-spi2-core geoclue-2.0 policykit-1-gnome
apt autoremove
./monerod
and let it run. It will get stuck near block 2400000 to 2500000.$ gcc -v
:
gcc version 10.2.1 20210110 (Debian 10.2.1-6)
$ uname -a
:
Linux debian11 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux
$ ps -aux
(besides kernel processes) :
/sbin/init
/lib/systemd/systemd-journald
/lib/systemd/systemd-udevd
/usr/sbin/cron
/usr/bin/dbus-daemon
/usr/sbin/rsyslogd
/lib/systemd/systemd-logind
/usr/libexec/udisks2/udisksd
/usr/libexec/polkitd
/usr/sbin/lightdm
/usr/lib/xorg/Xorg
/sbin/agetty
/lib/systemd/systemd
(sd-pam)
xfce4-session
/usr/bin/dbus-daemon
/usr/bin/ssh-agent
/usr/lib/x86_64-linux-gnu/xfce4/xfconf/xfconfd
xfwm4
xfsettingsd
xfce4-panel
Thunar --daemon
xfdesktop
/usr/lib/x86_64-linux-gnu/xfce4/notifyd/xfce4-notifyd
/usr/lib/x86_64-linux-gnu/xfce4/panel/wrapper-2.0 /usr/lib/x86_64-linux-gnu/xfce4/panel/plugins/libsystray.so
/usr/lib/x86_64-linux-gnu/xfce4/panel/wrapper-2.0 /usr/lib/x86_64-linux-gnu/xfce4/panel/plugins/libnotification-plugin.so
/usr/lib/x86_64-linux-gnu/xfce4/panel/wrapper-2.0 /usr/lib/x86_64-linux-gnu/xfce4/panel/plugins/libactions.so
xterm
bash
./monerod
/usr/libexec/dconf-service
It got all the way up to and stuck at block ~2508000 for hours, with many peers and lots of network traffic. Restarted the daemon, and it's syncing okay again for a few minutes. Same issue as in OP. We'll see how long until it gets stuck again.
I'll try to see if I can reproduce without x running (sudo service lightdm stop
and switch to another tty)
Is the blockchain stored inside the VM or on an external network drive?
Is the blockchain stored inside the VM or on an external network drive?
Could you possibly also have a similar service interfering with your blockchain file?
The blockchain is always stored locally and no processes are running that seem like they could interfere. Most processes have a fraction of a second of total cpu time used.
The steps taken above were an effort to try and reproduce the bug on a different VM but on an identical setup. The problem still occurred.
I've taken some additional steps to reduce the complexity and was able to reproduce the bug without X or related processes running ie. xfcond and I was still able to reproduce the bug.
ss -t
gives me 13 outbound peers
nload
tells me there is around 0.5-1mbps burst and <0.5 average traffic, as opposed to >5mbps average while syncing blocks successfully.
ps aux
gives me (after I closed monerod):
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.2 98204 10216 ? Ss 17:30 0:04 /sbin/init
root 2 0.0 0.0 0 0 ? S 17:30 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? I< 17:30 0:00 [rcu_gp]
root 4 0.0 0.0 0 0 ? I< 17:30 0:00 [rcu_par_gp]
root 6 0.0 0.0 0 0 ? I< 17:30 0:00 [kworker/0:0H-events_highpri]
root 8 0.0 0.0 0 0 ? I< 17:30 0:00 [mm_percpu_wq]
root 9 0.0 0.0 0 0 ? S 17:30 0:00 [rcu_tasks_rude_]
root 10 0.0 0.0 0 0 ? S 17:30 0:00 [rcu_tasks_trace]
root 11 0.0 0.0 0 0 ? S 17:30 0:00 [ksoftirqd/0]
root 12 0.1 0.0 0 0 ? I 17:30 0:07 [rcu_sched]
root 13 0.0 0.0 0 0 ? S 17:30 0:00 [migration/0]
root 14 0.0 0.0 0 0 ? I 17:30 0:00 [kworker/0:1-mm_percpu_wq]
root 15 0.0 0.0 0 0 ? S 17:30 0:00 [cpuhp/0]
root 16 0.0 0.0 0 0 ? S 17:30 0:00 [cpuhp/1]
root 17 0.0 0.0 0 0 ? S 17:30 0:00 [migration/1]
root 18 0.0 0.0 0 0 ? S 17:30 0:00 [ksoftirqd/1]
root 20 0.0 0.0 0 0 ? I< 17:30 0:00 [kworker/1:0H-events_highpri]
root 21 0.0 0.0 0 0 ? S 17:30 0:00 [cpuhp/2]
root 22 0.0 0.0 0 0 ? S 17:30 0:00 [migration/2]
root 23 0.0 0.0 0 0 ? S 17:30 0:07 [ksoftirqd/2]
root 25 0.0 0.0 0 0 ? I< 17:30 0:00 [kworker/2:0H-events_highpri]
root 26 0.0 0.0 0 0 ? S 17:30 0:00 [cpuhp/3]
root 27 0.0 0.0 0 0 ? S 17:30 0:00 [migration/3]
root 28 0.0 0.0 0 0 ? S 17:30 0:00 [ksoftirqd/3]
root 30 0.0 0.0 0 0 ? I< 17:30 0:00 [kworker/3:0H-events_highpri]
root 34 0.0 0.0 0 0 ? S 17:30 0:00 [kdevtmpfs]
root 35 0.0 0.0 0 0 ? I< 17:30 0:00 [netns]
root 36 0.0 0.0 0 0 ? S 17:30 0:00 [kauditd]
root 37 0.0 0.0 0 0 ? S 17:30 0:00 [khungtaskd]
root 38 0.0 0.0 0 0 ? S 17:30 0:00 [oom_reaper]
root 39 0.0 0.0 0 0 ? I< 17:30 0:00 [writeback]
root 40 0.0 0.0 0 0 ? S 17:30 0:01 [kcompactd0]
root 41 0.0 0.0 0 0 ? SN 17:30 0:00 [ksmd]
root 42 0.0 0.0 0 0 ? SN 17:30 0:02 [khugepaged]
root 63 0.0 0.0 0 0 ? I< 17:30 0:00 [kintegrityd]
root 64 0.0 0.0 0 0 ? I< 17:30 0:00 [kblockd]
root 65 0.0 0.0 0 0 ? I< 17:30 0:00 [blkcg_punt_bio]
root 66 0.0 0.0 0 0 ? I< 17:30 0:00 [edac-poller]
root 67 0.0 0.0 0 0 ? I< 17:30 0:00 [devfreq_wq]
root 68 0.1 0.0 0 0 ? I< 17:30 0:15 [kworker/1:1H-kblockd]
root 69 0.0 0.0 0 0 ? S 17:30 0:01 [kswapd0]
root 70 0.0 0.0 0 0 ? I< 17:30 0:00 [kthrotld]
root 71 0.0 0.0 0 0 ? I< 17:30 0:00 [acpi_thermal_pm]
root 72 0.5 0.0 0 0 ? I< 17:30 0:40 [kworker/3:1H-kblockd]
root 73 0.0 0.0 0 0 ? I< 17:30 0:00 [ipv6_addrconf]
root 82 0.0 0.0 0 0 ? I< 17:30 0:00 [kstrp]
root 85 0.0 0.0 0 0 ? I< 17:30 0:00 [zswap-shrink]
root 86 0.0 0.0 0 0 ? I< 17:30 0:00 [kworker/u9:0]
root 135 0.0 0.0 0 0 ? I< 17:30 0:00 [ata_sff]
root 136 0.0 0.0 0 0 ? S 17:30 0:00 [scsi_eh_0]
root 137 0.0 0.0 0 0 ? I< 17:30 0:00 [scsi_tmf_0]
root 138 0.0 0.0 0 0 ? S 17:30 0:00 [scsi_eh_1]
root 139 0.0 0.0 0 0 ? I< 17:30 0:00 [scsi_tmf_1]
root 144 0.0 0.0 0 0 ? S 17:30 0:00 [scsi_eh_2]
root 145 0.0 0.0 0 0 ? I< 17:30 0:00 [scsi_tmf_2]
root 147 0.0 0.0 0 0 ? S 17:30 0:02 [irq/18-vmwgfx]
root 148 0.0 0.0 0 0 ? I< 17:30 0:00 [ttm_swap]
root 149 0.0 0.0 0 0 ? S 17:30 0:00 [card0-crtc0]
root 150 0.0 0.0 0 0 ? S 17:30 0:00 [card0-crtc1]
root 151 0.0 0.0 0 0 ? S 17:30 0:00 [card0-crtc2]
root 152 0.0 0.0 0 0 ? S 17:30 0:00 [card0-crtc3]
root 153 0.0 0.0 0 0 ? S 17:30 0:00 [card0-crtc4]
root 154 0.0 0.0 0 0 ? S 17:30 0:00 [card0-crtc5]
root 155 0.0 0.0 0 0 ? S 17:30 0:00 [card0-crtc6]
root 156 0.0 0.0 0 0 ? S 17:30 0:00 [card0-crtc7]
root 157 0.1 0.0 0 0 ? I< 17:30 0:11 [kworker/2:1H-kblockd]
root 160 0.0 0.0 0 0 ? I 17:30 0:00 [kworker/2:2-cgroup_destroy]
root 161 0.1 0.0 0 0 ? I< 17:30 0:11 [kworker/0:1H-kblockd]
root 190 0.0 0.0 0 0 ? S 17:30 0:00 [jbd2/sda1-8]
root 191 0.0 0.0 0 0 ? I< 17:30 0:00 [ext4-rsv-conver]
root 229 0.0 0.4 48332 17084 ? Ss 17:30 0:00 /lib/systemd/systemd-journald
root 251 0.0 0.1 21916 5040 ? Ss 17:30 0:00 /lib/systemd/systemd-udevd
root 294 0.0 0.0 0 0 ? I< 17:30 0:00 [iprt-VBoxWQueue]
root 310 0.0 0.0 0 0 ? I< 17:30 0:00 [cryptd]
root 414 0.0 0.0 6684 2648 ? Ss 17:30 0:00 /usr/sbin/cron -f
message+ 415 0.0 0.1 8208 4680 ? Ss 17:30 0:01 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 422 0.0 0.0 220740 3956 ? Ssl 17:30 0:00 /usr/sbin/rsyslogd -n -iNONE
root 424 0.0 0.1 21908 7144 ? Ss 17:30 0:00 /lib/systemd/systemd-logind
root 425 0.0 0.2 390204 10536 ? Ssl 17:30 0:00 /usr/libexec/udisks2/udisksd
root 438 0.1 0.0 0 0 ? I 17:30 0:13 [kworker/0:3-events]
root 515 0.0 0.1 234236 7672 ? Ssl 17:30 0:00 /usr/libexec/polkitd --no-debug
root 615 0.0 0.0 5784 1736 tty1 Ss+ 17:30 0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root 930 0.0 0.0 6876 2856 tty5 Ss 17:33 0:00 /bin/login -p --
joe 936 0.0 0.2 15388 8792 ? Ss 17:33 0:00 /lib/systemd/systemd --user
joe 937 0.0 0.0 100920 2652 ? S 17:33 0:00 (sd-pam)
joe 943 0.0 0.1 8240 4848 tty5 S 17:33 0:00 -bash
root 1007 0.0 0.0 6864 3212 tty6 Ss 17:45 0:00 /bin/login -p --
root 1030 0.0 0.0 0 0 ? I 18:21 0:03 [kworker/u8:0-ext4-rsv-conversion]
joe 1038 0.0 0.1 8112 4664 tty6 S+ 18:28 0:00 -bash
root 1047 0.0 0.0 0 0 ? I 18:33 0:00 [kworker/2:0-cgroup_destroy]
root 1048 0.0 0.0 0 0 ? I 18:33 0:00 [kworker/3:2-events]
root 1078 0.0 0.0 0 0 ? I 19:21 0:00 [kworker/1:1-mm_percpu_wq]
root 1080 0.0 0.0 0 0 ? I 19:26 0:00 [kworker/1:0-mm_percpu_wq]
root 1081 0.0 0.0 0 0 ? I 19:29 0:00 [kworker/1:3-ata_sff]
root 1084 0.0 0.0 0 0 ? I 19:30 0:00 [kworker/u8:2-flush-8:0]
root 1087 0.0 0.0 0 0 ? I 19:30 0:00 [kworker/3:0-events]
root 1089 0.0 0.0 0 0 ? I 19:36 0:00 [kworker/1:2-mm_percpu_wq]
root 1095 0.0 0.0 0 0 ? I 19:37 0:00 [kworker/0:0-cgroup_destroy]
joe 1189 0.3 0.1 7896 4352 ? Ss 19:39 0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 1363 0.0 0.0 0 0 ? I 19:39 0:00 [kworker/2:1-mm_percpu_wq]
root 1364 0.0 0.0 0 0 ? I 19:40 0:00 [kworker/u8:1-events_unbound]
root 1373 0.0 0.0 0 0 ? I 19:40 0:00 [kworker/u8:3]
joe 1379 0.0 0.0 9700 3336 tty5 R+ 19:40 0:00 ps -aux
Logs during this run: https://gist.githubusercontent.com/tryphe/cbfe7fd727180c5a3b93cfeb5dd298a4/raw/66b16c07a34d7470d7b69aea79b9d982a5690e01/gistfile1.txt
There is a reoccurring problem with boost::wrapexcept<boost::bad_weak_ptr>
around the point of failure, but only after some other call in a thread has failed.
I'll build with make debug
and report back with hopefully readable traces :)
I'll try to sync up on a VPS and see if I can reproduce. Just to confirm, you used the master branch?
I'll try to sync up on a VPS and see if I can reproduce. Just to confirm, you used the master branch?
Thanks. Indeed, head is at d562deaaa950979b7a31a441a8f02a00013e26d6
After 10 minutes of running I was able to reproduce the bug in debug mode. 7 blocks sync and then it gets stuck, lots of network chatter but no blocks being added. I gave the hangup signal after 10 minutes to keep the log smaller.
$ cat ~/.bitmonero/bitmonero.log | grep "Your node"
2022-03-04 01:58:25.247 [shortened for readability] Sync data returned a new top block candidate: 2512046 -> 2571962 [Your node is 59916 blocks (2.7 months) behind]
2022-03-04 02:01:49.392 [shortened for readability] Sync data returned a new top block candidate: 2512053 -> 2571966 [Your node is 59913 blocks (2.7 months) behind]
2022-03-04 02:03:42.697 [shortened for readability] Sync data returned a new top block candidate: 2512053 -> 2571968 [Your node is 59915 blocks (2.7 months) behind]
2022-03-04 02:08:20.795 [shortened for readability] Sync data returned a new top block candidate: 2512053 -> 2571970 [Your node is 59917 blocks (2.7 months) behind]
Log with --log-level 1: https://gist.githubusercontent.com/tryphe/bc38a41e752fd127d3092901e1178051/raw/c17ca0b278450c5d29d89014604eb9fe87861525/gistfile1.txt
Please let me know if there is anything else I can provide.
It looks like the errors are possibly unrelated to the syncing problem. I tried a handful of old release versions back as far as 0.17.1.3. I still experience the syncing problem but no runtime errors.
I started the sync process on a fresh VPS now. Will update you if it syncs fine or if I gets stuck too.
8 hours later, so far no issues
Synced 2505056/2575077 (97%, 70021 left)
I unfortunately couldn't reproduce this on a fresh VPS with v0.17.3.0. Sync was without any issues.
I'm not really sure what the differences are between our machines.
While syncing past 95%, it takes me somewhere between 5 to 40 minutes to experience the stuck syncing problem. While fully synced, it takes somewhere between 0 to 5 hours to experience the stuck syncing problem.
I'm on a residential connection so I have some packet loss. About 0.05% to 0.1%. Maybe that's the problem? I'll try and induce more packet loss to see if it makes the problem worse.
Residential connection with minor packet loss should be fine.
The logs you posted did hint at some networking related issues, but it's not exactly clear yet.
I have exactly the same problem on my home pc running latest Arch linux kernel (5.16.14-arch1-1), monerod is launched inside docker container (as per https://sethforprivacy.com/guides/run-a-p2pool-node/). Absolutely the same behavior as described by @tryphe. Will try to sync natively in arch with monero CLI. Overall network download speed according to "sync_info" seems to drop significantly to 3-60kB/s after 3-30mins of monerod container launch after 97% sync (block 2500000).
@Regloom just to confirm, it gets completely stuck? or just slow sync?
After block 2478000 there are no more checkpoints so sync will significantly slow down.
it gets completely stuck? or just slow sync?
It slows down first to very low speed and then it stucks (no new blocks within hours). Restarting monerod helps for a while and then again is slows down and stops syncing. Now trying to sync without docker in Arch itself.
Now trying to sync without docker in Arch itself.
Same behaviour. Here is a log 0 of monerod.
Can you also post sync_info output?
have you tried adding specific peers?
you can add a priority one with
--add-priority-node
176.9.0.187
or
--add-exclusive-node
176.9.0.187
that IP is that of xmrchain.net , so it should work.
Yea my idea was too to add a couple exclusive nodes to make sure it's not a malicious peer.
have you tried adding specific peers?
Thanks for advice! I'll give it a try.
Make sure to add both --add-exclusive-node 88.198.199.23
and --add-exclusive-node 176.9.0.187
, syncing from just one peer can be spotty. The other IP is mine.
Make sure to add both
--add-exclusive-node 88.198.199.23
and--add-exclusive-node 176.9.0.187
, syncing from just one peer can be spotty. The other IP is mine.
Executed monerod with --add-exclusive-node=88.198.199.23:18081 --add-exclusive-node=176.9.0.187:18081
Now it shows me that I'm fully synced:
~/monero $ ./monero-x86_64-linux-gnu-v0.17.3.0/monerod status 2022-03-17 20:11:59.127 I Monero 'Oxygen Orion' (v0.17.3.0-release) Height: 2507075/2507075 (100.0%) on mainnet, not mining, net hash 2.65 GH/s, v14, 0(out)+0(in) connections, uptime 0d 0h 8m 4s
is it correct?
That doesn't sound right.
Is it a node with restricted RPC?
That doesn't sound right.
Is it a node with restricted RPC?
Well, I guess so. Here is my current config. I've tried several other options (commented) but no luck.
The first step would be to try to sync without any config at all, just a custom data dir if necessary. Also adding --prune-blockchain
is fine if you don't have enough space for the full blockchain, but apart from that no config.
OK, here is sync_info logs after night running: monerod --prune-blockchain --add-priority-node=88.198.199.23 --add-priority-node=176.9.0.187 It stuck on Height: 2507351/2581844 (97.1%).
I've also tried --add-exclusive-node
and it blocked both nodes after 20 mins:
2022-03-18 06:04:56.330 I SYNCHRONIZATION started 2022-03-18 06:05:09.986 I Host 88.198.199.23 blocked. 2022-03-18 06:08:12.024 W monerod is now disconnected from the network 2022-03-18 06:08:12.978 I [176.9.0.187:18080 OUT] Sync data returned a new top block candidate: 2507361 -> 2582080 [Your node is 74719 blocks (3.4 months) behind] 2022-03-18 06:08:12.978 I SYNCHRONIZATION started 2022-03-18 06:08:24.067 W monerod is now disconnected from the network 2022-03-18 06:08:24.747 I [176.9.0.187:18080 OUT] Sync data returned a new top block candidate: 2507371 -> 2582080 [Your node is 74709 blocks (3.4 months) behind] 2022-03-18 06:08:24.747 I SYNCHRONIZATION started 2022-03-18 06:08:35.676 I Host 176.9.0.187 blocked.
@Regloom can you also upload log level 2 logs? You don't have to add any priority nodes.
can you also upload log level 2 logs?
Here you are, level 2 logs without priority nodes. bitmonero.log.zip
p.s. Also tried running from latest Debian Live image and got the same result. At least distro is not the cause. Can it be due to my hardware (e.g. AMD Athlon II X4 6400e CPU with no L3 cache and AES support)?
Fwiw, I'm also running on an AMD FX 8300 series, but it's a different socket (AM3+ vs AM3 for Athlon II) and different die size (32nm vs 45nm for Athlon II).
AMD FX 8300 series,
That might be somehow related. What's your motherboard?
AMD FX 8300 series,
That might be somehow related. What's your motherboard?
I think it's this thing: ASUS TUF SABERTOOTH 990FX
AMD 990FX chipset.
I'll move the problematic VM to a newer Intel hypervisor to attempt to rule out any hardware issue. Will try to report back in a few days.
Describe the problem
Behavior: At some point after starting monerod, ranging from a few minutes to many hours, syncing "gets stuck". Whether synced or unsynced, blocks stop being added to the chain until monerod is restarted. The monerod process continues running and is responsive, however, and exits normally if given HUP etc.
The behavior repeatedly appears for me, in particular, starting around block >~2450000. Not any specific block, just random, but not at any point before ~2450000. If monero is already synced, it takes longer for the behavior to appear, sometimes several hours longer.
There are plenty of connections and new connections being made during the behavior. Monerod gives me two messages like this every few minutes, gives the correct number of total blocks, but is still stuck at the same block where the behavior appeared:
Workaround: If I create a cron script to restart monerod every ~15 minutes, it eventually syncs.
A few things about my node and what I've tried:
I've been running monero for over 5 years with no syncing issues, until about a month ago. Minimal setup Debian 10 VM with ntpd and monerod (Update: tried with a new Debian 11 machine and the same behavior occurs). Syncing works fine until about 95%, or near block 2450000. Removed the entire .bitmonero directory and synced from scratch several times with the same result. Removed my config and am running with no arguments to reduce complexity, but get the same result. Running on clearnet. Using the blocklist at https://gui.xmr.pm/files/block.txt, but I tried without the blocklist and get the same result. Using outbound connections only to try and sync currently to avoid possibility of attack. Tried running with things like
--out-peers 25
or--out-peers 50
but get the same result. Running the latest master branch 9aab19f349433687c7aaf2c1cbc5751e5912c0aa but also tried with signed releases:which experience the same behavior but do NOT show errors in the log.
Update: I originally thought the syncing problem was related to these errors, but this is not the case. Will leave this section here
Some errors/warnings that frequently occur:
There's also a loop that produces hundreds of these at once for the same peer:
Here's a short run: https://gist.githubusercontent.com/tryphe/0a09526e0c9aa1e6154b4e2b0c6e2e13/raw/a56cca8895370d6e91a27b5f4a4b887d4e064d2d/gistfile1.txt (Note: I gave the hangup signal near 03:16:52.049 where it shuts down cleanly)
Any feedback is welcome and appreciated. Let me know if you need more information or if I can do anything to make debugging easier. Thank you.