shermand100 / PiNodeXMR

Monero Node for Single Board Computers with Web Interface and additional tools pre-configured. Self Installing.
GNU General Public License v3.0

My node is stuck at a past block height, with the Busy Syncing flag as false, I cannot find a way to force it to sync #44

Closed bdkappel closed 2 years ago

bdkappel commented 3 years ago

My node on a Raspberry Pi 4 (running off an M.2 SSD) is stuck at the block height from the imported blockchain. I synced it on my PC with the GUI wallet, copied the lmdb folder to a flash drive, copied it to the .bitmonero location on the Pi. I've tried rebooting the node, stopping the node, the big red danger button at the bottom, I've even wiped the SSD and tried again from a fresh Raspberry Pi OS image.

Is there any way to force the node to sync? I have looked everywhere, the one suggestion I found to delete the p2pstate.bin file did not help. I couldn't find anything in the monerod documentation that was any help either. I was hoping to avoid having to sync the entire blockchain on the Pi.

Any help would be appreciated. Thank you in advance.

shermand100 commented 2 years ago

Hi, sorry for the mega late response to this. Have you had success in continuing to sync?

bdkappel commented 2 years ago

Hello. No worries at all, I've been meaning to follow up with you but I've been super busy. I am still having no luck. In fact, I am not able to sync the blockchain from the start either. It appears to sync fine for a while, a day or two, but then suddenly reverts to a false Busy Syncing flag. Two logs are attached below, the numbers in the names are the block height that it stopped syncing. I'm still seeing connection issues.

I think we've determined it's most likely a hardware issue, but I'm not sure where to go from here. I am connected to Ethernet now, so that point of failure is taken care of. I have copied the SSD to an external SSD and have started a new sync to see if the current SSD is bad for some reason, it was brand new. From there, I can try another Pi, but this was the last one "laying around" and it was functioning perfectly up until being repurposed so I'm not sure what would have changed. Would this correspond to a failing or overloaded router or modem?

Any ideas? Thanks for following up!

176700_bitmonero.log 178276_bitmonero.log

shermand100 commented 2 years ago

I've had two users recently whose logs looked like connection issues, but the cause was actually a corrupted blockchain. Can you stop the node and then just rename the blockchain folder...

If you rename the folder holding the blockchain to "lmdb_old" or something, then Monero won't be able to find it and will sync from scratch. If it is able to start syncing from scratch then it is likely not a connection issue as it's syncing again.

Also by just renaming the directory if it's not a corrupted blockchain you can just re-name it back to "lmdb" again and Monero will be able to carry on where you left off again, so no progress lost.
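A minimal sketch of that rename-and-restore step, assuming the node has already been stopped and that the data directory is the `.bitmonero` location mentioned earlier in the thread. The function names and the `lmdb_old` default are only illustrative; nothing here is part of PiNodeXMR itself.

```python
from pathlib import Path


def set_blockchain_aside(data_dir: Path, aside_name: str = "lmdb_old") -> Path:
    """Rename <data_dir>/lmdb so monerod cannot find it and syncs from scratch."""
    live = data_dir / "lmdb"
    aside = data_dir / aside_name
    if not live.is_dir():
        raise FileNotFoundError(f"no blockchain folder at {live}")
    if aside.exists():
        raise FileExistsError(f"{aside} already exists; pick another name")
    live.rename(aside)
    return aside


def restore_blockchain(data_dir: Path, aside_name: str = "lmdb_old") -> Path:
    """Rename the set-aside folder back to 'lmdb' so monerod carries on from it."""
    live = data_dir / "lmdb"
    aside = data_dir / aside_name
    if live.exists():
        # A fresh sync has created a new lmdb folder; do not overwrite it silently.
        raise FileExistsError(f"{live} exists; move the fresh sync out of the way first")
    aside.rename(live)
    return live
```

With the node stopped, `set_blockchain_aside(Path.home() / ".bitmonero")` would hide the existing chain, and `restore_blockchain(...)` with the same argument would put it back. Adjust the path if the blockchain lives elsewhere on your setup.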

bdkappel commented 2 years ago

I've tried that several times actually. The original problem occurred when trying to import a blockchain file from my PC, so I tried syncing from the beginning on the Pi as well. It will reach somewhere between blocks 175000 and 190000 before stopping and showing the connection errors. I think I've tried syncing 4 different imported blockchains and started from the beginning on the Pi 6 times, with similar results each time.

So far I have plugged the Pi directly into Ethernet, but that did not make a difference. I also tried a different SSD; it synced further than previous attempts but still stops eventually, around block 190000. I am not noticing any other network issues around the house, and certainly none that line up with when the node stops.

shermand100 commented 2 years ago

Despite what the log says about connection issues, I really do think it's just the blockchain. Can you turn your PC node on again to progress the blockchain further? Then restart the PC so there is no way that monerod is still running on it, and copy the blockchain onto the PiNodeXMR again. If monerod was running in the background whilst you were copying, the copy would have been corrupted because monerod was still writing to the files at the time.

bdkappel commented 2 years ago

Noted. I'll give that a try when time permits, looking like maybe this weekend. I think I tried that before, someone else had suggested the same, but I'll verify again and see.

However, that wouldn't explain the issues with syncing the blockchain entirely from the start on the Pi would it? The two logs above did not have an imported blockchain, but still showed the same errors.

ChiefGyk3D commented 2 years ago

Can I ask how you have the M.2 connected to the Pi as it doesn't have PCI-E, I presume it's over USB 3.0? Any adapters or such and info would be most helpful.

On another note: @shermand100 I wonder if the Pi itself in some configurations is maybe more sensitive to flipping bits and corrupting the chain. While statistically rare, it's happening a bit too often.

It may also be worth looking into changing the file system used for external storage. Ext4 is native to Linux, is already recommended, and as a journaling file system is less prone to errors.

UDF, while compatible everywhere, is honestly not ideal for database operations, which is what we are doing essentially. At the very least we should be giving the users a choice in setup.

I have been using a custom config of PiNodeXMR using Ext4 even when I was on a Pi4 and noticed benefits.

I was already planning some possible commits after testing but the supply chain issue is delaying my order of the second RockPro64 I was going to use for testing for this since my own node is a clobbered mess right now patch wise, and needs to be recreated cleanly.

bdkappel commented 2 years ago

Finishing up syncing the blockchain on my PC and I'll try the transfer again.

@ChiefGyk3D I have it connected through USB 3.0. I'm using the Argon One M.2 Case with their M.2/USB 3.0 adapter. I have also tried using a Samsung T5 External SSD to see if the M.2 drive or the adapter was faulty, but I get the same errors.

bdkappel commented 2 years ago

Ok, so I resynced the blockchain on my PC, through the Monero GUI wallet. I then closed the wallet and waited for the daemon to shut down and the wallet GUI to close. I checked tasklist in the command prompt to make sure there was no instance of monerod running. I restarted the PC and once again checked tasklist to verify no process was starting up. I then copied the blockchain over, this time using WinSCP; previously I had copied it to an external SSD and then copied it over from there.
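For anyone repeating that check, here is a rough sketch of scripting it instead of reading the `tasklist` output by eye. It assumes the daemon shows up under the image name `monerod.exe` and that `tasklist` prints the image name as the first column; the helper names are made up for illustration.

```python
import subprocess


def image_is_listed(tasklist_output: str, image: str = "monerod.exe") -> bool:
    """Return True if any line of tasklist output begins with the image name."""
    wanted = image.lower()
    for line in tasklist_output.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the image name column.
        if fields and fields[0].lower() == wanted:
            return True
    return False


def monerod_running_on_windows() -> bool:
    """Run tasklist (Windows only) and look for monerod.exe in its output."""
    result = subprocess.run(["tasklist"], capture_output=True, text=True, check=True)
    return image_is_listed(result.stdout)
```

Only copy the `lmdb` folder once `monerod_running_on_windows()` comes back `False`, both before and after the reboot.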

I get the same/similar errors. It shows connection issues. I'm hard-wired in with good speeds so I don't understand the issue at all. I've attached the most recent log below.

Would a new Pi be worth trying? I can't determine if that is the issue. A different hard drive, and not using the M.2 adapter did not seem to make a difference either. Would it be more recommended to look into a Rock Pro 64? I've seen that mentioned around the ecosystem before. Thanks everyone for the help!

afterImport.log

shermand100 commented 2 years ago

@bdkappel I would have thought that if there were a hardware issue with the Pi (memory issues or something) then the intensive task of compiling Monero from source would have triggered a fail too at the build stage. But this could be checked. At the moment we build Monero from source, but only because it adds library files and links needed for the block explorer.

There are ARMv7 Monero binaries already pre-built by the devs for the Raspberry Pi (so tested and known to work) at the usual official get monero page.

It would be very quick for you to download their monerod and replace yours, which is currently found at /home/pinodexmr/monero/build/release/bin/monerod. I am fairly sure that with the steps you took preparing the blockchain its integrity is good. I would now question monerod, but it's a quick thing to check: just download and unpack.

bdkappel commented 2 years ago

@shermand100 Tried your suggestion of downloading and copying over the ARMv7 binary and I'm getting the same/similar error messages. So this points to something other than an issue building monerod from source, correct? I've attached the log in case it helps.

afterNewMonerod.log

shermand100 commented 2 years ago

Sorry I didn't see this sooner bud, but seeing the log so recently after starting Monerod has highlighted the issue... and there sort of isn't one, at least not with hardware or software:

Is your blockchain pruned? There are a few references to pruning in the logs, but as I've not run a pruned node I'm not certain if the reference is to your node or a peer.

But either way your node is just "Recalculating difficulties" (line 2402 of "afterNewMonerod.log") and has been doing the same, but probably got interrupted, since the previous log ("afterImport.log", line 10722).

Once it's completed this it'll continue syncing. All the "can't connect" messages are because the node hasn't "started" yet, as it's still initialising. I hope that process doesn't take too long.

No need to buy new hardware at least.

bdkappel commented 2 years ago

Interesting. I am not running a pruned node, at least not that I was aware of. The blockchain I've imported is roughly 40 GB in size, which does line up with a pruned blockchain in size. I'm using the Monero GUI wallet on my PC, and I'm not seeing any options or verification of whether I'm running a pruned node or not, it's been a while since installation but I don't think I would have selected a pruned node. Anyways, if that's the case I may uninstall the GUI wallet and reinstall to try to get a full, non-pruned blockchain at some point since you can't "un-prune" a blockchain.

Would importing a pruned node have an effect on syncing? Or I guess possibly be causing the issues? I feel like I have left it running for several days before and it never started syncing. I have started it up again and I'll monitor it for a week or so and see if anything improves.

Thanks again, I'll keep you updated.

shermand100 commented 2 years ago

As it was "Recalculating difficulties from height 2430000 to height 2490548" which is pretty much correct for the top block but only 40GB (rather than >102GB) I'd say that's a pruned blockchain.

A pruned blockchain doesn't have much effect on the current sync. I think* the top 5500 blocks of the chain are not pruned anyway. These most recent blocks are held just in case the chain splits and a big reorganisation is needed to get back on the longest blockchain.

Now we know what to look for, in the older 176700_bitmonero.log:

<line 12634> 2021-09-10 00:50:37.365 [P2P4] INFO global src/cryptonote_core/blockchain.cpp:998 Recalculating difficulties from height 80000 to height 176699

So since that timestamp (2021-09-10) it has progressed through the entire blockchain and is now doing its calculations for blocks >2430000. It's unfortunate it's been doing this, and I need to look into what triggers it so it can be prevented in future, as these single board computers really aren't powerful enough to do this level of computation in a practical period of time. I don't know what'll be faster: do a fresh sync on a PC to get the whole blockchain, copy it over and hope it doesn't trigger another difficulty calculation (I don't think I've ever had to do one), or carry on as you are, as you've only got the last 30,000 blocks or so to calculate. But I couldn't estimate how long that'll take.
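To spot these entries without scrolling through thousands of lines, something like the following sketch could be used. It only assumes the message wording seen in the log line quoted above; the `Recalc` tuple and `find_recalcs` name are illustrative.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches the message text seen in the quoted bitmonero.log line.
RECALC = re.compile(r"Recalculating difficulties from height (\d+) to height (\d+)")


class Recalc(NamedTuple):
    line_no: int  # 1-based line number in the log
    start: int    # height the recalculation starts from
    end: int      # height it runs to


def find_recalcs(lines: Iterable[str]) -> Iterator[Recalc]:
    """Yield one Recalc per 'Recalculating difficulties' line in the log."""
    for line_no, line in enumerate(lines, start=1):
        match = RECALC.search(line)
        if match:
            yield Recalc(line_no, int(match.group(1)), int(match.group(2)))
```

Passing an open log file (for example `open("bitmonero.log", errors="replace")`) to `find_recalcs` gives the line number and height range of every recalculation, and `end - start` shows how many blocks each one covers.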

bdkappel commented 2 years ago

Hey! Sorry for taking so long to check-in. I let the node run for a little over three weeks, and it was still in the same state as before. Here is the log after those three weeks. afterThreeWeeks_bitmonero.log

Since we determined I was importing a pruned blockchain, I deleted the blockchain from my PC and reinstalled the Monero GUI wallet. Once synced, the blockchain was around 110 GB which appears to be correct for a full, non-pruned blockchain. Exact same issues and errors as before after importing. This was after properly shutting down the node on my PC, restarting, and verifying no instance of monerod was running before copying the blockchain over. Here's the log from after that. afterNewImport_bitmonero.log

Finally, I think I can determine that the issue is my hardware. I have tried to repurpose the Pi. I completely erased and reinstalled Raspberry Pi OS, and even running apt upgrade gives Connection timed out and Connection reset by peer... errors. I can rerun the same command and it may possibly go through. I can't tell what is going on here, but several other Raspberry Pis in the same room, on both WiFi and Ethernet, do not have the same issue as this specific Pi. Do you agree this seems like the only plausible explanation?
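One way to compare this Pi against the others would be to run the same repeated connection test on each and count the failures. This is only a sketch of that idea, not something I ran for the logs above; the host and port are whatever server you choose to test against, and the `connect` parameter exists purely so the function can be exercised without a network.

```python
import socket
import time


def probe(host, port, attempts=20, timeout=5.0, pause=1.0,
          connect=socket.create_connection):
    """Open and close a TCP connection repeatedly; return the failed attempts.

    Each failure is recorded as (attempt_index, exception_class_name), so
    timeouts and resets can be told apart afterwards.
    """
    failures = []
    for attempt in range(attempts):
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
        except OSError as exc:  # covers timeouts and "connection reset by peer"
            failures.append((attempt, type(exc).__name__))
        time.sleep(pause)
    return failures
```

Running the same `probe(...)` call on the suspect Pi and on a known-good one at roughly the same time should show whether the drops are specific to this board.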

Instead of buying another Raspberry Pi for this purpose, would the ROCKPro64 be the way to go for this?

shermand100 commented 2 years ago

It's sounding like a likely possibility. I've never had a Pi board properly die on me (hardware), but the steps you've taken sound logical. To save a possibly unnecessary expense, I would just ask that, if you have the option, you try this PiNodeXMR project on one of those other Pis temporarily. That would confirm your suspicions.

As for those logs there are the occasional connections getting through, then dropping. I can't tell if those are exclusively the hard coded nodes or ones it's managed to find peer-to-peer. Either way it still seems aware of new transactions and new block height, but not doing anything about it.

If you do end up buying new hardware then the Rock64 will do the job much faster than the Pi4. The RockPro64 is, I've heard, 3x faster again, and it has the PCIe slot which users here have used for faster storage. It really comes down to budget. As it sounds like this is a hobby to you (having multiple Pis) you may find merit in the RockPro64 for other purposes. It would probably handle being this Monero node and other tasks simultaneously.

bdkappel commented 2 years ago

Actually, yes, I do have the capability to try another Pi temporarily. I think I'll have time to do that today, I'll let you know if I confirm our suspicions. The logs do seem to match my observations of the network just dying intermittently I think.

I'm definitely a Pi/ESP/ODroid hobbyist so I think adding a RockPro64 will be a great natural addition to my arsenal regardless of its eventual use. I shouldn't care this much about getting an XMR node up and running since it's not exactly a necessity, but the issues have kept me intrigued.

nkinnan commented 2 years ago

If you're still debugging, give this a try: https://forums.raspberrypi.com/viewtopic.php?f=28&t=245931

I see several things in your logs that lead me to believe it might be the same root cause as my own issue #48

bdkappel commented 2 years ago

Hey, sorry I've gone so long without dropping in with an update.

Thanks, @nkinnan for the advice. I tried the steps outlined in your link, but I still get the same errors. Your post did make me wonder if the SSD I have may be bad instead of the Pi itself, but the only other drive I had to try it on was a Samsung T5 and it had the same issues.

I think at this point we can definitely determine that I have some kind of hardware issue causing this. I don't have much time at the moment to debug, so I might have to cut my losses at this time. I do think I will buy a RockPro64 and try that out to at least attempt to get the software running and a node operational, I mean, I'm this deep already I might as well continue.

I'm going to close this issue as being a hardware issue, if I do manage to find the time to debug further and find a solution I will definitely drop in.