shermand100 / PiNodeXMR

Monero Node for Single Board Computers with Web Interface and additional tools pre-configured. Self Installing.
GNU General Public License v3.0
213 stars 41 forks source link

Unable to sync / obtain new blocks #48

Closed nkinnan closed 2 years ago

nkinnan commented 2 years ago

I do not think this is related to https://github.com/monero-ecosystem/PiNode-XMR/issues/44 as I am not seeing any "recalculating difficulties" messages, but my expertise is minimal.

Based on my limited understanding and perusal of UI/logs:

I'm unsure how to get it "unstuck". bitmonero.log. Restart happens at 2022-01-13 19:18:44.571 if you would like to see startup behavior, logs before that are the tail end of being online for 12 hours. Any help would be appreciated! Please let me know if I can provide additional info / files. I would be happy to follow any debug steps provided.

SSD filesystem is EXT4, blockchain was copied with SFTP, node was not running during copy, desktop daemon was shut down before copy.

shermand100 commented 2 years ago

I'm still going through that log, nothing is jumping out yet. But can you do quick check that the file/folder permissions of the ./bitmonero, lmdb and data.mdb are for user "pinodexmr" and are writeable? Sometimes a copy from another system can inherit old source permissions.

It's easy to see visually with that winscp program, or just use:

sudo chown pinodexmr -R ./bitmonero && sudo chmod 755 -R ./bitmonero to set read/write

nkinnan commented 2 years ago

Here are the permissions / ownership:

pinodexmr@PiNodeXMR:~ $ ls -la drwxrwxrwx 3 pinodexmr root 4096 Jan 13 10:28 .bitmonero

pinodexmr@PiNodeXMR:~ $ ls -la .bitmonero -rwxrwxrwx 1 pinodexmr pinodexmr 3393161 Jan 13 13:35 bitmonero.log drwxrwxrwx 2 pinodexmr pinodexmr 4096 Jan 13 11:18 lmdb -rwxrwxrwx 1 pinodexmr pinodexmr 149134 Jan 13 13:10 p2pstate.bin

pinodexmr@PiNodeXMR:~ $ ls -la .bitmonero/lmdb -rwxrwxrwx 1 pinodexmr pinodexmr 130254635008 Jan 13 11:36 data.mdb -rwxrwxrwx 1 pinodexmr pinodexmr 8192 Jan 13 13:35 lock.mdb

After executing the command given, write actually seems to be removed??? (edit: for other users, never mind, linux isn't my thing, I still changed it back to 777 just to be "safe")

pinodexmr@PiNodeXMR:~ $ ls -la total 500 drwxr-xr-x 3 pinodexmr root 4096 Jan 13 10:28 .bitmonero

I have reverted it to 777 to be safe, but this doesn't seem to have been the problem. The folder was created by PiNodeXMR when starting the node up, I then "rm -rf" everything inside and re-ran the mount usb storage command, formatting as EXT4 per https://github.com/monero-ecosystem/PiNode-XMR/issues/15 but am unsure if anything would be different with 06a236c

nkinnan commented 2 years ago

I'll also note that I copied the entire contents of .bitmonero over from desktop after syncing, not just lmdb or data.mdb, in case that matters.

shermand100 commented 2 years ago

You say you've removed everything anyway but the next thing I'd do is work out if this is a network or blockchain issue.

To do that just see whether it'll do a sync from 0, (just to see if it can get some blocks, we won't actually sync all the way from 0).

The easiest way is to just hide by renaming the current lmdb folder to something like "lmdb_old". Monerod won't be able to find it and so will sync from 0 again in a new lmdb folder. sudo mv ./bitmoner/lmdb ./bitmoner/lmdb_old

If it can start syncing blocks again from 0 then it's showing perhaps an issue with the blockchain (if it's not re-calculating difficulties or pruning). If it can't sync from 0 then that would indicate a network issue. The first step to solve a network issue is to delete the p2pstate file in the .bitmonero directory (this is a list of peers and banned peers, probably best to let you monero make it's own list than import one from your PC)

nkinnan commented 2 years ago

So I tried a couple things.

First, nuke all contents of .bitmonero and sync from scratch. That seemed to work and I got a couple thousand blocks before shutting it off.

Next, re-nuke all contents of .bitmonero and re-copy the blockchain from my desktop via sftp to the lmdb folder. This time that is the only file I copied, .bitmonero and lmdb folders were empty except for data.mdb. It took about 20 minutes to finish processing the blockchain which looked good in the logs and then start initiating connections. Again, it can see a new top block candidate, understands it is behind, and attempts to sync, but again, nothing happens and it remains stuck on block 2535459 which is the last block synced using the desktop software. I let it run for an hour before giving up. Here is the log from that run, and the file listing: bitmonero_stuck.log

pinodexmr@PiNodeXMR:~/.bitmonero/lmdb $ ls -la -rw-r--r-- 1 pinodexmr pinodexmr 130254635008 Jan 14 20:13 data.mdb <- copied from desktop -rw-r--r-- 1 pinodexmr pinodexmr 8192 Jan 14 20:23 lock.mdb <- created by pinodexmr

Next, just to verify my blockchain isn't corrupted (even though all checks appear to pass on pinodexmr according to the logs) I restarted the desktop wallet software (monero-gui-v0.17.3.0) and let it sync which it did successfully. It had no issues working with the blockchain database and synced it to be up to date.

Next, since I didn't save the log the first time I did a from-scratch sync, I nuked .bitmonero folder a third time and started up the node, here's that log showing it successfully downloading blocks: bitmonero_fromscratch.log

Finally, I shut down the desktop wallet (stopping the node) after it had synced fully, copied the updated version of data.mdb (only) over for a third time, using a different transfer dialog and explicitly selecting binary (just in case), and tried starting the node on the pi again. Once again, it does not grab any new blocks. Here's that log: bitmonero_stillnotsyncing.log

So:

Of course all copies happened while the nodes were shut down and were done over sftp with FileZilla. I'm still at a loss here what to do, but hopefully this gives you some more information to go on? I can let it sync from scratch but that'll take weeks. I'm obtaining the blockchain from the official monero core wallet.

Is there anything else I can do to help debug?

nkinnan commented 2 years ago

So I have a second pi, sd card, ssd, etc and tried installing on that for a completely fresh start to get more data. Absolutely zero in common with the previous install, completely new hardware and a new install of raspbian. Here are the steps I took:


sudo apt-get update
sudo apt-get upgrade
sudo apt full-upgrade

sudo apt-get install xrdp
sudo apt-get install gparted

sudo rm -rf LCD-show
git clone https://github.com/goodtft/LCD-show.git
chmod -R 755 LCD-show
cd LCD-show
sudo ./MPI4008-show

<installing the touchscreen drivers reboots the pi>

wget -O - https://raw.githubusercontent.com/monero-ecosystem/PiNode-XMR/master/Install-PiNode-XMR.sh | bash

<finish setup, no errors AFAICT>
<set passwords>
<run option 3 "Update Tools", run each of the 4 sub-options, only blockchain explorer needed an update 
though oddly it didn't seem to work the first time and just rebooted, the second attempt succeeded>

<run the usb storage tool to mount .bitmonero formatted as EXT4>
<I can confirm .bitmonero folder was created on this install so mounting the SSD worked as expected>

sudo reboot now

cd .bitmonero
rm -rf lost+found
<no idea why lost+found is created after formatting the USB SSD as EXT4 but I assume this is normal???>
mkdir lmdb
sudo chown pinodexmr -R ./lmdb && sudo chmod 755 -R ./lmdb

<confirm the node is shut down and copy data.mdb to lmdb via sftp>

<start in private mode with rpc password>

After 7.5 hours, I was able to download 40 blocks. It is technically working, but I went from being 5.7 hours behind to 8.2 hours behind over a 7.5 hour period.

One odd thing I noticed in the logs is that it is re-running its checkpoints, outputting "CHECKPOINT PASSED FOR HEIGHT 1" through "CHECKPOINT PASSED FOR HEIGHT 2478000", a total of 16 times over the 7.5 hours. That's not normal is it? I can't imagine that it is, or what would be causing it.

This time I also noticed that intermittently the sync status on the web UI just listed a date and did not refresh (this happened for several hours at least) and that the network peers (full) list is currently displaying "Error: Couldn't retrieve peer list-- rpc_request:" and then what appears to be a help command output.

I don't really know what I'm looking for in these logs but the re-checkpointing seems sus to me. Perhaps you will be able to see something that I am not.
bitmonero_newhardware_startup.log

edit: it's not able to display the transaction pool stats either, giving the same error as the full peer list. And as before, the SSD light blinks continuously, it never stops accessing the SSD or the SD card. That also strikes me as sus and was happening on the previous hardware/install as well. Perhaps it's just constantly re-running the checkpoint routine, or the service is restarting or something, and doesn't have time in-between all that to actually sync?

nkinnan commented 2 years ago

Some additional bits of info:

I'm grasping at straws at this point but going to reboot the pi (using the safe shutdown option in the UX) on the chance this might improve things?

nkinnan commented 2 years ago

Potential bug: I have a 4GB pi 4, and I believe your installer left the 2GB swap file enabled. After disabling it in the settings UX, I stopped seeing continuous SD card access, though the SSD light is still constantly on. This should help with SD longevity.

nkinnan commented 2 years ago

It's not a disk / enclosure driver issue:

pinodexmr@PiNodeXMR:~ $ dd if=/dev/zero of=./speedTestFile bs=20M count=5 oflag=direct 5+0 records in 5+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 3.47191 s, 30.2 MB/s pinodexmr@PiNodeXMR:~ $ dd if=./speedTestFile of=/dev/zero bs=20M count=5 oflag=dsync 5+0 records in 5+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 2.25856 s, 46.4 MB/s

pinodexmr@PiNodeXMR:~ $ dd if=/dev/zero of=./.bitmonero/lmdb/speedTestFile bs=20M count=5 oflag=direct 5+0 records in 5+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 0.514204 s, 204 MB/s pinodexmr@PiNodeXMR:~ $ dd if=./.bitmonero/lmdb/speedTestFile of=/dev/zero bs=20M count=5 oflag=dsync 5+0 records in 5+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 0.460517 s, 228 MB/s

nkinnan commented 2 years ago

I filed this with the main monero project, perhaps they can help but I don't even know if the constant re-checkpointing is the root of the problem yet. https://github.com/monero-project/monero/issues/8142

shermand100 commented 2 years ago

I'm thinking connection. Is there some kind of advanced or custom firewall you have on your network?

Log entries of note from the newhardware_log...

2022-01-15 11:05:22.140 90ef8440 INFO net.p2p src/p2p/net_node.inl:791 dns_threads[0] addr_str: seeds.moneroseeds.se number of results: 0 2022-01-15 11:05:22.290 90df7440 INFO net.p2p src/p2p/net_node.inl:791 dns_threads[1] addr_str: seeds.moneroseeds.ae.org number of results: 0 2022-01-15 11:05:22.342 90cf6440 INFO net.p2p src/p2p/net_node.inl:791 dns_threads[2] addr_str: seeds.moneroseeds.ch number of results: 0 2022-01-15 11:05:22.613 90bf5440 INFO net.p2p src/p2p/net_node.inl:791 dns_threads[3] addr_str: seeds.moneroseeds.li number of results: 0 2022-01-15 11:05:22.613 [P2P0] INFO net.p2p src/p2p/net_node.inl:829 DNS seed node lookup either timed out or failed, falling back to defaults 2022-01-15 11:05:22.667 [P2P0] INFO net.p2p src/p2p/net_node.inl:2699 [192.110.160.146:18080 987970a6-07e1-48de-a948-15af88442eca OUT] NEW CONNECTION 2022-01-15 11:05:27.393 [P2P0] INFO net.p2p.traffic contrib/epee/include/storages/levin_abstract_invoke2.h:44 [192.110.160.146:18080 OUT] 243 bytes sent for category command-1001 initiated by us 2022-01-15 11:05:27.396 [P2P1] WARNING net.p2p.tx src/cryptonote_protocol/levin_notify.cpp:465 Unable to send transaction(s), no available connections

2022-01-15 11:05:27.563 [P2P0] INFO net.p2p src/p2p/net_node.inl:1676 No available peer in white list filtered by 4

2022-01-15 11:05:35.617 [P2P5] INFO net.cn src/cryptonote_protocol/cryptonote_protocol_handler.inl:2907 Target height decreasing from 2537711 to 0 2022-01-15 11:05:35.617 [P2P5] WARNING global src/cryptonote_protocol/cryptonote_protocol_handler.inl:2911 monerod is now disconnected from the network 2022-01-15 11:05:35.617 [P2P5] INFO net.cn src/cryptonote_protocol/cryptonote_protocol_handler.inl:2917 [98.255.106.127:18080 OUT] [0] state: closed in state synchronizing

And each of the...

2022-01-15 13:26:50.916 [P2P2] INFO checkpoints src/checkpoints/checkpoints.cpp:260 Blockchain checkpoints file not found

All 16 entries are from P2P2 (peer2peer connection 2, the same person).

So here's what we'll do. Delete the p2pstate.bin file, so removing all "bad" peers monerod knows of. And we'll specify our own good one as connection to the hard coded seed-peers according to your log isn't working.

Depending on how you start the node (which button you click on the Web-UI) you can add in a seed peer manually with --seed-node=pinodexmr.hopto.org:18080 So for example if you use the Private clearnet mode add the --seed-node option into line 26 of /home/pinodexmr/monerod-start.sh That's my own node that currently is up-to-date and has 12 outgoing and 84 incoming connections (so a healthy peer list) Reference https://monerodocs.org/interacting/monerod-reference/#p2p-network

This is of course based on you not having a customised and restrictive firewall on your network. Monero does usually work well with UPnP

nkinnan commented 2 years ago

My pi is connected via physical cable ethernet and the only thing between it and the cable modem (which is in bridge mode so no comcast fuckery is going on there) is a dd-wrt flashed wireless router which I believe does support upnp, though I have not seen any incoming connections ever made to the pinode. I can try forwarding the port manually if you think it would help. I'm not experiencing any connectivity issues on the rest of my network, and my internet speed tests come back good.

I currently have 4 peers connected, with three in synchronizing state. White Peerlist Size: 437 Grey Peerlist Size: 5000 and I have been running this last install for 12ish hours. All the network settings in the UI have been maxed out.

I'm trying your suggestion to use your own node as a seed, I've edited monerod-start.sh line 26 appending the --seed option like so:

./monerod --rpc-bind-ip=$DEVICE_IP --rpc-bind-port=$MONERO_PORT --confirm-external-bind --rpc-login=$RPCu:$RPCp --rpc-ssl disabled --in-peers=$IN_PEERS --out-peers=$OUT_PEERS --limit-rate-up=$LIMIT_RATE_UP --limit-rate-down=$LIMIT_RATE_DOWN --max-log-file-size=10485760 --log-level=1 --max-log-files=1 --pidfile /home/pinodexmr/monero/build/release/bin/monerod.pid --enable-dns-blocklist --detach --seed-node=pinodexmr.hopto.org:18080

Then stopped the service, deleted p2pstate.bin, and restarted the node in the UX as a clearnet private node. It usually takes 20 or 30 minutes to finish checkpointing and get going, will keep an eye on it and let you know.

nkinnan commented 2 years ago

1 hour, Busy Syncing: false, 4 connections two of them in state synchronizing, no blocks added bitmonero_seedoption.log

shermand100 commented 2 years ago

I think you're good. You've got good peers, (you can remove the --seed-node option if you want as p2pstate will be populated). Can you just leave it overnight? It's recalculating diffs for those last blocks on the chain, but will take some time. Then it can resume adding the new ones.

2022-01-15 22:41:12.060 [P2P1] INFO global src/cryptonote_core/blockchain.cpp:998 Recalculating difficulties from height 2478000 to height 2537659

nkinnan commented 2 years ago

I did another reset right after my previous comment, but it's been another 12 hours since then and I only advanced 20 blocks, and that happened close to 8 or 10 hours ago, before I went to bed, with nothing since.

I found that upnp was disabled on my router, and after enabling that I am now getting incoming connections as well as outgoing. Here are some current stats:

  Monero Version: 0.17.3.0-release
  Node Status: OK
  Busy Syncing: false
  Current Sync Height: 2537680
  P2P Outgoing Connections: 4
  P2P Incoming Connections: 4
  RPC Connections Count: 1
  Network Type: mainnet
  TX Pool Size: 0
  White Peerlist Size: 1000
  Grey Peerlist Size: 4995
  Update Available: false

The connections cycle through quite a bit, but there's always at least one or two in state "synchronizing" (that show the actual current block height) as well as a few "normal" that are behind and presumably I am feeding, the rest being "before_handshake". It's built up quite a whitelist by this point.

First 4 hours of logs after restart: bitmonero_startup.log Last 5 hours of logs: bitmonero_latest.log

nkinnan commented 2 years ago

20 hours total uninterrupted, another 20 blocks (40 total now)

Monero Version: 0.17.3.0-release
  Node Status: OK
  Busy Syncing: false
  Current Sync Height: 2537680
  P2P Outgoing Connections: 2
  P2P Incoming Connections: 7
  RPC Connections Count: 1
  Network Type: mainnet
  TX Pool Size: 0
  White Peerlist Size: 1000
  Grey Peerlist Size: 4999
  Update Available: false

4 of those nodes are in state synchronizing, two in, two out, plus more in "normal" and "before_handshake" bitmonero_20hrs_tail.log

nkinnan commented 2 years ago

After days of banging my head against the wall trying to solve this, out of desperation I tried the steps here https://forums.raspberrypi.com/viewtopic.php?f=28&t=245931 to downgrade from UAS to mass-storage protocol for the external SSD enclosure.

Although I had no dmesg lines indicating problems with UAS, I am now syncing > 10 blocks per minute, the SSD enclosure light is no longer constantly lit up, I have 3x the number of peers (this must have been seriously hurting my perf in general, not just disk speed), and the monero log file is filling up perhaps 50x faster than before, and checkpointing takes < 1 second instead of 20.

Hopefully this helps someone else.

shermand100 commented 2 years ago

That is infuriating. Perhaps more so because that the exact link you have already mentioned in our hardware Wiki but I'd say your symptoms were considerably different to the usual unexpected unmounting of the drive. It's a shame this error is somewhat silent and clearly displays itself differently between OS's. Glad you've got it working now and I've learnt too of this way it presents itself.

nkinnan commented 2 years ago

Believe it or not, that's not the worst of it! I purchased this enclosure and SSD specifically since they are mentioned as compatible by the RaspiBlitz (a bitcoin pi node) team, and in fact their site has direct amazon referral links for it at https://github.com/rootzoll/raspiblitz

Plus, here is my lsusb output:

Bus 002 Device 002: ID 174c:1153 ASMedia Technology Inc. ASM1153 SATA 3Gb/s bridge

The ASM1153 is mentioned as "a really common one that works well with the Pi" by the good/bad adapter list https://jamesachambers.com/raspberry-pi-4-usb-boot-config-guide-for-ssd-flash-drives/ which is referenced by PiNodeXMR's compatability section! https://github.com/monero-ecosystem/PiNode-XMR/wiki/Hardware#considerations

So I don't know how or why this is happening, maybe they spun a new revision of the silicon or I got a "fake" rebranded one or something? I only tried downgrading the protocol because nothing else worked. It may be worthwhile to add a note about this in the documentation, I did try two completely different enclosures (purchased at the same time) and pi boards, so it's not a one-off in that sense.

In any case I appreciate your help in trying to debug the problem and will throw a tip your way for maintaining this project for the community. Thanks!