near / stakewars-iii

Stake Wars: Episode 3 challenges and place to report issues

Syncing never finishes #74

Open IaroslavTitov opened 2 years ago

IaroslavTitov commented 2 years ago

The issue

The neard service runs and syncs successfully, but never finishes syncing (see the attached logs; it gets stuck at 99%+). My contract is up and pinging successfully. It has enough delegated NEAR to be picked as a validator, but I'm getting 0 uptime because the node won't sync. When I query curl -s http://127.0.0.1:3030/status, the status is still syncing.
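To tell "stuck" apart from "slow", it can help to look at the raw fields of the status response rather than the overall state. Below is a minimal sketch, assuming neard's /status JSON contains a `sync_info` object with `syncing` (boolean) and `latest_block_height` (integer) fields, and that jq may not be installed. The helper name `sync_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize neard's /status JSON without needing jq.
# Assumption: the response has "sync_info" with "syncing" (bool) and
# "latest_block_height" (integer) fields.

# Read a /status JSON document on stdin; print "syncing=<bool> height=<n>".
sync_summary() {
  local json syncing height
  json=$(cat)
  # grep -o pulls out just the key/value pair; sed keeps the value after ':'.
  syncing=$(printf '%s' "$json" | grep -o '"syncing": *[a-z]*' | head -n 1 | sed 's/.*: *//')
  height=$(printf '%s' "$json" | grep -o '"latest_block_height": *[0-9]*' | head -n 1 | sed 's/.*: *//')
  echo "syncing=${syncing:-unknown} height=${height:-unknown}"
}
```

Running curl -s http://127.0.0.1:3030/status | sync_summary a few minutes apart shows whether the block height is still advancing while syncing stays true, or whether it has stopped moving altogether.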

Staking contract - tias.factory.shardnet.near


Troubleshooting

I've tried the following list of suggestions from @NodeRunner on Discord:

1. Hardware and Internet meet min specs?
   sudo apt install speedtest-cli && speedtest-cli
   https://www.vpsbenchmarks.com/
2. Firewall port 24567 open in OS and/or host?
3. Compiled nearcore with the recommended commit?
4. NEAR_ENV set to shardnet?
   export NEAR_ENV=shardnet
5. Wallet key and validator key match?
   near view xxx.factory.shardnet.near get_staking_key '{}' && cat ~/.near/validator_key.json | grep public_key
6. Pinging at least once per epoch and shows up in explorer? https://explorer.shardnet.near.org/accounts/xxx.factory.shardnet.near
7. Total pool stake amount is above minimum (50) & seat price?
   near validators next | grep "seat price"
   Need more NEAR? https://discord.com/channels/490367152054992913/1002631777560576040/1003022444409405560
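Check 5 relies on comparing two keys by eye. Below is a minimal sketch that does the comparison automatically, assuming both the near-cli output and ~/.near/validator_key.json contain the key as an "ed25519:<base58>" string. The helper names `first_ed25519_key` and `keys_match` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for check 5: compare the pool's staking key with the node's key file.
# Assumption: both sources contain a key of the form "ed25519:<base58>".

# Print the first "ed25519:..." token found on stdin.
first_ed25519_key() {
  grep -o 'ed25519:[A-Za-z0-9]*' | head -n 1
}

# $1: text containing the contract's staking key (e.g. near-cli output)
# $2: path to validator_key.json
keys_match() {
  local contract_key node_key
  contract_key=$(printf '%s\n' "$1" | first_ed25519_key)
  # Only look at the "public_key" entry so secret_key is never picked up.
  node_key=$(grep -o '"public_key": *"ed25519:[A-Za-z0-9]*"' "$2" | first_ed25519_key)
  if [[ -n "$contract_key" && "$contract_key" == "$node_key" ]]; then
    echo "match: $node_key"
  else
    echo "MISMATCH: contract=${contract_key:-none} node=${node_key:-none}"
    return 1
  fi
}
```

Example usage: keys_match "$(near view xxx.factory.shardnet.near get_staking_key '{}')" ~/.near/validator_key.json. The function returns a non-zero status on a mismatch, so it can also be used in an if statement.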

I've also tried stopping and starting the service, rebooting the machine, and deleting the data folder and resyncing; none of it helped.

I am running on EC2 in Amazon Linux 2, the machine meets minimum requirements.

Please let me know if I can provide more information.

DDeAlmeida commented 2 years ago

Can you try to increase the power of the server?

IaroslavTitov commented 2 years ago

What do you mean by power, @DDeAlmeida?

I checked my CPU usage; 53% is idle:

12:04:26 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:04:26 PM  all   43.61    0.00    2.20    0.07    0.00    0.58    0.00    0.00    0.00   53.55
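When mpstat is run without an interval argument, it reports averages since boot, so a single snapshot can understate the load during a sync. Below is a minimal sketch that pulls the %idle value of the "all" row out of mpstat output, so it can be sampled over an interval instead. The helper name `idle_pct` is hypothetical, and it assumes %idle appears in mpstat's header line.

```shell
#!/usr/bin/env bash
# Sketch: extract the %idle value of the "all" row from mpstat output on stdin.
# The column is located via the header and counted from the right-hand end,
# so rows prefixed with "12:04:26 PM" or "Average:" are handled alike.
idle_pct() {
  awk '
    /%idle/ {
      for (i = 1; i <= NF; i++) if ($i == "%idle") col = i
      off = NF - col      # distance of %idle from the last field
      next
    }
    col && / all / { print $(NF - off) }
  ' | tail -n 1
}
```

For example, mpstat 5 12 | idle_pct samples one minute in 5-second steps and prints the idle percentage from the final "Average:" row.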

There's also plenty of memory available:

[ec2-user@ip-172-31-18-80 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7643        4377        1021           0        2244        3091

I still have 236 GB of disk space.

The machine is an EC2 a1.xlarge: 4 CPUs, 8 GB of RAM.

What else can I check or increase? It always gets stuck at 99%, never at any other point in the sync.

DDeAlmeida commented 2 years ago

4 CPUs is not enough, from my point of view.

IaroslavTitov commented 2 years ago

Well, I got the specs recommended in the challenge... Maybe I'll try a better machine later.

However, I doubt that is the issue, since a lot of participants seem to be running on minimum-requirement machines, and some just run on home PCs.

EdwardsVO commented 2 years ago

Try upgrading your node to 8 CPUs and 16 GB of RAM; IMO you are having a performance issue.

IaroslavTitov commented 2 years ago

I retried with an equivalent server, but on Ubuntu: same result. I will try a stronger machine next.


stiavnik commented 2 years ago

I am having the same problem. Once I lost sync while trying something out, I could never get back in sync. Deleting data, neard init, config.json, different software commits... nothing helps. And I am not running on an Intel 386SX.

IaroslavTitov commented 2 years ago

Mine worked after I restarted on a different machine with 8 CPUs and 16 GB of RAM and made sure the CPU had AVX support. I'm still having issues producing chunks, but at least the sync seems to have succeeded.
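On Linux, whether the CPU advertises AVX can be checked from its flag list. Below is a minimal sketch, assuming the flags are exposed in /proc/cpuinfo as they are on x86 Linux. The helper name `has_avx` is hypothetical. It may be relevant here that the a1.xlarge mentioned earlier is an Arm-based (AWS Graviton) instance type, while AVX is an x86 instruction-set extension, so a check like this would not find it on that machine.

```shell
#!/usr/bin/env bash
# Sketch: report whether a cpuinfo-style file advertises AVX
# (defaults to /proc/cpuinfo). Returns non-zero when AVX is absent.
has_avx() {
  local cpuinfo=${1:-/proc/cpuinfo}
  # -w matches "avx" as a whole word, so "avx2" alone does not count.
  if grep -qw avx "$cpuinfo"; then
    echo "AVX supported"
  else
    echo "AVX NOT found"
    return 1
  fi
}
```

Running has_avx with no argument checks the current machine.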