near / stakewars-iii

Stake Wars: Episode 3 challenges and place to report issues

Node lagging behind & CPU usage skyrockets (even on 68bfa84ed1455f891032434d37ccad696e91e4f5) #92

Closed Thesephi closed 2 years ago

Thesephi commented 2 years ago

Hi all,

For the past few days (after upgrading to commit 78ef2f55857d6118047efccf070ae0f7ddb232ea), my node had been running stably, with CPU usage consistently below the optimal load. Yes, I'm aware some of us have had issues with that commit and had to roll back to the previous commit, but this node was running okay, so I kept it as-is.

However, for the past few hours, the node has started lagging behind: I notice it starts downloading headers & blocks all over again. During this whole "lag-behind" period, CPU usage always surges higher than the optimal load.

Finally, I decided to roll back to the previous commit 68bfa84ed1455f891032434d37ccad696e91e4f5.

I also adjusted the ideal_connections_lo and ideal_connections_hi to 20 & 25 respectively.
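For anyone who wants to try the same adjustment, here is a minimal sketch of how it could be scripted. It assumes the two settings live under a `network` section of the node's `config.json`; the function names and the file location used in the note below are illustrative assumptions, not something stated in this issue.

```python
import json
from pathlib import Path


def set_ideal_connections(config: dict, lo: int, hi: int) -> dict:
    """Return a copy of `config` with the ideal peer-connection bounds set.

    Assumption: both keys belong to the "network" section of config.json.
    """
    if lo > hi:
        raise ValueError("ideal_connections_lo must not exceed ideal_connections_hi")
    updated = json.loads(json.dumps(config))  # cheap deep copy of JSON-compatible data
    network = updated.setdefault("network", {})
    network["ideal_connections_lo"] = lo
    network["ideal_connections_hi"] = hi
    return updated


def patch_config_file(path: Path, lo: int, hi: int) -> None:
    """Read a JSON config file, apply the two bounds, and write it back."""
    config = json.loads(path.read_text())
    patched = set_ideal_connections(config, lo, hi)
    path.write_text(json.dumps(patched, indent=2) + "\n")
```

Calling `patch_config_file(Path.home() / ".near" / "config.json", 20, 25)` would apply the values mentioned above, assuming that is where the config file is located. The node would presumably need a restart to pick up the change.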

However, the situation hasn't gotten any better. Just now, the CPU usage skyrocketed to a whopping 10x its optimal load:

[Image: stakewars-iii-node-cpu-surge]

My node was kicked out as a result (obviously). At the moment, the only thing I can do is restart the node whenever CPU utilization surges above the optimal load.
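As a rough illustration of that manual check, the sketch below compares the 1-minute load average with the number of CPU cores. It assumes "optimal load" means roughly one runnable task per core, which is an interpretation rather than something defined in this issue. The function names and the default threshold are hypothetical, and the restart step itself is left out because it depends on how the node is run.

```python
import os


def load_ratio(load_1min: float, cpu_count: int) -> float:
    """Return the load average divided by the core count (1.0 = fully loaded)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_1min / cpu_count


def should_restart(load_1min: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """True when the per-core load exceeds the (assumed) threshold."""
    return load_ratio(load_1min, cpu_count) > threshold


def main() -> None:
    # os.getloadavg() is available on Unix-like systems only.
    load_1min, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    ratio = load_ratio(load_1min, cpus)
    verdict = "above" if should_restart(load_1min, cpus) else "within"
    print(f"load per core: {ratio:.2f} ({verdict} the assumed optimal load)")


if __name__ == "__main__":
    main()
```

A check like this only automates the observation; it does nothing about the underlying cause of the surge.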

Naive observation: whenever the node is downloading headers or blocks, CPU usage surges. So restarting the app occasionally is definitely not a sustainable approach, since the app then never (or only very slowly) finishes syncing. This will surely affect the uptime record of my node, so I'll probably not meet Challenge 9's criteria if this keeps up 🙂

PS: this issue was filed for informational purposes. If anyone is interested in investigating further and needs me to provide anything else (e.g. log records), please let me know.

Thank you for your attention, and all the best! 💪

ivanguardia commented 2 years ago

Mine is using less CPU and memory. Maybe it's slow cores or a poor VPS. While syncing, neard uses more CPU and memory.

Thesephi commented 2 years ago

The thing is, it was not always like that for me. It used to behave normally, then some time after upgrading to 78ef2f55857d6118047efccf070ae0f7ddb232ea it started going south.

Anyhow, over the past few days, it started working well again. I was informed that some performance testing was being done on the network, which could explain why my node behaved inconsistently.

I'll close the issue for now, with the reason "not reliably reproduced". I learned some nice insights from this, though 💪