Closed fish2plain closed 1 year ago
On a different shard0 node, I got a similar error, but the stack trace points to a different line.
I won't be trying the test binary on this node, though. I ran the test binary on another node, and it fell behind by ~10K blocks after a restart.
Started WS server at: 127.0.0.1:9800
Started Auth-WS server at: 127.0.0.1:9801
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd1ad02]
goroutine 3724389 [running]:
github.com/harmony-one/harmony/core/types.(*Block).NumberU64(0x0, 0xc081bba000)
/home/runner/work/harmony/harmony/harmony/core/types/block.go:482 +0x22
github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared.func1(0xc000fae000, 0x0)
/home/runner/work/harmony/harmony/harmony/consensus/validator.go:273 +0x54
created by github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared
/home/runner/work/harmony/harmony/harmony/consensus/validator.go:270 +0x591
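For anyone reading the trace: the first argument shown for `(*Block).NumberU64` is `0x0`, meaning the method was called on a nil `*Block` from the goroutine started in `onPrepared`. Below is a minimal sketch of that failure mode and a nil guard. `Header`, `Block`, and `safeNumber` here are hypothetical stand-ins written for illustration, not the actual Harmony types or the project's fix.

```go
package main

import (
	"errors"
	"fmt"
)

// Header and Block are hypothetical stand-ins, NOT the real
// github.com/harmony-one/harmony/core/types definitions.
type Header struct{ Number uint64 }

type Block struct{ header *Header }

// NumberU64 reads a field through the receiver, so calling it on a
// nil *Block panics with a nil pointer dereference.
func (b *Block) NumberU64() uint64 { return b.header.Number }

var errNilBlock = errors.New("block is nil")

// safeNumber (hypothetical helper) checks the pointer before use and
// returns an error instead of letting the caller panic.
func safeNumber(b *Block) (uint64, error) {
	if b == nil || b.header == nil {
		return 0, errNilBlock
	}
	return b.NumberU64(), nil
}

func main() {
	var missing *Block // e.g. a lookup that returned no block
	if _, err := safeNumber(missing); err != nil {
		fmt.Println("skipping:", err)
	}

	n, err := safeNumber(&Block{header: &Header{Number: 42}})
	fmt.Println(n, err)
}
```

Whether the right response to a nil block at that point is to skip, retry, or trigger a sync is something only the maintainers can say; the sketch only shows why the process dies instead of logging an error.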
Same as @fish2plain. I ran the test binary for a few hours, and it slowed to a crawl with OUT OF SYNC messages and fell significantly behind.
Reverting to version 4.3.0 seems to have fixed the issue for now.
@gsampathkumar could you confirm which testnet binary version you tried? The latest binary now includes another commit that helps with sync speed. And just to confirm: were you still experiencing the panic while using the testnet binary?
@sophoah We did not encounter the panic issue using the testnet binary. Only the slow sync.
I will use the latest testnet binary on one of our nodes and test if the slow sync issue gets solved. Will keep this thread posted.
Running one node with:
root@HarmonySecondary:/mnt/volume_sfo3_03# ./harmony -V
Harmony (C) 2020. harmony, version v7214-v4.3.1-3-g4c9546a4 (jenkins@ 2021-12-19T13:56:15+0000)
It's currently caught up, though, so I'm not sure it will exercise the sync path to test whether the slow sync issue has been fixed. Let me know if I should let it fall behind for 1-2 hours and then have it try to catch up.
@gsampathkumar no need to force it out of sync. I've installed the same code on most of our internal nodes today. Eventually, in January, this may become a new release.
Same issue here:
Started RPC server at: 0.0.0.0:62075
Started Auth-RPC server at: 0.0.0.0:9501
Started WS server at: 127.0.0.1:9800
Started Auth-WS server at: 127.0.0.1:9801
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd1ad02]
goroutine 1901533 [running]:
github.com/harmony-one/harmony/core/types.(*Block).NumberU64(0x0, 0xc01934b750)
/home/runner/work/harmony/harmony/harmony/core/types/block.go:482 +0x22
github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared.func1(0xc000dd2000, 0x0)
/home/runner/work/harmony/harmony/harmony/consensus/validator.go:273 +0x54
created by github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared
/home/runner/work/harmony/harmony/harmony/consensus/validator.go:270 +0x591
./harmony --version
Harmony (C) 2020. harmony, version v7211-v4.3.1-0-g65614950 (runner@ 2021-11-27T05:27:53+0000)
@sophoah Hit the issue again. Downgraded to 4.3.0 for now.
@rlan35 any idea? It seems this still happens on some nodes, and on validator nodes, not only explorer nodes.
Hi
This issue happens at epoch change. All my nodes went down yesterday at epoch change. It happened while I was sleeping, and I woke up to a whole bunch of monitoring alerts. I have been unelected due to this bug.
It happened again today to all nodes, again at the epoch changeover. I'm now making sure the service restarts automatically so this doesn't get me unelected again.
Thanks
I have had a node running on Ubuntu 20.04 with default configs for 3 weeks. The hardware exceeds the requirements several times over, and nothing else runs on the server. Still, the same issue hits my node roughly once a day:
Started RPC server at: 127.0.0.1:9500
Started Auth-RPC server at: 127.0.0.1:9501
Started WS server at: 127.0.0.1:9800
Started Auth-WS server at: 127.0.0.1:9801
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd1ad02]
goroutine 4070621 [running]:
github.com/harmony-one/harmony/core/types.(*Block).NumberU64(0x0, 0xc00e804f60)
/home/runner/work/harmony/harmony/harmony/core/types/block.go:482 +0x22
github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared.func1(0xc00013e500, 0x0)
/home/runner/work/harmony/harmony/harmony/consensus/validator.go:273 +0x54
created by github.com/harmony-one/harmony/consensus.(*Consensus).onPrepared
/home/runner/work/harmony/harmony/harmony/consensus/validator.go:270 +0x591
Downgraded to 4.3.0 for now.
Hi
Just an update.
The node kept giving the same error at epoch changeover.
So I switched all my nodes to the testnet version. Since then, no more crashes.
Thanks.
I faced the same problem again on version v8126-v2023.2.7-0-g1b9614ba.
Describe the bug
A Harmony node on shard0 crashed a few hours after being upgraded to v4.3.1.
To Reproduce
Not reproducible so far.
Expected behavior
The node remains stable.
Screenshots stack trace:
Environment (please complete the following information):
Additional context