hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

Test Crypto-Restart-Stake-2.5k-10m failed: node did not turn ACTIVE within 300 seconds #9111

Closed JeffreyDallas closed 8 months ago

JeffreyDallas commented 11 months ago

Description

https://swirldslabs.slack.com/archives/C03E8SA5UF9/p1696750957782539
https://swirldslabs.slack.com/archives/C03E8SA5UF9/p1696753151193759

Steps to reproduce

Nightly regression run of Crypto-Restart-Stake-2.5k-10m

Additional context

No response

Hedera network

other

Version

latest

Operating system

None

JeffreyDallas commented 11 months ago

Failed test

https://swirldslabs.slack.com/archives/C03E8SA5UF9/p1696839931118799

The newly added node0004 was stuck at BEHIND and could not enter ACTIVE mode.

Previously passed tests

https://swirldslabs.slack.com/archives/C03E8SA5UF9/p1696580841678729

JeffreyDallas commented 11 months ago

Node0004 sent reconnect requests to the other nodes, but they were denied with:

Rejecting reconnect request from node 4 because this node isn't ACTIVE

litt3 commented 11 months ago

This doesn't appear to be a platform failure to me, but rather a test configuration problem.

The reconnect request is rejected because the 4 original nodes hit an ISS while replaying the PCES.
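
To make the failure chain concrete, here is a minimal sketch of the gating described above. It is not the platform's code: the `Status` enum, the `NOT_ACTIVE` placeholder, and both function names are hypothetical. Only the rejection message is taken from the log quoted in this thread.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "ACTIVE"
    BEHIND = "BEHIND"
    NOT_ACTIVE = "NOT_ACTIVE"  # hypothetical placeholder for any other non-ACTIVE state


def handle_reconnect_request(teacher_status, requester_id):
    """Return (accepted, message) for one reconnect request (illustrative only)."""
    if teacher_status is not Status.ACTIVE:
        # Message text copied from the log quoted in this issue.
        return False, (
            f"Rejecting reconnect request from node {requester_id} "
            "because this node isn't ACTIVE"
        )
    return True, f"Accepting reconnect request from node {requester_id}"


def can_reconnect(teacher_statuses, requester_id):
    """A BEHIND node can only catch up if at least one peer accepts its request."""
    return any(
        handle_reconnect_request(status, requester_id)[0]
        for status in teacher_statuses
    )
```

Under these assumptions, if all 4 original nodes are not ACTIVE after the ISS, `can_reconnect([Status.NOT_ACTIVE] * 4, 4)` is false. Node 4 then stays BEHIND until the test's 300 second limit expires.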

JeffreyDallas commented 11 months ago

Thanks for the update, but I don't think it's a test configuration issue. The test passed last Friday and now fails without any changes to the regression test configuration.

Please assign someone from platform team to investigate the new ISS failure.

JeffreyDallas commented 11 months ago

Freeze issue resolved.

But the non-restart node still reported an invalid state signature (ISS) issue.

http://35.247.76.217:8095/swirlds-automation/develop/5N/UnevenStake/20231014-074034-GCP-RestartWithNewNodes-UnevenStake-5N/Crypto-Restart-Stake-2.5k-10m/

Node0004 log:

2023-10-14 07:54:15.739 486      FATAL EXCEPTION        <<platform-core: thread-cons 2>> ConsensusHashManager: Invalid State Signature (ISS): this node has the wrong hash for round 326.
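
For comparing runs across commits, a small script along these lines can pull the ISS round out of each node's log. This is a sketch that assumes the line format shown above. The regular expression and the `find_iss_rounds` helper are my own and are not part of the regression tooling.

```python
import re

# Matches the timestamp at the start of the line and the round number in the ISS message.
ISS_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+.*"
    r"Invalid State Signature \(ISS\): this node has the wrong hash "
    r"for round (?P<round>\d+)"
)

# Sample line copied from the Node0004 log above.
SAMPLE_LINE = (
    "2023-10-14 07:54:15.739 486      FATAL EXCEPTION        "
    "<<platform-core: thread-cons 2>> ConsensusHashManager: "
    "Invalid State Signature (ISS): this node has the wrong hash for round 326."
)


def find_iss_rounds(lines):
    """Return a list of (timestamp, round) tuples for every ISS line found."""
    hits = []
    for line in lines:
        match = ISS_PATTERN.search(line)
        if match:
            hits.append((match.group("timestamp"), int(match.group("round"))))
    return hits


if __name__ == "__main__":
    print(find_iss_rounds([SAMPLE_LINE]))
```

Running it over the logs of a failing and a passing commit would show whether the ISS always lands on the same round.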

Commit #1b3273d2f7 failed

https://swirldslabs.slack.com/archives/C03G7CBJJ06/p1697349568568919

http://35.247.76.217:8095/JeffreyDallas/09111-D-Crypto-Restart-Stake-2.5k-10m-4/5N/SimilarStake/20231015-053537-GCP-RestartWithNewNodes-SimilarStake-5N/Crypto-Restart-Stake-2.5k-10m/

Commit #cab58850c0 passed

https://swirldslabs.slack.com/archives/C03G7CBJJ06/p1697350492472209

http://35.247.76.217:8095/JeffreyDallas/09111-D-Crypto-Restart-Stake-2.5k-10m-5/5N/SimilarStake/20231015-050441-GCP-RestartWithNewNodes-SimilarStake-5N/Crypto-Restart-Stake-2.5k-10m/

litt3 commented 8 months ago

Stale