hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

JRS Test release 0.53: state written to disk without any signatures (infrequent) #15229

edward-swirldslabs closed this issue 1 week ago

edward-swirldslabs commented 2 weeks ago

Description

JRS Test Results: http://35.247.76.217:8095/swirlds-automation/release/0.53/4N_2C/Ubuntu2204_Update/20240825-111009-GCP-Daily-Update-Ubuntu2204-4N-2C/Crypto-Update-Jar-1.5k-25m/

Every node wrote a state to disk for round 4217 at 46.6 seconds, with 0 of 501 stake accounted for. The weight distribution across nodes is as follows:

--- Used Address Book ---
address, 0, 0, node1, 499, 10.128.0.139, 30124, 34.173.157.41, 30124, 0.0.3
address, 1, 1, node2, 1,   10.128.0.140, 30125, 34.41.248.35,  30125, 0.0.4
address, 2, 2, node3, 1,   10.128.0.137, 30126, 34.133.215.32, 30126, 0.0.5
address, 3, 3, node4, 0,   10.128.0.135, 30127, 146.148.74.3,  30127, 0.0.6
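As a quick check of the 501 total, here is a minimal Java sketch that sums the weight column of the address book lines above. It assumes, from this excerpt alone, that the weight is the fifth comma-separated field; `AddressBookWeights`, `weightOf`, and `totalWeight` are hypothetical names, not platform APIs.

```java
import java.util.List;

/** Hypothetical helper: totals the weight column of "Used Address Book" log lines. */
public class AddressBookWeights {

    /** Address book lines copied from the log excerpt above. */
    static final List<String> SAMPLE = List.of(
            "address, 0, 0, node1, 499, 10.128.0.139, 30124, 34.173.157.41, 30124, 0.0.3",
            "address, 1, 1, node2, 1,   10.128.0.140, 30125, 34.41.248.35,  30125, 0.0.4",
            "address, 2, 2, node3, 1,   10.128.0.137, 30126, 34.133.215.32, 30126, 0.0.5",
            "address, 3, 3, node4, 0,   10.128.0.135, 30127, 146.148.74.3,  30127, 0.0.6");

    /** Assumes the weight is the fifth comma-separated field (index 4), padded with spaces. */
    static long weightOf(String line) {
        String[] fields = line.split(",");
        return Long.parseLong(fields[4].trim());
    }

    /** Sums the weight of every address book line. */
    static long totalWeight(List<String> lines) {
        return lines.stream().mapToLong(AddressBookWeights::weightOf).sum();
    }

    public static void main(String[] args) {
        System.out.println("total weight = " + totalWeight(SAMPLE)); // prints "total weight = 501"
    }
}
```

Node 0 alone holds 499 of the 501 units, so whether node 0's signature is present decides almost everything about the state's signing weight.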

Either there were no signatures on the state, or there was a single signature from node id 3, which has a weight of 0 and so contributes nothing.

This event is rare; normally node 0 provides a signature for the state. In this case, none of nodes 0, 1, or 2 had a signature present.
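To see why the two scenarios cannot be told apart from the log, here is a minimal Java sketch that assumes signing weight is simply the sum of the weights of the nodes whose signatures are present. That assumption is consistent with the "0 of 501" message but is not the platform's actual code, and `SigningWeight` and `signingWeight` are hypothetical names.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical model: signing weight = sum of weights of nodes whose signatures are present. */
public class SigningWeight {

    /** Node id -> weight, taken from the address book in this issue. */
    static final Map<Long, Long> WEIGHTS = Map.of(0L, 499L, 1L, 1L, 2L, 1L, 3L, 0L);

    static long signingWeight(Set<Long> signers) {
        long weight = 0; // the accumulator starts at 0
        for (long nodeId : signers) {
            weight += WEIGHTS.getOrDefault(nodeId, 0L);
        }
        return weight;
    }

    public static void main(String[] args) {
        System.out.println(signingWeight(Set.of()));               // 0: no signatures at all
        System.out.println(signingWeight(Set.of(3L)));             // 0: only node 3, which has weight 0
        System.out.println(signingWeight(Set.of(0L)));             // 499: the usual case, node 0 signed
        System.out.println(signingWeight(Set.of(0L, 1L, 2L, 3L))); // 501: every node signed
    }
}
```

Under this model, "0 of 501" only rules out signatures from nodes 0, 1, and 2; it says nothing about node 3.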

Signing weight is initialized to 0 here:

The snapshot was created by PCES replay. Either we did not create the signatures after the upgrade, the signatures were never produced before the upgrade, or they were dropped.

Steps to reproduce

Run the JRS test and pray to RNGesus that it happens with a debugger attached.

Additional context

Debug Address Book Before Update:

Debug Address Book After Update:

Hedera network

other

Version

v0.53.x

Operating system

Linux

edward-swirldslabs commented 2 weeks ago

PCES snapshots are skipped for signature generation.

Perhaps we can also skip validation under the same condition:

        if (reservedState.isComplete() || reservedState.isPcesRound()) {
            // when state is complete or signature generation has been skipped for PCES generated state snapshots, nothing to do
            return;
        }
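The proposed guard can be modeled in isolation. This Java sketch uses a hypothetical `SnapshotInfo` record that carries only the two flags the condition reads (standing in for `reservedState`) and reports whether a missing-signature warning would still be emitted. It illustrates the boolean logic only; the real validation method and its logging are not shown in this issue, and the round numbers are illustrative.

```java
/** Hypothetical model of the proposed early return in signature validation. */
public class SignatureValidationGuard {

    /** Stand-in for the reserved state: only the flags the guard inspects. */
    record SnapshotInfo(long round, boolean isComplete, boolean isPcesRound) {}

    /** Returns false when the state is complete or is a PCES replay snapshot (nothing to do). */
    static boolean shouldReportMissingSignatures(SnapshotInfo state) {
        if (state.isComplete() || state.isPcesRound()) {
            return false; // same condition under which signature generation is skipped
        }
        return true;
    }

    public static void main(String[] args) {
        SnapshotInfo pcesSnapshot = new SnapshotInfo(4217, false, true);
        SnapshotInfo unsignedRegular = new SnapshotInfo(4218, false, false);
        System.out.println(shouldReportMissingSignatures(pcesSnapshot));    // false
        System.out.println(shouldReportMissingSignatures(unsignedRegular)); // true
    }
}
```

With the extra `isPcesRound()` check, a snapshot produced by PCES replay no longer triggers the warning, while an unsigned state created outside replay still does.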
litt3 commented 2 weeks ago


I think this is a fine solution for getting rid of the nuisance log. No problems come to mind, in any case.

It doesn't fix the problem of having an unsigned snapshot on disk (though I'm not sure how much of a problem that even is). But at least we can stop spending time debugging these tests.