hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0
299 stars 137 forks source link

MixedOps-Global-ND-3NReconnect-6k-30m : Reconnecting Nodes never go `BEHIND` #1717

Closed anighanta closed 3 years ago

anighanta commented 3 years ago

Summary of the defect The reconnecting nodes won't go behind after joining back.

How to reproduce (if possible) Run test MixedOps-Global-ND-3NReconnect-6k-30m

Service logs (if applicable)

2021-06-30 08:23:15.665 44       INFO  STATE_TO_DISK    <<platform-core: objectFileManager 12 #0>> SignedStateFileManager: 'Signed state for round 3657' waiting for snapshotTask
2021-06-30 08:23:15.666 45       INFO  STATE_TO_DISK    <<platform-core: objectFileManager 12 #0>> SignedStateFileManager: Finished writing 'Signed state for round 3657' to disk
2021-06-30 08:40:51.210 1        DEBUG STARTUP          <main> Browser: main() started {} [com.swirlds.logging.payloads.NodeStartPayload]
2021-06-30 08:40:51.269 2        DEBUG STARTUP          <main> Browser: About to run startPlatforms()
2021-06-30 08:40:51.281 3        DEBUG STARTUP          <main> Browser: About do crypto instantiation
2021-06-30 08:40:51.289 4        DEBUG STARTUP          <main> Browser: About start loading keys
2021-06-30 08:40:51.597 5        DEBUG STARTUP          <main> Browser: Done loading keys
2021-06-30 08:40:51.598 6        DEBUG STARTUP          <main> Browser: About to start creating Crypto objects
2021-06-30 08:40:53.867 7        DEBUG STARTUP          <main> Browser: Done creating Crypto objects
2021-06-30 08:40:53.867 8        DEBUG STARTUP          <main> Browser: Done with crypto instantiation
2021-06-30 08:40:54.042 9        DEBUG STARTUP          <main> Browser: Starting platforms
2021-06-30 08:40:54.049 10       DEBUG STARTUP          <main> Browser: Scanning the classpath for RuntimeConstructable classes
2021-06-30 08:40:55.276 11       DEBUG STARTUP          <main> Browser: Done with registerConstructables, time taken 1225ms
2021-06-30 08:41:05.586 12       DEBUG RECONNECT        <main> FCMap: FCMap Initialized [ internalMapSize = 610734, treeSize = 610734 ]
2021-06-30 08:41:05.602 13       DEBUG RECONNECT        <main> FCMap: FCMap Initialized [ internalMapSize = 1715, treeSize = 1715 ]
2021-06-30 08:41:07.651 14       DEBUG RECONNECT        <main> FCMap: FCMap Initialized [ internalMapSize = 614703, treeSize = 614703 ]
2021-06-30 08:41:07.664 15       DEBUG RECONNECT        <main> FCMap: FCMap Initialized [ internalMapSize = 2330, treeSize = 2330 ]
2021-06-30 08:41:10.171 16       DEBUG RECONNECT        <main> FCMap: FCMap Initialized [ internalMapSize = 955557, treeSize = 955557 ]
2021-06-30 08:41:10.174 17       DEBUG RECONNECT        <main> FCMap: FCMap Initialized [ internalMapSize = 0, treeSize = 0 ]
2021-06-30 08:41:18.174 18       DEBUG SNAPSHOT_MANAGER <main> SnapshotManager: SnapshotManager: Successfully queued snapshot request [taskType='RESTORE', applicationName='com.hedera.services.ServicesMain', worldId='123', nodeId=12, roundNumber=3657, snapshotId=restore, timeStarted=2021-06-30T08:41:18.164680Z, timeCompleted=null, complete=false, error=false ]
2021-06-30 08:41:18.356 19       DEBUG SNAPSHOT_MANAGER << SnapshotManager >> SnapshotManager: SnapshotManager: Process successfully executed for snapshot task [taskType='RESTORE', applicationName='com.hedera.services.ServicesMain', worldId='123', nodeId=12, roundNumber=3657, snapshotId=restore, timeStarted=2021-06-30T08:41:18.164680Z, timeCompleted=null, complete=false, error=false ] 
2021-06-30 08:41:18.357 20       INFO  SNAPSHOT_MANAGER << SnapshotManager >> SnapshotManager: SnapshotManager: Completed task [taskType='RESTORE', applicationName='com.hedera.services.ServicesMain', worldId='123', nodeId=12, roundNumber=3657, snapshotId=restore, timeStarted=2021-06-30T08:41:18.164680Z, timeCompleted=2021-06-30T08:41:18.357292Z, complete=true, error=false ]
2021-06-30 08:41:18.357 21       INFO  STATE_TO_DISK    <main> SignedStateFileManager: Deleted: '/home/services_nightly_regression/remoteExperiment/data/saved/com.hedera.services.ServicesMain/12/123/3656/settingsUsed.txt'
2021-06-30 08:41:18.411 22       INFO  STATE_TO_DISK    <main> SignedStateFileManager: Deleted: '/home/services_nightly_regression/remoteExperiment/data/saved/com.hedera.services.ServicesMain/12/123/3656/SignedState.swh'
2021-06-30 08:41:18.412 23       INFO  STATE_TO_DISK    <main> SignedStateFileManager: Deleted: '/home/services_nightly_regression/remoteExperiment/data/saved/com.hedera.services.ServicesMain/12/123/3656/PostgresBackup.tar.gz'
2021-06-30 08:41:18.412 24       INFO  STATE_TO_DISK    <main> SignedStateFileManager: Deleted: '/home/services_nightly_regression/remoteExperiment/data/saved/com.hedera.services.ServicesMain/12/123/3656'
2021-06-30 08:41:18.414 25       DEBUG STARTUP          <main> BinaryObjectStore: Recovery Starting
2021-06-30 08:41:18.414 26       DEBUG STARTUP          <main> BinaryObjectStore: Recovering Objects [ objectCount = 1715, uniqueObjectCount = 25 ]
2021-06-30 08:41:18.442 27       DEBUG STARTUP          <main> BinaryObjectStore: Recovery Finished
2021-06-30 08:41:18.443 28       INFO  STARTUP          <main> SwirldsPlatform: startUpEventFrozenEndTime: 2021-06-30T08:41:28.443160Z
2021-06-30 08:41:18.444 29       INFO  STARTUP          <main> SwirldsPlatform: initialize eventStreamManager
2021-06-30 08:41:18.454 30       DEBUG STARTUP          <main> SwirldsPlatform: Platform has loaded a saved state {"round":3657,"consensusTimestamp":{"nano":487192000,"epochSecond":1625041382},"willFreezeUntil":{"nano":443160000,"epochSecond":1625042488}} [com.swirlds.logging.payloads.SavedStateLoadedPayload]
2021-06-30 08:41:18.459 31       INFO  STARTUP          <main> EventFlow:  doConsMinGenFamous after startup [(3646,55658), (3647,55671), (3648,55685), (3649,55699), (3650,55715), (3651,55730), (3652,55747), (3653,55762), (3654,55778), (3655,55792), (3656,55808), (3657,55845)]
2021-06-30 08:41:18.460 32       INFO  STARTUP          <main> EventFlow:  initialHash after startup ecaeff83bbcf77218f1dfbcd01dc825baadc08f17e7661196f34037f95631f32381558c899221005a5343cd8daca6e7c
2021-06-30 08:41:18.484 33       DEBUG STARTUP          <main> ConsensusImpl: ConsensusImpl is initialized from signed state. minRound: 3646(min gen = 55658), maxRound: 3657(max gen = 55845)
2021-06-30 08:41:18.487 34       INFO  STARTUP          <main> SwirldsPlatform: Last known sequence numbers after restart are [40362, 39739, 40091, 40604, 40531, 38722, 39942, 31840, 40523, 40563, 40288, 40300, 38717, 40447, 40290]
2021-06-30 08:41:18.494 35       INFO  STARTUP          <main> EventFlow: EventFlow.startAll(), forCons.size: 0, forCurr.size: 0, forNext.size: null, forSigs.size: 1664, forWork.size: null
2021-06-30 08:41:20.513 36       DEBUG STARTUP          <main> Browser: jvmPauseDetectorThread started
2021-06-30 08:41:20.513 37       DEBUG STARTUP          <main> Browser: Done with starting platforms
2021-06-30 08:41:20.531 38       DEBUG STARTUP          <main> Browser: main() finished
2021-06-30 08:41:20.543 39       INFO  STARTUP          <<platform-core: CsvWriter 12 #0>> CsvWriter: CsvWriter: Initializing statistics output in CSV format [ writePeriod = '3000', csvOutputFolder = '/home/services_nightly_regression/remoteExperiment', csvFileName = 'ServicesStats12.csv' ]
2021-06-30 08:41:20.544 40       DEBUG STARTUP          <<platform-core: CsvWriter 12 #0>> CsvWriter: CsvWriter: Using the existing stats folder [ folder = '/home/services_nightly_regression/remoteExperiment' ]
2021-06-30 08:41:20.630 41       INFO  PLATFORM_STATUS  <sync_server_2> SwirldsPlatform: Platform status changed. {"oldStatus":"","newStatus":"ACTIVE"} [com.swirlds.logging.payloads.PlatformStatusPayload]
2021-06-30 08:41:20.630 42       INFO  PLATFORM_STATUS  <sync_server_2> SwirldsPlatform: Platform status changed to: ACTIVE
2021-06-30 08:41:28.067 43       INFO  STATE_TO_DISK    <<platform-core: objectFileManager 12 #0>> SignedStateFileManager: Started writing 'Signed state for round 3728' to disk
2021-06-30 08:41:28.069 44       DEBUG SNAPSHOT_MANAGER <<platform-core: objectFileManager 12 #0>> SnapshotManager: SnapshotManager: Successfully queued snapshot request [taskType='BACKUP', applicationName='com.hedera.services.ServicesMain', worldId='123', nodeId=12, roundNumber=3728, snapshotId=00000003-000001FE-1, timeStarted=2021-06-30T08:41:28.068281Z, timeCompleted=null, complete=false, error=false ]
2021-06-30 08:41:28.259 45       DEBUG SNAPSHOT_MANAGER << SnapshotManager >> SnapshotManager: SnapshotManager: Process successfully executed for snapshot task [taskType='BACKUP', applicationName='com.hedera.services.ServicesMain', worldId='123', nodeId=12, roundNumber=3728, snapshotId=00000003-000001FE-1, timeStarted=2021-06-30T08:41:28.068281Z, timeCompleted=null, complete=false, error=false ] 
2021-06-30 08:41:28.260 46       INFO  SNAPSHOT_MANAGER << SnapshotManager >> SnapshotManager: SnapshotManager: Completed task [taskType='BACKUP', applicationName='com.hedera.services.ServicesMain', worldId='123', nodeId=12, roundNumber=3728, snapshotId=00000003-000001FE-1, timeStarted=2021-06-30T08:41:28.068281Z, timeCompleted=2021-06-30T08:41:28.259741Z, complete=true, error=false ]
2021-06-30 08:41:43.165 47       INFO  STATE_TO_DISK    <<platform-core: objectFileManager 12 #0>> SignedStateFileManager: Done writing saved state with HashEventsCons 7b1b3313e6561469d2d2270543a29ab750b23131475728cc78d7c2a8754d28aab965ef42dcfb5f88c7e36a4773eb1278, starting with local events
2021-06-30 08:41:43.167 48       INFO  STATE_TO_DISK    <<platform-core: objectFileManager 12 #0>> SignedStateFileManager: 'Signed state for round 3728' waiting for snapshotTask
2021-06-30 08:41:43.168 49       INFO  STATE_TO_DISK    <<platform-core: objectFileManager 12 #0>> SignedStateFileManager: Finished writing 'Signed state for round 3728' to disk
2021-06-30 08:41:58.380 50       INFO  STATE_TO_DISK    <<platform-core: objectFileManager 12 #0>> SignedStateFileManager: Started writing 'Signed state for round 3823' to disk

Possible Resolution decreasing "state.roundsExpired" from 500 might help

Neeharika-Sompalli commented 3 years ago

This issue is fixed and the passing result https://hedera-hashgraph.slack.com/archives/CKWHL8R9A/p1625475393462700