hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

Rehash for `172129258` round has failed #13531

Closed: imalygin closed this issue 3 months ago

imalygin commented 3 months ago

Description

Rehash validation failed for round 172129258.

Here is an example: https://github.com/hashgraph/hedera-state-validator/actions/runs/9236431995/job/25412277720

We need to figure out exactly why it failed.

Steps to reproduce

  1. Download round 172129258 for any node.
  2. Download `validator-0.49.jar`.
  3. Run `java -jar validator-0.49.jar /path/to/round/172129258 rehash`.

Additional context

Validation of the next backup passed successfully:

https://github.com/hashgraph/hedera-state-validator/actions/runs/9236565102/job/25412557889

Hedera network

mainnet

Version

v0.49

Operating system

None

imalygin commented 3 months ago

Another round, 172327696, failed in the same way.

Here is an example: https://github.com/hashgraph/hedera-state-validator/actions/runs/9258363503/job/25468331276

imalygin commented 3 months ago

A second validation run for the same round also failed: https://github.com/hashgraph/hedera-state-validator/actions/runs/9271340422/job/25506576839

So the failure is not an artifact of an incomplete download.

imalygin commented 3 months ago

Here is the hash info diff produced by validating round 172327696 (see https://github.com/hashgraph/hedera-state-validator/actions/runs/9276595809/job/25524033185):

{deltas=[
  [ChangeDelta, position: 0, lines:
    [(root) State /rubber-connect-slush-blood,0 MerkleHederaState /0 reduce-volcano-awkward-remain] to
    [(root) State /tennis-exhibit-much-flee,0 MerkleHederaState /0 alley-wheel-paddle-rice]],
  [ChangeDelta, position: 71, lines:
    [23 VirtualMap ScheduleService.SCHEDULES_BY_EQUALITY/0/23consider-staff-shoot-laundry] to
    [23 VirtualMap ScheduleService.SCHEDULES_BY_EQUALITY/0/23attack-quarter-slam-canal]],
  [ChangeDelta, position: 73, lines:
    [ 1 VirtualRootNode /0/23/1brown-horn-nuclear-give, 24 VirtualMap ScheduleService.SCHEDULES_BY_EXPIRY_SEC/0/24save-winter-blossom-account] to
    [ 1 VirtualRootNode /0/23/1service-spy-combine-potato, 24 VirtualMap ScheduleService.SCHEDULES_BY_EXPIRY_SEC/0/24faint-proud-chalk-twist]],
  [ChangeDelta, position: 76, lines:
    [ 1 VirtualRootNode /0/24/1swift-phone-nose-tell, 25 VirtualMap ScheduleService.SCHEDULES_BY_ID/0/25noble-raw-theory-mail] to
    [ 1 VirtualRootNode /0/24/1extra-drill-trust-quick, 25 VirtualMap ScheduleService.SCHEDULES_BY_ID/0/25divert-huge-erase-route]],
  [ChangeDelta, position: 79, lines:
    [ 1 VirtualRootNode /0/25/1sting-camera-ethics-icon] to
    [ 1 VirtualRootNode /0/25/1eager-cotton-ribbon-prize]]
]}
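
One way to read the diff above: in a Merkle tree, a changed hash changes the hash of every ancestor, so the entries worth looking at are the mismatched nodes that have no mismatched descendant. Below is a minimal sketch of that filter as a hypothetical Java helper (`DeepestMismatches` is illustrative, not part of the validator). It assumes the `/0/23/1…` prefixes in the diff are node routes; the routes are transcribed by hand from the diff, with the root written as an empty route.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of hedera-state-validator. Routes are hand-transcribed
// from the hash diff; the root is the empty route.
public class DeepestMismatches {

    // True if `prefix` is a proper ancestor route of `route`.
    static boolean isStrictPrefix(List<Integer> prefix, List<Integer> route) {
        return prefix.size() < route.size() && route.subList(0, prefix.size()).equals(prefix);
    }

    // Keep only mismatched routes that have no mismatched descendant: in a Merkle tree
    // every other mismatch is explained by a changed hash somewhere below it.
    static List<List<Integer>> deepest(List<List<Integer>> mismatched) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> candidate : mismatched) {
            boolean hasMismatchedDescendant = false;
            for (List<Integer> other : mismatched) {
                if (isStrictPrefix(candidate, other)) {
                    hasMismatchedDescendant = true;
                    break;
                }
            }
            if (!hasMismatchedDescendant) result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> mismatched = List.of(
                List.of(), List.of(0),
                List.of(0, 23), List.of(0, 23, 1),
                List.of(0, 24), List.of(0, 24, 1),
                List.of(0, 25), List.of(0, 25, 1));
        System.out.println(deepest(mismatched)); // [[0, 23, 1], [0, 24, 1], [0, 25, 1]]
    }
}
```

At the granularity of this diff, that leaves the three `VirtualRootNode`s under `SCHEDULES_BY_EQUALITY` (`/0/23/1`), `SCHEDULES_BY_EXPIRY_SEC` (`/0/24/1`) and `SCHEDULES_BY_ID` (`/0/25/1`); every other mismatch is an ancestor of one of them.
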
artemananiev commented 3 months ago

I was able to reproduce this issue using a slightly modified VirtualMap benchmark. After many runs, I found the root cause:

So the problem is not with serialization/deserialization, but with hashing.

artemananiev commented 3 months ago

The fix is straightforward: when leaf 2 is removed, explicitly mark leaf 1 as dirty, even though it hasn't actually changed (neither its data nor its path). I'm now trying to determine whether a similar problem can occur when the last leaf is removed.
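
The mechanism can be illustrated with a toy model. The Java sketch below is a hypothetical, heavily simplified path-indexed Merkle tree, not the actual `VirtualMap` hashing code: the class and method names, the path layout (a single leaf at path 1, `n >= 2` leaves at paths `n-1` to `2n-2`), SHA-384 and the leaf/internal tag bytes are all assumptions. Its `rehash()` recomputes only dirty leaves and their ancestors. After leaf 2 is removed from a two-leaf tree, leaf 1 keeps its value and path, so without the fix no leaf is dirty and the cached root hash still covers both leaves. With leaf 1 explicitly marked dirty, the incremental root hash matches a full recomputation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

// Hypothetical toy model, NOT the real VirtualMap code. Paths are breadth-first (root = 0,
// children of p are 2p+1 and 2p+2); one leaf sits at path 1, n >= 2 leaves at n-1 .. 2n-2.
public class DirtyLeafHashingDemo {
    static final class PathTree {
        final boolean markSurvivorDirty;                  // true = apply the fix from the issue
        final Map<Long, String> leaves = new HashMap<>(); // leaf path -> value
        final Map<Long, byte[]> hashes = new HashMap<>(); // cached hash per path
        final Set<Long> dirtyLeaves = new HashSet<>();    // leaf paths awaiting rehash
        long firstLeafPath = -1, lastLeafPath = -1;       // -1 = empty tree
        PathTree(boolean markSurvivorDirty) { this.markSurvivorDirty = markSurvivorDirty; }

        void add(String value) {
            int n = leaves.size();
            if (n >= 2) { // the first leaf moves down to the left child of its old path
                long moved = firstLeafPath * 2 + 1;
                leaves.put(moved, leaves.remove(firstLeafPath));
                dirtyLeaves.remove(firstLeafPath);
                dirtyLeaves.add(moved);
                firstLeafPath++;
                lastLeafPath += 2;
            } else {
                firstLeafPath = 1;
                lastLeafPath = n + 1; // first leaf at path 1, second leaf at path 2
            }
            leaves.put(lastLeafPath, value);
            dirtyLeaves.add(lastLeafPath);
        }

        void removeLast() {
            int n = leaves.size();
            if (n < 2) throw new IllegalStateException("this toy model never removes the only leaf");
            leaves.remove(lastLeafPath);
            hashes.remove(lastLeafPath);
            dirtyLeaves.remove(lastLeafPath);
            if (n == 2) { // leaf 2 removed: leaf 1 keeps both its value and its path
                lastLeafPath = 1;
                if (markSurvivorDirty) dirtyLeaves.add(1L); // the fix: rehash upward from leaf 1
            } else { // the removed leaf's left sibling moves up into their parent
                long sibling = lastLeafPath - 1, parent = (lastLeafPath - 1) / 2;
                leaves.put(parent, leaves.remove(sibling));
                hashes.remove(sibling);
                dirtyLeaves.remove(sibling);
                dirtyLeaves.add(parent);
                firstLeafPath = parent;
                lastLeafPath -= 2;
            }
        }

        // Incremental hashing: only dirty leaves and their ancestors are recomputed.
        void rehash() {
            TreeSet<Long> internal = new TreeSet<>(Comparator.reverseOrder());
            for (long leaf : dirtyLeaves) {
                hashes.put(leaf, leafHash(leaves.get(leaf)));
                for (long p = leaf; p > 0; ) { p = (p - 1) / 2; internal.add(p); }
            }
            // Children have larger paths than parents, so descending order hashes children first.
            for (long p : internal) hashes.put(p, internalHash(childHash(2 * p + 1), childHash(2 * p + 2)));
            dirtyLeaves.clear();
        }
        byte[] childHash(long path) {
            return path <= lastLeafPath ? hashes.getOrDefault(path, new byte[0]) : new byte[0];
        }
        byte[] fullHash(long path) { // reference: recompute the subtree hash from scratch
            if (path > lastLeafPath) return new byte[0];
            if (leaves.containsKey(path)) return leafHash(leaves.get(path));
            return internalHash(fullHash(2 * path + 1), fullHash(2 * path + 2));
        }
        byte[] cachedRootHash() { return hashes.get(0L); }
    }

    static byte[] digest(int tag, byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-384");
            md.update((byte) tag); // separates leaf hashes (0) from internal hashes (1)
            for (byte[] part : parts) md.update(part);
            return md.digest();
        } catch (NoSuchAlgorithmException e) { throw new IllegalStateException(e); }
    }
    static byte[] leafHash(String value) { return digest(0, value.getBytes(StandardCharsets.UTF_8)); }
    static byte[] internalHash(byte[] left, byte[] right) { return digest(1, left, right); }

    // Two leaves -> hash -> remove leaf 2 -> hash again; compare with a from-scratch root hash.
    static boolean rootMatchesAfterRemoval(boolean applyFix) {
        PathTree tree = new PathTree(applyFix);
        tree.add("leaf-A");
        tree.add("leaf-B");
        tree.rehash();
        tree.removeLast();
        tree.rehash();
        return Arrays.equals(tree.cachedRootHash(), tree.fullHash(0));
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + rootMatchesAfterRemoval(false)); // false (stale root)
        System.out.println("with fix:    " + rootMatchesAfterRemoval(true));  // true
    }
}
```

Comparing the incremental root hash against a full recomputation after each operation is the kind of check a randomized benchmark can repeat over many iterations, and it is one way to probe other edge cases such as removing the last remaining leaf (which this toy model deliberately does not support).
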