mirage / irmin

Irmin is a distributed database that follows the same design principles as Git
https://irmin.org
ISC License

Corrupted store while bootstrapping a tezos node with hangzhounet #1829

Closed icristescu closed 2 years ago

icristescu commented 2 years ago

A corrupted store was obtained while bootstrapping with hangzhounet (available on comanche at /bench/ioana/corrupt_hnet/hangzhounet-backup.tgz). It fails when validating block 643736 (https://hangzhounet.tzkt.io/BM8ZkiT9TVe7XZaEqUkJWZqCjT96TQSEzBYfgqsPug3xnX33eSt):

node_1                   | 2022-03-15T14:25:21.901850755Z Mar 15 14:25:21.902 - validator.block: Inconsistent hash:
node_1                   | 2022-03-15T14:25:21.901852606Z Mar 15 14:25:21.902 - validator.block:   got: CoUviMZNKiSiqH8kyKtPqCXF217ckVScGdbvovMhy4guFWfThRUk
node_1                   | 2022-03-15T14:25:21.901854390Z Mar 15 14:25:21.902 - validator.block:   expected: CoWGVAvBtcjkqzLrhT1XFiQ1uq6scD1396mv9CiBG1wD1ETzrXRg

I reproduced the error twice. The procedure seems to be:
1/ launch a bootstrap with hangzhounet and the patch below, to stop the validation at block 643735
2/ stop the node when it stops validating blocks
3/ modify the patch to validate two more blocks
4/ relaunch the node: it will validate one block successfully (643735) and then raise the inconsistent hash error for block 643736.
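
To make the effect of the stop level in steps 1 and 3 concrete, here is a minimal stand-alone sketch of the guard. It is not the Tezos code: `validate`, `run` and `Stop_at` are hypothetical names, validation itself is stubbed out, and the value 643737 used for step 3 is an assumption about what "two more blocks" means.

```ocaml
(* Hypothetical stand-alone model of the stop guard; not the Tezos code.
   Levels are int32, as in the patch's [42l] literal. *)
exception Stop_at of int32

(* "Validate" [level] unless it equals [level_to_stop].
   Real validation is stubbed out: the level is simply returned. *)
let validate ~level_to_stop level =
  if Int32.equal level level_to_stop then raise (Stop_at level) ;
  level

(* Process [count] consecutive levels starting at [first] and return the
   levels validated before the guard fired, in order. *)
let run ~level_to_stop ~first ~count =
  let rec go acc i =
    if i >= count then List.rev acc
    else
      let level = Int32.add first (Int32.of_int i) in
      match validate ~level_to_stop level with
      | l -> go (l :: acc) (i + 1)
      | exception Stop_at _ -> List.rev acc
  in
  go [] 0

(* Step 1: stop level 643735, so 643735 itself is not validated. *)
let step1 = run ~level_to_stop:643735l ~first:643733l ~count:5

(* Step 3 (assumed stop level 643737): 643735 and 643736 are attempted. *)
let step3 = run ~level_to_stop:643737l ~first:643735l ~count:5
```

Under these assumptions the relaunched node attempts exactly the two blocks 643735 and 643736, which is consistent with step 4, where the second of them fails.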

Debug logs for step 4 are available at /bench/ioana/corrupt_hnet/logs_after_35. A “correct” store that contains block 643736 and is not corrupted is available at /bench/ioana/corrupt_hnet/correct_minimal.tar.gz.

The patch to stop the validation at level 42:

diff --git a/src/lib_validation/block_validation.ml b/src/lib_validation/block_validation.ml
index 15ad14d60e..4af7c84738 100644
--- a/src/lib_validation/block_validation.ml
+++ b/src/lib_validation/block_validation.ml
@@ -377,6 +377,10 @@ module Make (Proto : Registered_protocol.T) = struct
       ~predecessor_context ~(block_header : Block_header.t) operations =
     let open Lwt_tzresult_syntax in
     let block_hash = Block_header.hash block_header in
+    let level_to_stop = 42l in
+    if block_header.shell.level = level_to_stop then (
+      Format.printf "Stopping the node at level %ld@." level_to_stop ;
+      assert false) ;
     match cached_result with
     | Some (({result; _} as cached_result), context)
       when Context_hash.equal

metanivek commented 2 years ago

@icristescu I'm guessing this can be closed (?) Re-open if not!