mirage / irmin

Irmin is a distributed database that follows the same design principles as Git
https://irmin.org
ISC License
1.84k stars 153 forks source link

irmin-pack: assert failure, pack_index.ml, line 108 #1987

Open tomjridge opened 2 years ago

tomjridge commented 2 years ago

From tezos-dev, devteam channel: https://tezos-dev.slack.com/archives/GB0UR34N8/p1658154805411149

$ dune exec scripts/yes-wallet/yes_wallet.exe -- create from context ~/tezos/tezos-node-ghostnet/ in /tmp/yes-wallet --active-bakers-only --staking-share 75
Fatal error: exception File "src/irmin-pack/unix/pack_index.ml", line 108, characters 26-32: Assertion failed
Raised at Irmin_pack_unix__Pack_index.Make.flush in file "src/irmin-pack/unix/pack_index.ml", line 108, characters 26-38
Called from Irmin_pack_unix__File_manager.Make.flush_index_and_its_deps in file "src/irmin-pack/unix/file_manager.ml", line 171, characters 14-57
Called from Irmin_pack_unix__File_manager.Make.flush in file "src/irmin-pack/unix/file_manager.ml" (inlined), line 193, characters 16-42
Called from Irmin_pack_unix__Ext.Maker.Make.X.Repo.batch.on_success in file "src/irmin-pack/unix/ext.ml", line 152, characters 12-35
Called from Tezos_context_disk__Context.Make.raw_commit in file "src/lib_context/disk/context.ml", line 377, characters 14-31
Called from Tezos_context_disk__Context.Make.commit_genesis in file "src/lib_context/disk/context.ml", line 754, characters 19-59
Called from Tezos_store_unix__Store.init in file "src/lib_store/unix/store.ml", line 2675, characters 27-51
Called from Yes_wallet_lib.load_mainnet_bakers_public_keys in file "scripts/yes-wallet/yes_wallet_lib.ml", line 660, characters 4-206
Called from Yes_wallet_lib.load_mainnet_bakers_public_keys in file "scripts/yes-wallet/yes_wallet_lib.ml", line 718, characters 6-112
Called from Dune__exe__Yes_wallet in file "scripts/yes-wallet/yes_wallet.ml", line 223, characters 8-130
tomjridge commented 2 years ago

Some initial tracing through the code...

scripts/yes-wallet/yes_wallet_lib.ml", line 660 (surrounding code):

let* store =
    Tezos_store.Store.init
      ~store_dir:(Filename.concat base_dir "store")
      ~context_dir:(Filename.concat base_dir "context")
      ~allow_testchains:true
      ~readonly:true
      mainnet_genesis
  in

Note the use of "~readonly:true".

"src/lib_store/unix/store.ml", line 2675, characters 27-51 (surrounding code):

let init ... = 
  ...
  let*! context_index, commit_genesis =
    match commit_genesis with
    | Some commit_genesis ->
        let*! context_index =
          Context.init ~readonly:true ?patch_context context_dir
        in
        Lwt.return (context_index, commit_genesis)
    | None ->
        let*! context_index =
          Context.init ~readonly ?patch_context context_dir
        in
        let commit_genesis ~chain_id =
          Context.commit_genesis
            context_index
            ~chain_id
            ~time:genesis.time
            ~protocol:genesis.protocol
        in
        Lwt.return (context_index, commit_genesis)
  in
...
    (* Fresh store *)
    let* genesis_context = commit_genesis ~chain_id in

Note how in the "None" case of the "match commit_genesis", readonly is passed to Context.init... but later we will call "commit_genesis ~chain_id" which probably requires the index to be writable.

So my current guess is that somehow there is a mismatch with "~readonly:true" from yes_wallet_lib.ml and the need to modify the index to add the genesis commit.

tomjridge commented 2 years ago

Worth mentioning that purely within file_manager, a flush is called on a readonly index, and this causes the initial exception. So even locally there seems to be something wrong (file_manager should not call flush on a readonly index presumably).

metanivek commented 2 years ago

As of https://github.com/mirage/irmin/pull/2044 (see https://gitlab.com/tezos/tezos/-/issues/3520 for originating context on that PR) this will now throw a RO_not_allowed exception but I'm still curious about the underlying issue of a readonly tezos context creating commits.

I'll confess I don't understand the motivation of the design of the batch API, but it casts readonly stores to readwrite without consideration for if the repo was originally opened as readonly.

Is there a legitimate use case for what Tezos is doing? What is the proper behavior for the batch api in this situation (ignoring readonly seems wrong to me)? @Ngoguey42 @samoht any historical context that would be useful here?

Ngoguey42 commented 2 years ago

In https://github.com/mirage/irmin/pull/1690 I planned on raising an exception in batch in case of RO instance. I forgot to add it back when implementing the new IO

samoht commented 2 years ago

The batch API was initially meant to be transactional: either the batch succeeds, and everything is flushed/written on disk, or it fails, and nobody notices. This is not entirely true currently, as there are auto-flush events where some objects are flushed to disk, so if the batch is aborted, some garbage will remain. With the GC, this garbage will be reclaimed at one point, so we are good.

But It's not a good idea to let the RO instance flushes data on disk (or even call batch). This should never happen and is a programmer error.

metanivek commented 2 years ago

@Ngoguey42 :+1:

@samoht thanks for the added context. Seems there is programmer error in our code (that we will fix) but is there also error in the tezos code that does not properly handle readonly contexts?