Open tomjridge opened 2 years ago
Some initial tracing through the code...
"scripts/yes-wallet/yes_wallet_lib.ml", line 660 (surrounding code):
let* store =
  Tezos_store.Store.init
    ~store_dir:(Filename.concat base_dir "store")
    ~context_dir:(Filename.concat base_dir "context")
    ~allow_testchains:true
    ~readonly:true
    mainnet_genesis
in
Note the use of "~readonly:true".
"src/lib_store/unix/store.ml", line 2675, characters 27-51 (surrounding code):
let init ... =
  ...
  let*! context_index, commit_genesis =
    match commit_genesis with
    | Some commit_genesis ->
        let*! context_index =
          Context.init ~readonly:true ?patch_context context_dir
        in
        Lwt.return (context_index, commit_genesis)
    | None ->
        let*! context_index =
          Context.init ~readonly ?patch_context context_dir
        in
        let commit_genesis ~chain_id =
          Context.commit_genesis
            context_index
            ~chain_id
            ~time:genesis.time
            ~protocol:genesis.protocol
        in
        Lwt.return (context_index, commit_genesis)
  in
  ...
  (* Fresh store *)
  let* genesis_context = commit_genesis ~chain_id in
Note how in the "None" case of the "match commit_genesis", readonly is passed to Context.init... but later we will call "commit_genesis ~chain_id" which probably requires the index to be writable.
So my current guess is that somehow there is a mismatch with "~readonly:true" from yes_wallet_lib.ml and the need to modify the index to add the genesis commit.
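To make the suspected mismatch concrete, here is a minimal toy model of it. This is not the Tezos or Irmin code: `index`, `context_init`, `init_fresh_store`, and the `Read_only_write` exception are hypothetical stand-ins, and the sketch assumes (as guessed above) that committing genesis is a write that a readonly index has to reject.

```ocaml
(* Toy model, not the Tezos or Irmin API: all names here are hypothetical. *)
exception Read_only_write of string

type index = { readonly : bool; mutable commits : string list }

(* Stand-in for Context.init: it only records the readonly flag. *)
let context_init ~readonly = { readonly; commits = [] }

(* Stand-in for Context.commit_genesis: a write, so it fails on a
   readonly index. *)
let commit_genesis index ~chain_id =
  if index.readonly then raise (Read_only_write "commit_genesis");
  let commit = "genesis:" ^ chain_id in
  index.commits <- commit :: index.commits;
  commit

(* Stand-in for the "fresh store" path: the readonly flag is forwarded to
   the index, and a genesis commit is then attempted regardless. *)
let init_fresh_store ~readonly ~chain_id =
  let index = context_init ~readonly in
  commit_genesis index ~chain_id

let () =
  (match init_fresh_store ~readonly:true ~chain_id:"main" with
   | _ -> print_endline "unexpected: write succeeded"
   | exception Read_only_write op ->
       Printf.printf "readonly index rejected %s\n" op);
  print_endline (init_fresh_store ~readonly:false ~chain_id:"main")
```

In this model the readonly path can only fail, which is the shape of the problem described above: the flag and the later write disagree.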
Worth mentioning that purely within file_manager, a flush is called on a readonly index, and this causes the initial exception. So even locally there seems to be something wrong (file_manager should not call flush on a readonly index presumably).
As of https://github.com/mirage/irmin/pull/2044 (see https://gitlab.com/tezos/tezos/-/issues/3520 for originating context on that PR), this will now throw a RO_not_allowed exception, but I'm still curious about the underlying issue of a readonly Tezos context creating commits.
I'll confess I don't understand the motivation behind the design of the batch API, but it casts readonly stores to readwrite without considering whether the repo was originally opened as readonly.
Is there a legitimate use case for what Tezos is doing? What is the proper behavior for the batch api in this situation (ignoring readonly seems wrong to me)? @Ngoguey42 @samoht any historical context that would be useful here?
In https://github.com/mirage/irmin/pull/1690 I planned on raising an exception in batch in the case of an RO instance. I forgot to add it back when implementing the new IO.
The batch API was initially meant to be transactional: either the batch succeeds, and everything is flushed/written to disk, or it fails, and nobody notices. This is not entirely true currently, as there are auto-flush events where some objects are flushed to disk, so if the batch is aborted, some garbage will remain. With the GC, this garbage will be reclaimed at some point, so we are good.
But it's not a good idea to let an RO instance flush data to disk (or even call batch). This should never happen and is a programmer error.
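As a rough sketch of the behaviour described here, the following toy in-memory store rejects batch on a readonly instance and publishes staged writes only when the batch completes. It is not Irmin's implementation: the `store` record and the `add` callback are hypothetical, only the exception name `RO_not_allowed` comes from this thread, and the model deliberately ignores the auto-flush caveat mentioned above.

```ocaml
(* Toy in-memory store, not Irmin's implementation. The record and the
   [add] callback are hypothetical; only the name RO_not_allowed is taken
   from the discussion. *)
exception RO_not_allowed

type store = { readonly : bool; mutable on_disk : (string * string) list }

(* [batch store f] stages the writes made by [f] and publishes them only if
   [f] returns normally. A readonly store is rejected before [f] runs. *)
let batch store f =
  if store.readonly then raise RO_not_allowed;
  let staged = ref [] in
  let add key value = staged := (key, value) :: !staged in
  let result = f add in
  (* Reached only when [f] did not raise: publish the staged writes. *)
  store.on_disk <- !staged @ store.on_disk;
  result

let () =
  let rw = { readonly = false; on_disk = [] } in
  batch rw (fun add -> add "a" "1"; add "b" "2");
  (* A failing batch leaves the store untouched in this model. *)
  (try batch rw (fun add -> add "c" "3"; failwith "abort")
   with Failure _ -> ());
  Printf.printf "entries after two batches: %d\n" (List.length rw.on_disk);
  let ro = { readonly = true; on_disk = [] } in
  match batch ro (fun add -> add "d" "4") with
  | () -> print_endline "unexpected: readonly batch ran"
  | exception RO_not_allowed -> print_endline "RO_not_allowed"
```

Checking the flag at the top of batch, before any callback runs, means a readonly instance never gets the chance to stage or flush anything.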
@Ngoguey42 :+1:
@samoht thanks for the added context. It seems there is a programmer error in our code (which we will fix), but is there also an error in the Tezos code that does not properly handle readonly contexts?
From tezos-dev, devteam channel: https://tezos-dev.slack.com/archives/GB0UR34N8/p1658154805411149