marigold-dev / deku

State.bin corruption #591

Open d4hines opened 2 years ago

d4hines commented 2 years ago

While debugging, I hit the following error when trying to start the node:

deku-node: internal error, uncaught exception:
           End_of_file
           Raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3095, characters 20-29
           Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 31, characters 10-20
           Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 60, characters 2-13
           Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 124, characters 4-13
           Called from Dune__exe__Deku_node.node in file "src/bin/deku_node.ml", line 141, characters 13-65
           Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 25, characters 19-24
           Called from Cmdliner.Term.run in file "cmdliner.ml", line 117, characters 32-39

Hypothesis: when killing the node, state.bin is corrupted.

aguillon commented 2 years ago

How can I reproduce this error?

d4hines commented 2 years ago

Our hypothesis is that it happens if you kill the node while it is writing state.bin.

aguillon commented 2 years ago

It should be easy to fix: write the state to a temporary file, then rename it to state.bin (renaming is atomic according to this man page) and delete the old path. However, reproducing the bug itself is taking me a bit of time, and a reliable reproduction could still be valuable in the long run.
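
For illustration, here is a minimal sketch of the write-then-rename idea using only the OCaml standard library. It is not Deku's actual code: the names `write_atomically` and `read_file`, the `.tmp` suffix, and the example path are all made up for this comment, and the real node may serialize and write its state differently (for example through Lwt). On POSIX systems `rename` replaces the destination in one step, so the sketch has no separate delete.

```ocaml
(* Hypothetical helper, not taken from the Deku code base.
   Writes [contents] to [path] so that a reader sees either the previous
   file or the complete new one, never a partially written file. *)
let write_atomically ~path contents =
  (* Keep the temporary file next to the target so that the rename
     stays on a single filesystem. The ".tmp" suffix is an assumption. *)
  let tmp = path ^ ".tmp" in
  let oc = open_out_bin tmp in
  (try
     output_string oc contents;
     (* close_out flushes buffered data before closing the channel. *)
     close_out oc
   with e ->
     close_out_noerr oc;
     raise e);
  (* On POSIX systems this replaces [path] atomically. *)
  Sys.rename tmp path

(* Hypothetical helper: read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  let n = in_channel_length ic in
  let s = really_input_string ic n in
  close_in ic;
  s

let () =
  let path =
    Filename.concat (Filename.get_temp_dir_name ()) "state.bin.example"
  in
  write_atomically ~path "first snapshot";
  write_atomically ~path "second snapshot";
  (* The second write fully replaced the first one. *)
  assert (read_file path = "second snapshot");
  (* No temporary file is left behind after the rename. *)
  assert (not (Sys.file_exists (path ^ ".tmp")));
  Sys.remove path
```

If the process is killed during the write, only the temporary file is incomplete and state.bin keeps its previous contents. Surviving a power loss would additionally need the data to be synced to disk before the rename, which this sketch leaves out.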

I'll keep you posted.