lucaong / cubdb

Elixir embedded key/value database
Apache License 2.0

Bug: Error on CubDB startup #71

Open TomHoenderdos opened 1 year ago

TomHoenderdos commented 1 year ago

This morning I found that CubDB had crashed with the error below. After I deleted the CubDB files in the data_dir, the error went away. I wasn't able to download the invalid file, so I'm afraid I'm not giving you much to work with. If this happens again, I will try to fetch the invalid db file and post it here.

  {:error,
  {%ArgumentError{
    message: "errors were found at the given arguments:\n\n  * 1st argument: invalid external representation of a term\n"
  },
  [
    {CubDB.Store.CubDB.Store.File, :raise_if_error, 1,
     [
       file: 'lib/cubdb/store/file.ex',
       line: 156,
       error_info: %{module: Exception}
     ]},
    {CubDB.Btree, :new, 2, [file: 'lib/cubdb/btree.ex', line: 64]},
    {CubDB, :init, 1, [file: 'lib/cubdb.ex', line: 1232]},
    {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 851]},
    {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 814]},
    {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}
  ]}}
  ** (EXIT from #PID<0.4989.0>) shell process exited with reason: an exception was raised:
    ** (ArgumentError) errors were found at the given arguments:

  * 1st argument: invalid external representation of a term

        (cubdb 2.0.2) lib/cubdb/store/file.ex:156: CubDB.Store.CubDB.Store.File.raise_if_error/1
        (cubdb 2.0.2) lib/cubdb/btree.ex:64: CubDB.Btree.new/2
        (cubdb 2.0.2) lib/cubdb.ex:1232: CubDB.init/1
        (stdlib 4.1) gen_server.erl:851: :gen_server.init_it/2
        (stdlib 4.1) gen_server.erl:814: :gen_server.init_it/6
        (stdlib 4.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

lucaong commented 1 year ago

It looks like CubDB successfully locates the latest good header, but then when it tries to read the root node that the header points to, it finds it's not a valid serialized representation.
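For reference, here is a minimal sketch of how this class of error can be reproduced outside CubDB. It assumes that the failing step is a call to `:erlang.binary_to_term/1` on bytes read from the file, which is what the error message suggests; the term and the way the bytes are damaged are purely illustrative.

```elixir
# Serialize a term in the Erlang external term format.
good = :erlang.term_to_binary({:root, [1, 2, 3]})

# Every encoded term starts with the version byte 131.
<<131, rest::binary>> = good

# Simulate corruption by overwriting that first byte (illustrative only;
# real corruption could hit any part of the serialized node).
corrupted = <<0, rest::binary>>

# Decoding the damaged bytes raises ArgumentError, the same exception
# type that appears in the stack trace above.
result =
  try do
    {:ok, :erlang.binary_to_term(corrupted)}
  rescue
    e in ArgumentError -> {:error, e}
  end
```

The intact binary still decodes to the original term, so the failure comes only from the damaged bytes.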

This should not be caused by a crash: CubDB is append-only, and the transaction header is written last, so once a sane header is found, everything before it (including the root and the whole btree) must have been successfully written and committed. At a glance, it looks like file corruption of some sort. If you find a file that you can share with me (either here or privately) I will definitely investigate further. Meanwhile, if there is any reason to believe that the file system got corrupted, that could be it.
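The reasoning above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the module `AppendOnlySketch`, its functions, and the entry shapes are hypothetical and do not reflect CubDB's actual on-disk format or API. The log is a list of entries in write order, a commit appends its nodes first and its header last, and recovery scans backwards for the latest header.

```elixir
defmodule AppendOnlySketch do
  # Hypothetical toy model, not CubDB's real file format.
  # Entries, in write order:
  #   {:node, data}          - a btree node
  #   {:header, root_index}  - commit marker pointing at the root node

  # Commit: append all nodes first, then the header as the very last entry.
  def commit(log, nodes) do
    log = log ++ Enum.map(nodes, &{:node, &1})
    root_index = length(log) - 1
    log ++ [{:header, root_index}]
  end

  # Simulate a crash: only the first `n` entries made it to disk.
  def crash_after(log, n), do: Enum.take(log, n)

  # Recovery: scan backwards for the latest header and return its root.
  def recover(log) do
    log
    |> Enum.reverse()
    |> Enum.find_value(:empty, fn
      {:header, root_index} -> {:ok, Enum.at(log, root_index)}
      _other -> nil
    end)
  end
end

log =
  []
  |> AppendOnlySketch.commit(["a", "root1"])
  |> AppendOnlySketch.commit(["b", "root2"])

# A crash before the second header is written falls back to the first commit.
after_crash = AppendOnlySketch.crash_after(log, 5)
recovered = AppendOnlySketch.recover(after_crash)
```

In this model, a crash can only drop a suffix of the log, so any header that survives is preceded by every node it refers to. A root that fails to decode after a valid header was found therefore points to damage of already-written data rather than to an interrupted write.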

Regarding other possible causes, I assume that you did not downgrade OTP or Erlang/Elixir to an earlier version, so I'd exclude a compatibility issue with the Erlang external term format (which, I think, is also pretty stable across releases).

If you have any other hint, do let me know. Thank you for reporting this!