fin-ger opened this issue 5 years ago
Thank you very much @fin-ger, that's an interesting test.
I ran your test script and can see it stopped at the 7th round, but I cannot reproduce the error you posted. That crash happened while writing a file, not while opening the repo. From your error log, it looks like the error happened when reading the super block, which is done while opening an existing repo. Could you share how you built the executable on Alpine Linux? I tried building one on Ubuntu but it cannot run on Alpine.
Also, the published zbox v0.6.1 is quite old; the latest code on the master branch has been heavily refactored, with many bugs fixed and performance improved. Could you test using the latest code instead? Just use the dependency line below in your Cargo.toml:
zbox = { git = "https://github.com/zboxfs/zbox.git", features = ["storage-file"] }
Another tip: you can turn on zbox debug output by setting an environment variable in run-test.exp:
RUST_LOG=zbox=trace
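For example, assuming the expect script launches the test binary directly (the exact invocation in run-test.exp may differ), the variable can simply be prepended to the command:
RUST_LOG=zbox=trace ./zbox-fail-test --file data check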
Looking forward to seeing more results, thanks.
The error happens when running zbox-fail-test --file data check in the previously forcefully stopped VM (run-check.exp).
The executable was automatically built by the travis-ci configuration. I am using the official alpine:edge docker container:
docker run --rm -v $(pwd):/volume alpine:edge /bin/sh -c 'cd /volume && apk add rust cargo libsodium-dev && export SODIUM_LIB_DIR=/usr/lib && export SODIUM_STATIC=true && cargo build --target x86_64-alpine-linux-musl'
The executable can be found in ./target/x86_64-alpine-linux-musl/debug/zbox-fail-test.
I will now create a new version of my test which uses the latest master of zbox and the RUST_LOG configuration.
The error only happens when running the test inside a VM that gets forcefully stopped. Afterwards (after booting the VM again), the repository is checked against the previously generated data file for differences (the check command).
I built a new version (0.4.0) that uses zbox from the current git master and added RUST_LOG=zbox=trace to the run and check actions of the test (run-test.exp, run-check.exp).
Thanks @fin-ger. What I found is that QEMU apparently didn't flush written data to its drive. After the test crashed on the 7th round, the repo folder looks like this:
zbox:~# ls -l zbox-fail-test-repo
total 8
drwxr-xr-x 2 root root 4096 Apr 4 15:15 data
drwxr-xr-x 4 root root 4096 Apr 4 15:15 index
-rw-r--r-- 1 root root    0 Apr 4 15:15 super_blk.1
So you can see there is only one super block and it is empty, and the wal folder was not even created at all. The super block and wal must be guaranteed persistent to disk. A correct repo should look like this:
/vol # ls -l zbox-fail-test-repo
total 16
drwxr-xr-x 5 root root  160 Apr 4 11:26 data
drwxr-xr-x 5 root root  160 Apr 4 11:26 index
-rw-r--r-- 1 root root 8192 Apr 4 11:26 super_blk.0
-rw-r--r-- 1 root root 8192 Apr 4 11:26 super_blk.1
drwxr-xr-x 8 root root  256 Apr 4 11:26 wal
So that means QEMU tells zbox that write() and flush() completed when they actually did not. A possible reason could be that no cache mode was specified when starting the QEMU VM. You can try adding it in run-test.exp line 10:
-drive file=qemu/zbox.img,format=raw,cache=directsync
An explanation of the different cache modes can be found here. I've tried some of them but still cannot see the files being reliably written to disk.
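For context, the drive option would sit on the QEMU command line roughly like this (the surrounding flags are assumed here, not quoted from run-test.exp):
qemu-system-x86_64 -drive file=qemu/zbox.img,format=raw,cache=directsync ...
Note that when no cache= option is given, QEMU defaults to cache=writeback, where a completed write may still only live in the host page cache.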
Okay, so if this is a qemu issue then it is not relevant for zbox. Have you tested failures of real machines with zbox?
Honestly, I haven't tested real machine failure because I can't find a good reproducible way to do that test. But I did some random IO error fuzz tests using a special faulty storage. That storage generates IO errors, and the fuzzer reopens the repo randomly but deterministically.
Your test makes me think maybe I can use QEMU to do the fuzz crash test, just like this guy did for OS testing, but I still need to figure out how to make writes persistent in QEMU first.
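To sketch the idea (an illustration only, not zbox's actual faulty storage; all names here are made up), such a storage wraps a writer and fails the n-th operation deterministically:

    use std::io::{self, Write};

    // Hypothetical deterministic fault injector: it counts every write and
    // returns an error on the fail_at-th one, so a failing run can be
    // replayed exactly by reusing the same fail_at value (e.g. derived
    // from a fixed random seed).
    struct FaultyWriter<W> {
        inner: W,
        ops: u64,
        fail_at: u64,
    }

    impl<W: Write> Write for FaultyWriter<W> {
        fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
            self.ops += 1;
            if self.ops == self.fail_at {
                return Err(io::Error::new(io::ErrorKind::Other, "injected fault"));
            }
            self.inner.write(buf)
        }

        fn flush(&mut self) -> io::Result<()> {
            self.inner.flush()
        }
    }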
I tested whether a
dd if=dd-test-src of=dd-test-dst status=progress
would also produce a dd-test-dst of 0 bytes. And indeed, no matter which -drive ...,cache=something I provided, it was always 0 bytes in size. Then I tried
dd if=dd-test-src of=dd-test-dst status=progress iflag=direct oflag=direct
without any cache flag provided for qemu, and then dd-test-dst had roughly the size reported by the dd progress. I also tried oflag=dsync,nocache and that worked too. I am currently trying to set up an expect script for the dd command. The VM needs coreutils to run the above command, as the dd provided by alpine does not support the iflag and oflag options. I am also looking into comparing the dst and src files but have not come up with a good solution yet.
I have added run-dd-test.exp and run-dd-check.exp. The test writes a generated string file (so it can be diffed :sweat_smile:) with dd and oflag=direct to dd-test-dst, and the check looks for any lines in dd-test-dst that are not in dd-test-src. The expected result is only one additional line (the one not completed during the write) in the dd-test-dst file. So it looks like with dd, qemu is handling the I/O correctly, or maybe just "better". I will look into the dd source code later!
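Such a check can be expressed as a single grep (a sketch of the idea only; the actual run-dd-check.exp may implement it differently):
grep -Fxv -f dd-test-src dd-test-dst
This prints every line of dd-test-dst that does not exactly match any line of dd-test-src, so the test passes if at most one (torn) line is reported.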
QEMU file I/O looks so tricky; I might test dd using different images later on.
As this filesystem aims to provide ACID functionality, I tested whether it can handle a full machine failure while a write is in progress. My test shows that zbox fails to even open the repository after such a machine failure.
I tried to make the test as reproducible as possible, so you can recreate it on your own machine. The machine failure is simulated by forcefully shutting down a virtual machine on which a zbox program is currently writing.
Is recovery from a full machine failure not supported yet, or am I using the zbox API in the wrong way? (main.rs)
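For context, zbox's documented way to create and open a file-backed repo looks like this (a minimal sketch based on the RepoOpener example in zbox's README; the path and password are placeholders, not the ones my test uses):

    use zbox::{init_env, RepoOpener};

    fn main() {
        // initialize the zbox environment; must be called before using any repo
        init_env();

        // create and open a repository on local disk (requires the
        // "storage-file" feature)
        let _repo = RepoOpener::new()
            .create(true)
            .open("file://./zbox-fail-test-repo", "password")
            .unwrap();
    }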