fin-ger opened this issue 5 years ago
Thank you very much @fin-ger, that's an interesting test.
I ran your test script and can see it stopped at the 7th round, but I cannot reproduce the error you posted. That crash happened while writing a file, not while opening the repo. From your error log, it looks like the error happened when reading the super block, which is done while opening an existing repo. Could you share how you built the executable on Alpine Linux? I tried building one on Ubuntu but it cannot run on Alpine.
Also, the published zbox v0.6.1 is quite old; the latest code on the master branch has been heavily refactored, with many bugs fixed and performance improved. Could you test using the latest code instead? Just use the dependency line below in your Cargo.toml:
zbox = { git = "https://github.com/zboxfs/zbox.git", features = ["storage-file"] }
Another tip: you can turn on zbox debug output by setting an environment variable in run-test.exp:
RUST_LOG=zbox=trace
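For example, assuming the expect script launches the test binary directly (the exact invocation in run-test.exp may differ), the variable can simply be prepended to the command:
RUST_LOG=zbox=trace ./zbox-fail-test --file data check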
Looking forward to seeing more results, thanks.
The error happens when running zbox-fail-test --file data check in the previously forcefully stopped VM (run-check.exp).
The executable was automatically built by the travis-ci configuration. I am using the official alpine:edge docker container:
docker run --rm -v $(pwd):/volume alpine:edge /bin/sh -c 'cd /volume && apk add rust cargo libsodium-dev && export SODIUM_LIB_DIR=/usr/lib && export SODIUM_STATIC=true && cargo build --target x86_64-alpine-linux-musl'
The executable can be found in ./target/x86_64-alpine-linux-musl/debug/zbox-fail-test.
I will now create a new version of my test which uses the latest master of zbox and the RUST_LOG configuration.
The error only happens when running the test inside a VM that gets forcefully stopped. Afterwards (after booting the VM again), the repository is checked against the previously generated data file for differences (the check command).
I built a new version (0.4.0) that uses zbox from the current git master and added RUST_LOG=zbox=trace to the run and check actions of the test (run-test.exp, run-check.exp).
Thanks @fin-ger. What I found is that QEMU apparently didn't flush written data to its drive. After the test crashed on the 7th round, the repo folder looks like this:
zbox:~# ls -l zbox-fail-test-repo
total 8
drwxr-xr-x 2 root root 4096 Apr 4 15:15 data
drwxr-xr-x 4 root root 4096 Apr 4 15:15 index
-rw-r--r-- 1 root root    0 Apr 4 15:15 super_blk.1
So you can see there is only one super block and it is empty, and the wal folder was not even created at all. The super block and wal must be guaranteed persistent to disk. A correct repo should look like this:
/vol # ls -l zbox-fail-test-repo
total 16
drwxr-xr-x 5 root root  160 Apr 4 11:26 data
drwxr-xr-x 5 root root  160 Apr 4 11:26 index
-rw-r--r-- 1 root root 8192 Apr 4 11:26 super_blk.0
-rw-r--r-- 1 root root 8192 Apr 4 11:26 super_blk.1
drwxr-xr-x 8 root root  256 Apr 4 11:26 wal
So that means QEMU tells zbox that write() and flush() completed when they actually did not. A possible reason could be that no cache mode was specified when starting the QEMU VM. You can try adding it in run-test.exp line 10:
-drive file=qemu/zbox.img,format=raw,cache=directsync
An explanation of the different cache modes can be found here. I've tried some of them but still cannot see the files being reliably written to disk.
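For context, the drive option would sit on the QEMU command line roughly like this (the surrounding flags are assumed here, not quoted from run-test.exp):
qemu-system-x86_64 -drive file=qemu/zbox.img,format=raw,cache=directsync ...
Note that when no cache= option is given, QEMU defaults to cache=writeback, where a completed write may still only live in the host page cache.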
Okay, so if this is a qemu issue then it is not relevant for zbox. Have you tested failures of real machines with zbox?
Honestly, I haven't tested real machine failure because I can't find a good reproducible way to do that test. But I did some random IO error fuzz tests using a special faulty storage. That storage generates IO errors, and the fuzzer reopens the repo randomly but deterministically.
Your test makes me think maybe I can use QEMU to do the fuzz crash test, just like this guy did for OS testing, but I still need to figure out how to make writes persistent in QEMU first.
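To sketch the idea (an illustration only, not zbox's actual faulty storage; all names here are made up), such a storage wraps a writer and fails the n-th operation deterministically:

    use std::io::{self, Write};

    // Hypothetical deterministic fault injector: it counts every write and
    // returns an error on the fail_at-th one, so a failing run can be
    // replayed exactly by reusing the same fail_at value (e.g. derived
    // from a fixed random seed).
    struct FaultyWriter<W> {
        inner: W,
        ops: u64,
        fail_at: u64,
    }

    impl<W: Write> Write for FaultyWriter<W> {
        fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
            self.ops += 1;
            if self.ops == self.fail_at {
                return Err(io::Error::new(io::ErrorKind::Other, "injected fault"));
            }
            self.inner.write(buf)
        }

        fn flush(&mut self) -> io::Result<()> {
            self.inner.flush()
        }
    }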
I tested whether a
dd if=dd-test-src of=dd-test-dst status=progress
would also produce a dd-test-dst of 0 bytes. And indeed, no matter which -drive ...,cache=something I provided, it was always 0 bytes in size. Then I tried
dd if=dd-test-src of=dd-test-dst status=progress iflag=direct oflag=direct
without any cache flag provided for qemu, and then dd-test-dst had roughly the size reported by the dd progress. I also tried oflag=dsync,nocache and that worked too. I am currently trying to set up an expect script for the dd command. The VM needs coreutils to run the above command, as the dd provided by alpine does not support the iflag and oflag options. I am also looking into comparing the dst and src files but have not come up with a good solution yet.
I have added run-dd-test.exp and run-dd-check.exp. The test writes a generated string file (so it can be diffed :sweat_smile:) with dd and oflag=direct to dd-test-dst, and the check looks for any lines in dd-test-dst that are not in dd-test-src. The expected result is only one additional line (the one not completed during the write) in the dd-test-dst file. So it looks like with dd, qemu is handling the I/O correctly, or maybe just "better". I will look into the dd source code later!
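Such a check can be expressed as a single grep (a sketch of the idea only; the actual run-dd-check.exp may implement it differently):
grep -Fxv -f dd-test-src dd-test-dst
This prints every line of dd-test-dst that does not exactly match any line of dd-test-src, so the test passes if at most one (torn) line is reported.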
QEMU file I/O looks so tricky; I might test dd using different images later on.
As this filesystem aims to provide ACID functionality, I tested whether it can handle a full machine failure while a write is in progress. My test shows that zbox fails to even open the repository after such a machine failure.
I tried to make the test as reproducible as possible, so you can recreate it on your own machine. The machine failure is simulated by forcefully shutting down a virtual machine on which a zbox program is currently writing.
Is recovery from a full machine failure not supported yet, or am I using the zbox API in the wrong way? (main.rs)
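For context, zbox's documented way to create and open a file-backed repo looks like this (a minimal sketch based on the RepoOpener example in zbox's README; the path and password are placeholders, not the ones my test uses):

    use zbox::{init_env, RepoOpener};

    fn main() {
        // initialize the zbox environment; must be called before using any repo
        init_env();

        // create and open a repository on local disk (requires the
        // "storage-file" feature)
        let _repo = RepoOpener::new()
            .create(true)
            .open("file://./zbox-fail-test-repo", "password")
            .unwrap();
    }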