Open: vi opened this issue 4 years ago
A similar panic happens in `sqlite://` mode (with a "database is full" error instead). In that case, however, the storage can be opened again (if I grow the tmpfs a little). The unfinished information in the file is lost, obviously (how do I recover it, by the way?).
Thank you for the issue, that's a good edge case.
The unfinished data cannot be recovered, as writing is processed in a transaction. If writing doesn't call `finish()`, the transaction doesn't commit and all uncommitted data is lost.
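A minimal sketch of that behaviour (untested; the path and password are placeholders, and only the `RepoOpener`/`create_file`/`finish()` API used elsewhere in this thread is assumed):

```rust
use std::io::{Read, Write};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    zbox::init_env();
    {
        let mut repo = zbox::RepoOpener::new()
            .create(true)
            .open("file:///tmp/demo", "password")?;
        let mut f = repo.create_file("/a")?;
        f.write_all(b"committed")?;
        f.finish()?; // commits the transaction: this data survives
        f.write_all(b"uncommitted")?; // no finish(): this data is lost
    } // repo dropped with the second write still uncommitted

    // reopen and check what is actually there
    let mut repo = zbox::RepoOpener::new().open("file:///tmp/demo", "password")?;
    let mut f = repo.open_file("/a")?;
    let mut s = String::new();
    f.read_to_string(&mut s)?;
    assert_eq!(s, "committed"); // only the finished write is visible
    Ok(())
}
```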
> If writing doesn't call `finish()`, the transaction doesn't commit and all uncommitted data is lost.

After a crash with unfinished writes, the storage is still full in megabytes. Are those files securely inaccessible (e.g. no decryption key available) or just unreferenced?
With `mem://` storage, running out of space is even nastier: `memory allocation of 8192 bytes failed`, then just a plain abort.
I think ZboxFS should actively mind the available room (in `file://`, in memory, in the database) and reserve some space, so that `write`s start failing while the transaction can still be committed.
> > If writing doesn't call `finish()`, the transaction doesn't commit and all uncommitted data is lost.
>
> After a crash with unfinished writes, the storage is still full in megabytes. Are those files securely inaccessible (e.g. no decryption key available) or just unreferenced?

Yes, those files are still secured, but they are inaccessible.
> With `mem://` storage, running out of space is even nastier: `memory allocation of 8192 bytes failed`, then just a plain abort. I think ZboxFS should actively mind the available room (in `file://`, in memory, in the database) and reserve some space, so that `write`s start failing while the transaction can still be committed.

That could be hard, as some storages cannot report free space at run time; even where it can be retrieved, the cost of constantly monitoring it is still high.
> That could be hard, as some storages cannot report free space at run time; even where it can be retrieved, the cost of constantly monitoring it is still high.

ZboxFS can just pre-allocate some space needed to complete a transaction.
> Yes, those files are still secured, but they are inaccessible.

Secured with the same key as the rest of the files, or secured with some session key that is supposed to be written to storage only on transaction commit?
If the former, there should be some recovery function that trades consistency for availability, allowing possibly broken data to be read outside the normal transaction finish. Like `git fsck` + `git cat-file` for Git.
> > That could be hard, as some storages cannot report free space at run time; even where it can be retrieved, the cost of constantly monitoring it is still high.
>
> ZboxFS can just pre-allocate some space needed to complete a transaction.

Accurately pre-allocating space is impossible, because ZboxFS has no idea how much data the user will write. File writing is like streaming: until the transaction is committed, there is no way to know the actual size in advance.
> Accurately pre-allocating space is impossible, because ZboxFS has no idea how much data the user will write. File writing is like streaming: until the transaction is committed, there is no way to know the actual size in advance.

I see the megabytes used by `file://` storage going up while writing a file, even before committing a transaction. Does that mean only some constant additional space is needed to commit the transaction once the `std::io::Write::write` calls are already done?
Anyway, even a heuristic "pre-allocate X kilobytes for transaction commits", where X is configurable by the user, could be beneficial, to reduce the probability of `finish()` also failing after `write` has failed.
In short: running out of space should not lead to a big data loss, only maybe a limited one.
> > Yes, those files are still secured, but they are inaccessible.
>
> Secured with the same key as the rest of the files, or secured with some session key that is supposed to be written to storage only on transaction commit?
> If the former, there should be some recovery function that trades consistency for availability, allowing possibly broken data to be read outside the normal transaction finish. Like `git fsck` + `git cat-file` for Git.

ZboxFS uses the former. When writing data to a file, it first writes the data to a staging area. When the transaction is committing, the staged data is transferred to the content management system. That means any uncommitted data will not be in the content management system, so it cannot be accessed, and it is discarded when the transaction aborts.
> When writing data to a file, it first writes the data to a staging area.

How do I recover data from that staging area if the transaction was neither cleanly finished nor cleanly aborted?
Does that staging area contain all the information needed to complete a transaction, or are some things kept only in RAM?
> > Accurately pre-allocating space is impossible, because ZboxFS has no idea how much data the user will write. File writing is like streaming: until the transaction is committed, there is no way to know the actual size in advance.
>
> I see the megabytes used by `file://` storage going up while writing a file, even before committing a transaction. Does that mean only some constant additional space is needed to commit the transaction once the `std::io::Write::write` calls are already done?
> Anyway, even a heuristic "pre-allocate X kilobytes for transaction commits", where X is configurable by the user, could be beneficial, to reduce the probability of `finish()` also failing after `write` has failed.
> In short: running out of space should not lead to a big data loss, only maybe a limited one.
You see the storage going up because of the staging data; the file content has already been written to storage, and committing the tx just 'marks' it as permanent. Committing a tx needs to write a WAL file. Its size is not large, but it still depends on how large the file you are writing is: a larger file produces more content nodes to be added to the tx, so the WAL file size increases. Heuristically estimating the WAL file size in advance is impossible, but I agree that pre-allocating some blocks for the WAL is a good idea, and it might be able to handle most cases.
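For a rough sense of scale (these numbers are purely illustrative assumptions, not ZboxFS's actual parameters): if a segment covered at most 8 MiB of data and each content node contributed about 100 bytes to the WAL, then a 10 GiB single-transaction write would produce roughly 10 GiB / 8 MiB ≈ 1,280 segments, i.e. on the order of 128 KiB of WAL, while a few-kilobyte write would need only one node. The WAL grows much more slowly than the data, but with no bound on the transaction size there is no bound on the WAL size either.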
> > When writing data to a file, it first writes the data to a staging area.
>
> How do I recover data from that staging area if the transaction was neither cleanly finished nor cleanly aborted?
> Does that staging area contain all the information needed to complete a transaction, or are some things kept only in RAM?

The staging area only contains file content, and it is referenced by a temporary content node which exists only in memory. That content node is merged with the pre-existing file content node and written to storage when committing. Thus, if the tx is not completed, the temporary content node is lost and the staging data cannot be accessed.
Would you mind sharing your code and setup, so I can reproduce this issue locally? That would be helpful. Thanks.
High-level setup is outlined in the opening comment.
Here is example code.
`src/main.rs`:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    let mut r = zbox::RepoOpener::new()
        .create(true)
        .force(true)
        .open("file:///tmp/limit/q", "123asd")?;
    use std::io::Write;
    let mut f = r.create_file("/test")?;
    loop {
        match f.write(&[77; 65536]) {
            Ok(_) => {
                eprint!(".");
            }
            Err(e) => {
                eprintln!("\nwrite error: {}", e);
                break;
            }
        }
    }
    f.finish()?;
    Ok(())
}
`Cargo.toml`:
[package]
name = "zboxtest"
version = "0.1.0"
edition = "2018"

[dependencies]
zbox = { version = "0.8.8", features = ["storage-file"] }
env_logger = "0.7.1"
tmpfs with size limit: `mkdir /tmp/limit; mount -t tmpfs tmp /tmp/limit/ -o size=100M`
Then: `RUST_BACKTRACE=1 cargo run`
> Heuristically estimating the WAL file size in advance is impossible

Either this is not true (I suppose it should be something like a logarithm of the file size being written) or the design is flawed somewhere.
It may be reasonable that the size cannot be calculated exactly beforehand, but the impossibility of estimating it (e.g. providing some upper bound) heuristically (i.e. in a way that works with high probability but can fail) looks very suspicious.
The space reserved for the WAL file could even grow dynamically, based on the size of the ongoing written but uncommitted data.
If I want, for example, to store 10 terabytes of data into ZboxFS, how often am I recommended to `finish()`? Does Zbox like mega-transactions, or are medium-sized (e.g. 64-megabyte) transactions preferred?
> Thus, if the tx is not completed, the temporary content node is lost and the staging data cannot be accessed.

Do I understand correctly that recovery of that lost content is theoretically possible (given the correct password, obviously), but is not currently implemented?
> > Heuristically estimating the WAL file size in advance is impossible
>
> Either this is not true (I suppose it should be something like a logarithm of the file size being written) or the design is flawed somewhere.
> It may be reasonable that the size cannot be calculated exactly beforehand, but the impossibility of estimating it (e.g. providing some upper bound) heuristically (i.e. in a way that works with high probability but can fail) looks very suspicious.
> The space reserved for the WAL file could even grow dynamically, based on the size of the ongoing written but uncommitted data.
> If I want, for example, to store 10 terabytes of data into ZboxFS, how often am I recommended to `finish()`? Does Zbox like mega-transactions, or are medium-sized (e.g. 64-megabyte) transactions preferred?

ZboxFS uses mega-transactions, so if you use a single `finish()` for a large file, the result will be either full success or nothing. I suggest splitting the file into smaller chunks, writing them, and calling `finish()` multiple times. How small the chunks should be depends on your situation; ZboxFS doesn't have a preferred value for that.
> > Thus, if the tx is not completed, the temporary content node is lost and the staging data cannot be accessed.
>
> Do I understand correctly that recovery of that lost content is theoretically possible (given the correct password, obviously), but is not currently implemented?

Theoretically it is possible, but it is hard to implement, and I also think it is not necessary. The application should be aware that any uncommitted data can be lost, and take proper actions to deal with this situation. ZboxFS itself only gives an atomicity promise: a transaction either succeeds or fails in full. Data recovery is the application's responsibility. You can think of ZboxFS as an RDBMS, which requires the user to re-submit the whole transaction in case of failure.
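To illustrate the RDBMS analogy, application-side recovery could look something like this sketch (untested; `store_atomically` and the retry policy are hypothetical; `File::write_once` writes a buffer and commits it in a single call, and on a genuinely full disk the retries will of course keep failing):

```rust
// Sketch of "re-submit the whole transaction" recovery, analogous to
// retrying a failed RDBMS transaction.
fn store_atomically(
    repo: &mut zbox::Repo,
    path: &str,
    data: &[u8],
    mut retries: u32,
) -> zbox::Result<()> {
    loop {
        let mut f = repo.create_file(path)?;
        // write_once() writes the whole buffer and commits in one transaction
        match f.write_once(data) {
            Ok(()) => return Ok(()),
            Err(e) if retries > 0 => {
                // the failed transaction was aborted and its data is invisible,
                // so the whole write can simply be submitted again
                eprintln!("write failed ({}), retrying", e);
                retries -= 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```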
So for this issue, the root cause is actually improper error propagation while loading a cow object. A cow object has two copies (arms); each change makes a copy of the current arm and writes to the other arm. When loading the cow object, the error should pop up only when loading both of the arms has failed. If either arm was successfully loaded, we should use that arm.
In this case, the correct result should be like this:
I've changed it in c30a303, can you test it again on your end? Thanks.
> Failed transaction needs to abort

What is a failed transaction (in terms of the public ZboxFS API)? Does any `std::io::Error` from `std::io::Write::write` mark the transaction as doomed? Can I check a `zbox::File` to see whether it is in the middle of a failed transaction?
> I've changed it in c30a303, can you test it again on your end?

Tried again with zbox commit c30a303cdf417f8b8f491a950ed8195d95f51bb5:
[2020-01-08T20:46:42Z WARN zbox::trans::txmgr] abort tx#3 failed: No space left on device (os error 28)
[2020-01-08T20:46:42Z ERROR zboxmount::zboxfuse] IO error: other os error
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /rustc/3eeb8d4f2fbae0bb1c587d00b5abeaf938da47f4/src/libcore/macros/mod.rs:15:40
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
1: backtrace::backtrace::trace_unsynchronized
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print_fmt
at src/libstd/sys_common/backtrace.rs:84
3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
at src/libstd/sys_common/backtrace.rs:61
4: core::fmt::write
at src/libcore/fmt/mod.rs:1025
5: std::io::Write::write_fmt
at src/libstd/io/mod.rs:1426
6: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:65
7: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:50
8: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:193
9: std::panicking::default_hook
at src/libstd/panicking.rs:210
10: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:471
11: rust_begin_unwind
at src/libstd/panicking.rs:375
12: core::panicking::panic_fmt
at src/libcore/panicking.rs:84
13: core::panicking::panic
at src/libcore/panicking.rs:51
14: core::option::Option<T>::unwrap
at /rustc/3eeb8d4f2fbae0bb1c587d00b5abeaf938da47f4/src/libcore/macros/mod.rs:15
15: zbox::trans::txmgr::TxMgr::abort_trans
at /mnt/src/git/zbox/src/trans/txmgr.rs:157
16: zbox::trans::txmgr::TxHandle::abort
at /mnt/src/git/zbox/src/trans/txmgr.rs:238
17: zbox::trans::txmgr::TxHandle::run
at /mnt/src/git/zbox/src/trans/txmgr.rs:208
18: <zbox::file::File as std::io::Write>::write
at /mnt/src/git/zbox/src/file.rs:586
19: <zboxmount::zboxfuse::ZboxFs as fuse_mt::types::FilesystemMT>::write
at src/zboxfuse.rs:450
20: <fuse_mt::fusemt::FuseMT<T> as fuse::Filesystem>::write::{{closure}}
at /home/vi/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse_mt-0.5.0/src/fusemt.rs:351
21: fuse_mt::fusemt::FuseMT<T>::threadpool_run
at /home/vi/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse_mt-0.5.0/src/fusemt.rs:76
22: <fuse_mt::fusemt::FuseMT<T> as fuse::Filesystem>::write
at /home/vi/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse_mt-0.5.0/src/fusemt.rs:350
23: fuse::request::Request::dispatch
at /home/vi/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse-0.3.1/src/request.rs:278
24: fuse::request::dispatch
at /home/vi/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse-0.3.1/src/request.rs:34
25: fuse::session::Session<FS>::run
at /home/vi/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse-0.3.1/src/session.rs:79
26: fuse::mount::{{closure}}
at /home/vi/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse-0.3.1/src/lib.rs:376
27: core::result::Result<T,E>::and_then
at /rustc/3eeb8d4f2fbae0bb1c587d00b5abeaf938da47f4/src/libcore/result.rs:715
28: fuse::mount
at /home/vi/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse-0.3.1/src/lib.rs:376
29: zboxmount::zboxfuse::mount
at src/zboxfuse.rs:710
30: zboxmount::main
at src/main.rs:144
31: std::rt::lang_start::{{closure}}
at /rustc/3eeb8d4f2fbae0bb1c587d00b5abeaf938da47f4/src/libstd/rt.rs:67
32: std::rt::lang_start_internal::{{closure}}
at src/libstd/rt.rs:52
33: std::panicking::try::do_call
at src/libstd/panicking.rs:292
34: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:78
35: std::panicking::try
at src/libstd/panicking.rs:270
36: std::panic::catch_unwind
at src/libstd/panic.rs:394
37: std::rt::lang_start_internal
at src/libstd/rt.rs:51
38: std::rt::lang_start
at /rustc/3eeb8d4f2fbae0bb1c587d00b5abeaf938da47f4/src/libstd/rt.rs:67
39: main
40: __libc_start_main
41: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2020-01-08T20:46:42Z WARN fuse::reply] Reply not sent for operation 26374, replying with I/O error
[2020-01-08T20:46:42Z INFO fuse::session] Unmounted /home/vi/code/zboxmount/m
[2020-01-08T20:46:42Z INFO zbox::fs::fs] repo closed
Still the same panic.
> > Failed transaction needs to abort
>
> What is a failed transaction (in terms of the public ZboxFS API)? Does any `std::io::Error` from `std::io::Write::write` mark the transaction as doomed? Can I check a `zbox::File` to see whether it is in the middle of a failed transaction?

Transactions are used internally when mutating a file or the file system; there is no API to check their status. Before writing a file, a transaction is created internally, and it keeps updating while that file is written. Any error during the writing marks that transaction as failed and to be aborted. If everything goes fine, `finish()` commits that transaction at the end and forms a permanent content version.
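So from the application side, the safe pattern is simply to stop at the first `write` error and skip `finish()`; a minimal sketch (hypothetical helper, untested):

```rust
use std::io::Write;

// Application-side rule of thumb: once any write() fails, the internal
// transaction is already aborted, so do not call finish() on that handle.
fn copy_into(f: &mut zbox::File, chunks: &[&[u8]]) -> std::io::Result<()> {
    for chunk in chunks {
        f.write_all(chunk)?; // on error: bail out, skipping finish()
    }
    // reached only if every write succeeded
    f.finish()
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e.to_string()))
}
```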
> Still the same panic.
This panic happens because you're calling `finish` after a failed `write`. A `write` failure causes the transaction to abort and be removed from the transaction manager, so `finish` should not be called after that. I've added more defensive code to deal with this case in eb8b135. Can you try it again?
After eb8b135, `finish` indeed stops causing the panic.
Similar protection, however, should be added to `zbox::File::write` and other things that face transactions.
A similar panic is still happening with modified test code:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    use std::io::Write;
    let mut r = zbox::RepoOpener::new()
        .create(true)
        .force(true)
        .open("file:///tmp/limit/q", "qwe123")?;
    let mut f = r.create_file("/test")?;
    let mut remaining_failed_writes = 10;
    loop {
        match f.write(&[77; 65536]) {
            Ok(_) => {
                eprint!(".");
            }
            Err(e) => {
                eprintln!("\nwrite error: {}", e);
                if remaining_failed_writes == 0 {
                    break;
                }
                remaining_failed_writes -= 1;
            }
        }
    }
    f.finish()?;
    Ok(())
}
Actually, I expect that a `File::write` may succeed with a smaller buffer size after another `File::write` has failed because of `ENOSPC`. Not every IO error should be unrecoverable.
What is the official way to recover from `ENOSPC` and write a smaller file that would fit in the storage?
Even after a proper (non-panic) repository close after `ENOSPC`, the repository remains cluttered with the uncleared aborted transaction, and no more transactions are accepted, not even smaller ones.
Here is source code illustrating what I mean:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    use std::io::Write;
    let mut r = zbox::RepoOpener::new()
        .create(true)
        .force(true)
        .open("file:///tmp/limit/q", "qwe123")?;
    let mut f = r.create_file("/test")?;
    let mut written_size: usize = 0;
    eprintln!("Trying to create a file as big as possible");
    loop {
        match f.write(&[77; 65536]) {
            Ok(x) => {
                eprint!(".");
                written_size += x;
            }
            Err(e) => {
                eprintln!("\nwrite error: {}", e);
                break;
            }
        }
    }
    eprintln!(
        "\nOK, we cannot write {} bytes here. Let's try storing a {}-byte file instead",
        written_size,
        written_size / 2
    );
    r.remove_file("/test")?;
    f = r.create_file("/test.smaller")?;
    let mut written_size2: usize = 0;
    loop {
        match f.write(&[77; 65536]) {
            Ok(x) => {
                eprint!(".");
                written_size2 += x;
                if written_size2 * 2 >= written_size {
                    eprintln!("\nOK, enough. Committing");
                    break;
                }
            }
            Err(e) => {
                eprintln!("\nwrite error: {}", e);
                break;
            }
        }
    }
    f.finish()?;
    Ok(())
}
Log:
[2020-01-09T08:21:12Z INFO zbox::base] ZboxFS v0.8.8 - Zero-details, privacy-focused in-app file system
[2020-01-09T08:21:12Z INFO zbox::fs::fs] create repo: file:///tmp/limit/q
[2020-01-09T08:21:12Z DEBUG zbox::volume::volume] volume initialised
[2020-01-09T08:21:12Z DEBUG zbox::trans::txmgr] begin tx#1
[2020-01-09T08:21:12Z DEBUG zbox::trans::trans] commit tx#1, cohorts: 2
[2020-01-09T08:21:12Z DEBUG zbox::trans::txmgr] tx#1 committed
[2020-01-09T08:21:12Z INFO zbox::fs::fs] repo created
[2020-01-09T08:21:12Z DEBUG zbox::trans::txmgr] begin tx#2
[2020-01-09T08:21:12Z DEBUG zbox::trans::trans] commit tx#2, cohorts: 4
[2020-01-09T08:21:12Z DEBUG zbox::trans::txmgr] tx#2 committed
Trying to create a file as big as possible
[2020-01-09T08:21:12Z DEBUG zbox::trans::txmgr] begin tx#3
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
[2020-01-09T08:21:21Z DEBUG zbox::trans::txmgr] run tx failed: Io(Custom { kind: Other, error: "other os error" })
[2020-01-09T08:21:21Z DEBUG zbox::trans::txmgr] abort tx#3
[2020-01-09T08:21:21Z WARN zbox::trans::txmgr] abort tx#3 failed: No space left on device (os error 28)
write error: other os error
OK, we cannot write 104660992 bytes here. Let's try storing a 52330496-byte file instead
[2020-01-09T08:21:21Z DEBUG zbox::trans::wal] hot redo abort tx#3
[2020-01-09T08:21:21Z DEBUG zbox::trans::wal] hot redo abort failed: Io(Os { code: 28, kind: Other, message: "No space left on device" })
[2020-01-09T08:21:21Z INFO zbox::fs::fs] repo closed
Error: Io(Os { code: 28, kind: Other, message: "No space left on device" })
Expected: after failing to write the big file, it succeeds in writing and committing the smaller file.
There is no way to recover once writing has failed; the reason is described above. The recommended way to write a big file is to use smaller batches and call `finish()` multiple times, whenever a batch is full. That way, the file data is gradually committed in multiple batches rather than in one big transaction. When the disk is full and the error pops up, your file will end up partially written.
Example code is below:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    let mut r = zbox::RepoOpener::new()
        .create(true)
        .force(true)
        .open("file:///mnt/q", "123asd")?;
    use std::io::Write;
    let src = vec![77; 65536 * 1024];
    let batch_size = 8 * 1024 * 1024; // choose a batch size that suits your needs
    let mut offset = 0;
    let mut written_in_batch = 0;
    let mut retry = 3;
    let mut f = r.create_file("/test")?;
    while offset < src.len() {
        match f.write(&src[offset..]) {
            Ok(written) => {
                eprint!(".");
                offset += written;
                written_in_batch += written;
                if written_in_batch >= batch_size {
                    f.finish()?;
                    written_in_batch = 0;
                }
            }
            Err(e) => {
                eprintln!("\nwrite error (retry #{}): {}", retry, e);
                retry -= 1;
                if retry == 0 {
                    break;
                }
            }
        }
    }
    if written_in_batch > 0 {
        f.finish()?; // commit the final, partial batch
    }
    Ok(())
}
> The recommended way to write a big file is to use smaller batches and call `finish()` multiple times, whenever a batch is full.

Can this be documented in the API?

> Does Zbox like mega-transactions, or are medium-sized (e.g. 64-megabyte) transactions preferred?

Looks like the answer is that ZboxFS does not like multi-gigabyte transactions, and keeping a reasonable upper bound on the number of unfinished bytes is recommended.
> > The recommended way to write a big file is to use smaller batches and call `finish()` multiple times, whenever a batch is full.
>
> Can this be documented in the API?

Sure, I will make it clearer in the documentation.

> > Does Zbox like mega-transactions, or are medium-sized (e.g. 64-megabyte) transactions preferred?
>
> Looks like the answer is that ZboxFS does not like multi-gigabyte transactions, and keeping a reasonable upper bound on the number of unfinished bytes is recommended.

Actually, ZboxFS doesn't have a preferred way or transaction size for writing big files; it all depends on the application's requirements.
If the app wants integrity and atomicity, use a single `finish()` and put everything in one transaction. If the app wants to keep as much as possible and doesn't want to lose data in case of failure, use multiple `finish()` calls and adjust the commit granularity to meet its loss tolerance.
As of 330ed30840e1564f08a0cfe1c67604a51844d13e, it no longer panics.
However, once it runs out of free space, I cannot even delete an existing file to free up some space. The repository effectively becomes read-only.
> As of 330ed30, it no longer panics.
> However, once it runs out of free space, I cannot even delete an existing file to free up some space. The repository effectively becomes read-only.

Yes, that's true. Each transaction needs to write a WAL file beforehand; when there is no free space left, no WAL file can be persisted, thus no transaction can be made and the repo effectively becomes read-only. There might be a better way to deal with this situation, but it needs thorough thinking and might require some trade-offs.
Is there an upper bound on the WAL file size needed to remove one ZboxFS File?
Such a file could be kept allocated in advance for such cases.
Also, why doesn't an aborted transaction clean up its unused files in `file://` storage?
> Is there an upper bound on the WAL file size needed to remove one ZboxFS File?
> Such a file could be kept allocated in advance for such cases.

The difficulty is that the WAL file size is dynamic and unknown in advance. Pre-allocation might work in most cases, but it cannot be guaranteed.
> Also, why doesn't an aborted transaction clean up its unused files in `file://` storage?

The same reason. A sector file needs to write a copy first when changed; that's how COW works. When no space is left, the aborted tx cannot even reclaim used space. That's why you see the tx abort failing.
I just checked with SQLite: it also cannot complete any transaction when the disk is full. So this behaviour might be valid for ZboxFS as well, but I will keep trying to mitigate it as much as possible. My thinking is that, when the disk is full, at least the transaction for file deletion should still be able to execute, otherwise the app will get stuck in this limbo. The solution might be pre-allocating some fixed-size space for the WAL and rotating it as transactions flow in.
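Purely as an illustration of that last idea (none of this is zbox code; the names and sizes are invented), a fixed reservation used as a ring might look like:

```rust
// Illustrative sketch of a fixed-size, pre-allocated WAL region used as a
// ring: the space is reserved up front, so writing a small WAL record (e.g.
// for a file deletion) never needs a fresh allocation from the full disk.
struct ReservedWal {
    buf: Vec<u8>, // pre-allocated at repo creation, never grows
    head: usize,  // next write position, wraps around
}

impl ReservedWal {
    fn new(capacity: usize) -> Self {
        ReservedWal { buf: vec![0; capacity], head: 0 }
    }

    /// Append a WAL record, rotating within the reserved space.
    /// Fails only if a single record exceeds the whole reservation.
    fn append(&mut self, record: &[u8]) -> Result<(), ()> {
        if record.len() > self.buf.len() {
            return Err(());
        }
        if self.head + record.len() > self.buf.len() {
            self.head = 0; // rotate; older, already-applied records are overwritten
        }
        self.buf[self.head..self.head + record.len()].copy_from_slice(record);
        self.head += record.len();
        Ok(())
    }
}
```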
After carefully checking different possible solutions, I think repo mutability when the disk is full is hard to implement, so I will have to leave it unimplemented. The expected result will be the same as SQLite: when the disk is full, the repo becomes read-only and no more transactions can be made, not even deletions.
The major difficulties I found are around reliably capturing the disk-full error: in Rust it surfaces only as the overly general `Io::Error::Other`.

What does a WAL file consist of? What affects its file size?
Maybe the pre-allocated WAL size should just be a user-configurable parameter? That file would normally be unused except when:

* the transaction in question is just deleting one file;
* an attempt to do that transaction normally has failed with `Io::Error::Other`.
The WAL file contains all objects affected in the transaction, including creations, updates and deletions. When writing data to a file, the data is split into chunks and the chunks are grouped into limited-size segments; each segment is a new object to be put in the transaction. So as data keeps being written, more and more new segments are created and added to the transaction, and the WAL file size keeps increasing.
It's hard to predict the WAL size, because we don't know how many bytes users will write to a file. Even if we knew that, the number of chunks and segments would still be hard to know, because they depend on the Rabin rolling hash algorithm, which produces variable-length chunks based on the data content.
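To see why the chunk count depends on the data itself, here is a toy content-defined chunker (a crude stand-in, not ZboxFS's actual Rabin implementation or parameters): a boundary is declared wherever a hash of the recent bytes matches a mask, so the number and sizes of chunks vary with the content.

```rust
// Toy content-defined chunking: boundaries depend on the bytes themselves,
// so the number of chunks (and hence WAL entries) cannot be known before the
// data is seen. The hash here is a crude prefix hash reset at each boundary,
// not a true sliding-window Rabin fingerprint; window, mask and multiplier
// are arbitrary illustrative values.
fn chunk_sizes(data: &[u8]) -> Vec<usize> {
    const MIN_LEN: usize = 16;
    const MASK: u64 = (1 << 13) - 1; // ~8 KiB average chunk size
    let mut sizes = Vec::new();
    let mut hash: u64 = 0;
    let mut start = 0;
    for (i, &b) in data.iter().enumerate() {
        hash = hash.wrapping_mul(31).wrapping_add(b as u64);
        let len = i + 1 - start;
        if len >= MIN_LEN && (hash & MASK) == 0 {
            sizes.push(len); // boundary found: emit a chunk
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        sizes.push(data.len() - start); // trailing partial chunk
    }
    sizes
}
```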
How many chunks/segments are required to delete one file?
Maybe in the "only delete one file" special case the WAL size is more predictable than in the general case?
To delete one file, one transaction is needed, which consists of 2 WAL files (one for the tx itself, one for the WAL queue). For deletion, if it is not an extreme case, this is normally no more than several kilobytes. But the difficulty is not this; it is how to reliably capture the disk-full error without sacrificing efficiency.
Just capture it unreliably.
If it is not a disk-full error, there is probably no harm in handling it as if it were one (i.e. trying to use the reserved files). It would just fail again.
That would be overkill; that error in Rust is too general, and you would end up doing unnecessary error handling for all other errors.
Steps to reproduce:

1. Create a tmpfs with a size limit.
2. Open a `file://` storage on that tmpfs.
3. Create a `zbox::File` there and start filling it with data (without `finish`ing) ad infinitum.

Expected: eventually `std::io::Write::write` returns an IO error suggesting that the allowed storage space is exhausted. Ideally the file can be `finish()`ed at that point, with the data previously accepted by `std::io::Write::write` without error being guaranteed to fit in the remaining space.

Actual:

Additionally, the storage cannot be opened again after that: