tokio-rs / tokio


fs: Data loss while retrying `File::flush` when disk is full #6325

Open Xuanwo opened 10 months ago

Xuanwo commented 10 months ago

Version

:) cargo tree | grep tokio
│   │   └── tokio v1.36.0
│   │       └── tokio-macros v2.2.0 (proc-macro)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio-util v0.7.10
│   │   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   └── tokio-rustls v0.24.1
│   │   │   │       └── tokio v1.36.0 (*)
│   │   │   ├── tokio v1.36.0 (*)
│   │   │   ├── tokio-rustls v0.24.1 (*)
│   │   │   ├── tokio-util v0.7.10 (*)
│   ├── tokio v1.36.0 (*)
├── tokio v1.36.0 (*)

Platform

Linux xuanwo-work 6.7.3-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 01 Feb 2024 10:30:25 +0000 x86_64 GNU/Linux

Description

While addressing https://github.com/apache/opendal/issues/4058, we discovered that retrying File::flush while disk is full could result in data loss.

To reproduce:

  1. Set up a small filesystem:
fallocate -l 512K disk.img
mkfs disk.img

mkdir /tmp/test_dir
sudo mount -o loop disk.img /tmp/test_dir

sudo chmod a+wr /tmp/test_dir

Now we have a filesystem that only has 512K.

  2. Run the following code:
use std::env;
use std::io::Result;

use rand::prelude::*;
use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() -> Result<()> {
    tracing_subscriber::fmt::init();

    let path = &env::var("OPENDAL_FS_ROOT").expect("root must be set for this test");

    let mut f = tokio::fs::OpenOptions::new()
        .create(true)
        .write(true)
        .open(format!("{path}/test"))
        .await
        .unwrap();

    let size = thread_rng().gen_range(512 * 1024 + 1..4 * 1024 * 1024);
    let mut bs = vec![0; size];
    thread_rng().fill_bytes(&mut bs);

    f.write(&bs).await.unwrap();

    let res = f.flush().await;
    dbg!(&res);

    // After some operations, we retry the file flush.
    let res = f.flush().await;
    dbg!(&res);

    Ok(())
}

The full example code can be found at https://github.com/apache/opendal/pull/4141. I removed the opendal-related code to make this example more readable.

The output is:

    Finished dev [unoptimized + debuginfo] target(s) in 0.79s
     Running `/home/xuanwo/Code/apache/opendal/core/target/debug/edge_file_close_with_retry_on_full_disk`
[edge/file_close_with_retry_on_full_disk/src/main.rs:48] &res = Err(
    Os {
        code: 28,
        kind: StorageFull,
        message: "No space left on device",
    },
)
[edge/file_close_with_retry_on_full_disk/src/main.rs:52] &res = Ok(
    (),
)

The first time, flush returns StorageFull, which is expected. But the second time, the same flush call returns Ok.

I expected to see a StorageFull error instead.

The key problem here is:

  • Users can't recover from this error, even if they try removing other files. The previous write operation returned Ok.
  • Retrying the flush operation is permitted but risky. Data written may be lost forever once flush returns Ok.

Based on the code here:

https://github.com/tokio-rs/tokio/blob/63caced26f07240fa2751cefccee86cc342d3581/tokio/src/fs/file.rs#L887-L906

Maybe we should:

I'm willing to give it a fix.

carllerche commented 9 months ago

I'm a bit confused. I'm less familiar w/ the details of FS ops.

The key problem here is:

  • Users can't recover from this error, even if they try removing other files. The previous write operation returned Ok.
  • Retrying the flush operation is permitted but risky. Data written may be lost forever once flush returns Ok.

The key problem w/ Tokio's impl or getting an err when calling flush in general?

Maybe you could explain how to handle StorageFull and flush w/ blocking std calls and where converting that blocking code to Tokio's fs api fails.

Xuanwo commented 9 months ago

I believe it's more like an issue of Tokio's implementation, which doesn't pass the write error on to the user.

Maybe you could explain how to handle StorageFull and flush w/ blocking std calls and where converting that blocking code to Tokio's fs api fails.

There is no direct mapping from std::fs::File::flush to tokio::fs::File::flush. std::fs::File::flush on Linux is a no-op, while tokio::fs::File::flush involves its internal buffering logic.


I prepared a full repro here: https://github.com/Xuanwo/tokio-issue-6325-storage-full

let n = f.write(&bs).await?;
dbg!(&n);
assert_eq!(n, size, "tokio file always write data into buffer first");

When we call write on a file, tokio stores the data directly in its internal buffer. After flush returns the write error, the buffer is cleared, and the data is never written again.

The same repro doesn't happen with std::fs, since std::fs returns the correct write size from f.write(): the user gets the error while trying to write more data.
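The failure mode described above can be modeled with a small std-only sketch (the `FullDisk` and `LossyBufWriter` types here are hypothetical stand-ins, not tokio's actual code): a writer that buffers in memory and clears its buffer on a failed flush will report Ok on a retried flush, even though the data never reached the disk.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for a full disk: every write fails with ENOSPC.
struct FullDisk;

impl Write for FullDisk {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::from_raw_os_error(28)) // 28 = ENOSPC on Linux
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// Minimal model of the buffering behavior described above: writes land in an
// in-memory buffer, and a failed flush clears the buffer instead of keeping it.
struct LossyBufWriter<W: Write> {
    inner: W,
    buf: Vec<u8>,
}

impl<W: Write> Write for LossyBufWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(buf);
        Ok(buf.len()) // always "succeeds": the bytes only went into the buffer
    }
    fn flush(&mut self) -> io::Result<()> {
        let res = self.inner.write_all(&self.buf);
        self.buf.clear(); // buffer is dropped even on error -- the data is gone
        res
    }
}

fn main() {
    // Writing straight to the "disk" surfaces the error immediately.
    assert!(FullDisk.write(b"data").is_err());

    // Writing through the lossy buffer reports success...
    let mut w = LossyBufWriter { inner: FullDisk, buf: Vec::new() };
    assert_eq!(w.write(b"data").unwrap(), 4);
    // ...the first flush fails as expected...
    assert!(w.flush().is_err());
    // ...but a retried flush returns Ok: the buffer is empty, so write_all
    // has nothing to send, and the caller never learns the data was lost.
    assert!(w.flush().is_ok());
    println!("retry succeeded; data silently lost");
}
```

This matches the observed output: the first flush yields `StorageFull`, the retried flush yields `Ok(())`.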

Xuanwo commented 9 months ago

While implementing https://github.com/tokio-rs/tokio/pull/6330, I found that tokio clears the buffer when an error happens during write.

https://github.com/tokio-rs/tokio/blob/28d88cc35902a0489e9b0781a47f9f5713a01673/tokio/src/io/blocking.rs#L258-L265

I'm guessing we need to maintain the internal state here instead of dropping all the data?
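A minimal sketch of that alternative (again with hypothetical types, not tokio's actual implementation): keep the buffered bytes when the inner write fails, and only discard them once they actually reach the underlying writer. A retried flush then re-attempts the same write and can succeed once the user frees space.

```rust
use std::io::{self, Write};

// Sketch of the suggested behavior: keep buffered bytes on a failed flush so
// a retry re-attempts the same write instead of silently succeeding.
struct RetryBufWriter<W: Write> {
    inner: W,
    buf: Vec<u8>,
}

impl<W: Write> RetryBufWriter<W> {
    fn flush_buf(&mut self) -> io::Result<()> {
        self.inner.write_all(&self.buf)?; // on error, buf is left intact
        self.buf.clear(); // only discard once the bytes reached `inner`
        Ok(())
    }
}

// A hypothetical "disk" that fails with ENOSPC until space is freed.
struct Disk {
    full: bool,
    data: Vec<u8>,
}

impl Write for Disk {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.full {
            return Err(io::Error::from_raw_os_error(28)); // ENOSPC on Linux
        }
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut w = RetryBufWriter {
        inner: Disk { full: true, data: Vec::new() },
        buf: b"hello".to_vec(),
    };
    assert!(w.flush_buf().is_err()); // disk full: error reported, buffer kept
    w.inner.full = false; // the user removes other files, freeing space
    assert!(w.flush_buf().is_ok()); // retry actually writes the data
    assert_eq!(w.inner.data, b"hello");
}
```

With this shape, retrying flush after freeing space recovers the data instead of losing it, which is the recovery path users would expect.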