Open Xuanwo opened 10 months ago
I'm a bit confused. I'm less familiar w/ the details of FS ops.
The key problem here is:
- Users can't recover from this error, even if they try removing other files. The previous write operation returned Ok.
- Retrying the flush operation is permitted but risky. Data written may be lost forever once flush returns Ok.
The key problem w/ Tokio's impl or getting an err when calling flush in general?
Maybe you could explain how to handle StorageFull and flush w/ blocking std calls and where converting that blocking code to Tokio's fs api fails.
I believe it's more like an issue of Tokio's implementation, which doesn't pass the write error on to the user.
Maybe you could explain how to handle StorageFull and flush w/ blocking std calls and where converting that blocking code to Tokio's fs api fails.
There is no direct mapping from std::File::flush to tokio::File::flush. std::File::flush on Linux is a no-op, while tokio::File::flush is tied to tokio's internal buffer logic.
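To make the blocking side concrete, here is a minimal sketch of writing with std only. The helper name `write_blocking` is made up for illustration. The point it tries to show is that every write call reports its own result, so a full disk would surface at write time rather than at flush.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` to `path` with blocking std calls and
/// returns the number of bytes the `write` calls reported as accepted.
fn write_blocking(path: &Path, data: &[u8]) -> io::Result<usize> {
    let mut f = File::create(path)?;
    let mut written = 0;
    while written < data.len() {
        // Each call goes straight to the OS, so an out-of-space condition
        // shows up here as an error or a short write, not later in `flush`.
        let n = f.write(&data[written..])?;
        if n == 0 {
            return Err(io::Error::new(io::ErrorKind::WriteZero, "no bytes written"));
        }
        written += n;
    }
    // `File` keeps no userspace buffer, so `flush` has nothing left to report.
    f.flush()?;
    // `sync_all` asks the OS to persist data and metadata to the device.
    f.sync_all()?;
    Ok(written)
}
```

With this shape the caller always knows how many bytes were accepted before any error, which is the information the buffered path does not give back.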
I prepared a full repro here: https://github.com/Xuanwo/tokio-issue-6325-storage-full
let n = f.write(&bs).await?;
dbg!(&n);
assert_eq!(n, size, "tokio file always write data into buffer first");
When we call write on a file, tokio stores the data in its internal buf directly. After flush returns the write error, the buffer is cleaned up and the data is never written again.
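The behaviour described above can be modelled with a toy writer. This is not tokio's code: `TinyDisk` and `LossyBufferedFile` are hypothetical types that only mimic "buffer first, drop the buffer when the write fails", and the error uses `ErrorKind::Other` as a stand-in for StorageFull.

```rust
use std::io;

/// Hypothetical stand-in for a disk with limited free space.
struct TinyDisk {
    free: usize,
    stored: Vec<u8>,
}

impl TinyDisk {
    /// All-or-nothing write: fails if the data does not fit.
    fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        if data.len() > self.free {
            return Err(io::Error::new(io::ErrorKind::Other, "no space left on device"));
        }
        self.free -= data.len();
        self.stored.extend_from_slice(data);
        Ok(())
    }
}

/// Toy model of a writer that buffers first and drops its buffer on failure.
struct LossyBufferedFile {
    disk: TinyDisk,
    buf: Vec<u8>,
}

impl LossyBufferedFile {
    /// Always succeeds: the data only goes into the in-memory buffer.
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    /// Takes the buffer out before writing, so it is gone even on error.
    fn flush(&mut self) -> io::Result<()> {
        let pending = std::mem::take(&mut self.buf);
        self.disk.write_all(&pending)
    }
}

/// Returns (first flush failed, second flush succeeded, bytes on disk).
fn lossy_demo() -> (bool, bool, usize) {
    let mut f = LossyBufferedFile {
        disk: TinyDisk { free: 512, stored: Vec::new() },
        buf: Vec::new(),
    };
    let n = f.write(&[0u8; 1024]).unwrap();
    assert_eq!(n, 1024); // the write reports full success
    let first_failed = f.flush().is_err();
    let second_ok = f.flush().is_ok(); // nothing left to write, so Ok
    (first_failed, second_ok, f.disk.stored.len())
}
```

In this model `lossy_demo()` yields a failed first flush, a successful second flush, and zero bytes stored, which is the data-loss pattern under discussion.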
The same repro doesn't work on std::fs, since std::fs returns the correct write size from f.write(). The user will get the error when trying to write more data.
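As a rough illustration of that difference, the sketch below uses a hypothetical `LimitedSink` that implements `std::io::Write` with a fixed capacity. It stands in for a nearly full disk; how a real kernel splits a write at the out-of-space boundary may differ.

```rust
use std::io::{self, Write};

/// Hypothetical sink with a fixed capacity, standing in for a nearly full disk.
struct LimitedSink {
    data: Vec<u8>,
    capacity: usize,
}

impl Write for LimitedSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let room = self.capacity - self.data.len();
        if room == 0 && !buf.is_empty() {
            // No space at all: the caller sees the error on this very call.
            return Err(io::Error::new(io::ErrorKind::Other, "no space left on device"));
        }
        // Accept only what fits and report the real count (a short write).
        let n = room.min(buf.len());
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        // Nothing is buffered, so there is nothing to flush.
        Ok(())
    }
}

/// Returns (bytes accepted by the first write, whether the next write failed).
fn direct_write_demo() -> (usize, bool) {
    let mut sink = LimitedSink { data: Vec::new(), capacity: 512 };
    let payload = [1u8; 1024];
    let n = sink.write(&payload).unwrap();
    let next_failed = sink.write(&payload[n..]).is_err();
    (n, next_failed)
}
```

Here the first write reports 512 accepted bytes and the follow-up write fails, so the caller can tell exactly which bytes still need to be written.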
While implementing https://github.com/tokio-rs/tokio/pull/6330, I found that tokio clears the buffer when an error happens during write.
I'm guessing we need to maintain the internal state here instead of dropping all data?
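One possible shape of "keep the state instead of dropping it" is sketched below. It is only a toy model under the same assumptions as before (hypothetical `TinyDisk` and `RetainingBufferedFile`, all-or-nothing disk writes), not a proposal for tokio's actual internals. A real fix would also have to track partially written data.

```rust
use std::io;

/// Hypothetical stand-in for a disk with limited free space.
struct TinyDisk {
    free: usize,
    stored: Vec<u8>,
}

impl TinyDisk {
    /// All-or-nothing write: fails if the data does not fit.
    fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        if data.len() > self.free {
            return Err(io::Error::new(io::ErrorKind::Other, "no space left on device"));
        }
        self.free -= data.len();
        self.stored.extend_from_slice(data);
        Ok(())
    }
}

/// Toy writer that keeps its buffer until the data has reached the disk.
struct RetainingBufferedFile {
    disk: TinyDisk,
    buf: Vec<u8>,
}

impl RetainingBufferedFile {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // On error we return early and `buf` is left untouched for a retry.
        self.disk.write_all(&self.buf)?;
        self.buf.clear();
        Ok(())
    }
}

/// Returns (first flush failed, retry failed, flush after freeing space
/// succeeded, bytes on disk).
fn retaining_demo() -> (bool, bool, bool, usize) {
    let mut f = RetainingBufferedFile {
        disk: TinyDisk { free: 512, stored: Vec::new() },
        buf: Vec::new(),
    };
    f.write(&[0u8; 1024]).unwrap();
    let first_failed = f.flush().is_err();
    let retry_failed = f.flush().is_err(); // still full, still an error
    f.disk.free = 2048; // simulate the user removing other files
    let final_ok = f.flush().is_ok();
    (first_failed, retry_failed, final_ok, f.disk.stored.len())
}
```

In this model a retried flush keeps failing while the disk is full and succeeds with all 1024 bytes once space is freed, so an Ok result again means the data was written.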
Version
Platform
Description
While addressing https://github.com/apache/opendal/issues/4058, we discovered that retrying File::flush while the disk is full could result in data loss.
To reproduce:
Now we have a filesystem that only has 512K of space.
The full code example can be found at https://github.com/apache/opendal/pull/4141. I removed the opendal-related code to make this example more readable.
The output is:
The first time, flush generates StorageFull, which is expected. But the second time, the same flush call returns Ok.
I expected to see a StorageFull error instead.
The key problem here is:
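- Users can't recover from this error, even if they try removing other files. The previous write operation returned Ok.
- Retrying the flush operation is permitted but risky. Data written may be lost forever once flush returns Ok.
Based on the code here: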
https://github.com/tokio-rs/tokio/blob/63caced26f07240fa2751cefccee86cc342d3581/tokio/src/fs/file.rs#L887-L906
Maybe we should:
- ... buf?
- ... last_write_err if the write operation failed?
I'm willing to give it a fix.