zboxfs / zbox

Zero-details, privacy-focused in-app file system.
https://zbox.io/fs/
Apache License 2.0
1.53k stars 74 forks source link

New versions are merging old and new file contents #40

Closed gameldar closed 5 years ago

gameldar commented 5 years ago

I've tested this with 0.7.1, 0.8.1 and current master and it is happening in all three cases.

The problem is when you write new versions of the file that are shorter than the previous versions you end up with merged content where the beginning of the file content is overwritten but the old remains.

The following is a simple replication case:

use std::io::{Read, Write};
use zbox::{init_env, Error, OpenOptions, RepoOpener};

fn main() -> Result<(), Error> {
    init_env();
    let mut repo = RepoOpener::new()
        .create(true)
        .open("file://./my_repo", "your password")?;

    let mut f = OpenOptions::new()
        .create(true)
        .open(&mut repo, "/myfile.txt")?;
    let mut pdf = std::fs::File::open("mylargefile.txt").unwrap();
    let mut buffer = Vec::new();
    pdf.read_to_end(&mut buffer)?;
    f.write_all(&buffer)?;
    f.finish()?;
    println!(
        "Long: version={} - size={}",
        f.curr_version()?,
        f.metadata()?.content_len()
    );

    let mut f = OpenOptions::new()
        .create(true)
        .open(&mut repo, "/myfile.txt")?;
    let mut pdf = std::fs::File::open("myfile.txt").unwrap();
    let mut buffer = Vec::new();
    pdf.read_to_end(&mut buffer)?;
    f.write_all(&buffer)?;
    f.finish()?;

    println!(
        "Short: version={} - size={}",
        f.curr_version()?,
        f.metadata()?.content_len()
    );

    let mut f = OpenOptions::new()
        .create(false)
        .open(&mut repo, "/myfile.txt")?;
    let mut out = std::fs::File::create("myfile.out").unwrap();
    let mut out_buf = Vec::new();
    f.read_to_end(&mut out_buf)?;
    out.write_all(&out_buf).unwrap();
    println!(
        "Reading: version={} - size={}",
        f.curr_version()?,
        f.metadata()?.content_len()
    );
    Ok(())
}

If you create two files: myfile.txt and mylargefile.txt where mylargefile.txt has more lines than myfile you'll see the resulting merged output in myfile.out. (Note I am using text files in this example but I originally saw this when I was trying to overwrite a pdf file with a text file)

The output also shows that the size between versions isn't changing:

Long: version=2 - size=37
Short: version=3 - size=37
Reading: version=3 - size=37

Looking through the source I couldn't see the exact cause - but I suspect the issue is around the merging in the Writer finish implementation.

burmecia commented 5 years ago

@gameldar thank you very much for this issue.

This is an expected result. What happened is that the 2nd short file write was overwriting the file on top of previous version.

For example, suppose we have a long buffer [1, 2, 3, 4, 5] and a short buffer [6, 7, 8]. If we write the long buffer we got the 1st version of content:

version 1: [1, 2, 3, 4, 5]

Then we write the short buffer, we are actually writing on top of the version 1 file from the beginning, not the original empty file, so we got:

version 2: [6, 7, 8, 4, 5]

That's what we're expecting. It is exactly the same behavior if you use std::fs to write a file.

fn main() {
    let buf = [1, 2, 3, 4, 5];
    let buf2 = [6, 7, 8];

    let mut f = std::fs::OpenOptions::new()
        .write(true)
        .create(true)
        .open("tt.bin")
        .unwrap();
    f.write_all(&buf).unwrap();

    let mut f = std::fs::OpenOptions::new()
        .write(true)
        .create(true)
        .open("tt.bin")
        .unwrap();
    f.write_all(&buf2).unwrap();
 }

The above code will produce the file:

$ hexdump ./tt.bin
0000000 06 07 08 04 05

If you want to write a fresh content to the file, you can consider add truncate option, which will clear the content before writing.

// at the 2nd short file write
let mut f = OpenOptions::new()
        .create(true)
        .truncate(true)
        .open(&mut repo, "/myfile.txt")?;
gameldar commented 5 years ago

Ahh yes I see now. Most of the time in the past (in other languages) I've inherently passed through the truncate flag without realising it (e.g. it is the default behaviour).

But the implementation here is exactly what I want to use in another use case as well and using the truncate solves my immediate issue.

Thanks for the quick reply!