skerkour / kerkour.com

(Ab)using technology for fun & profit. Programming, Hacking & Entrepreneurship @ https://kerkour.com
https://kerkour.com
Apache License 2.0
463 stars 63 forks source link

Bad assumption on read() results #8

Open ayende opened 2 years ago

ayende commented 2 years ago

In https://github.com/skerkour/kerkour.com/blob/main/2022/rust_file_encryption_with_password/src/main.rs#L75 (and in many other places), you are assumption that the only scenario where you'll get less than the number of requested bytes is when you are at the end of the file. This is wrong.

See the docs here:

https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read

In particular:

It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet.

There are many reasons why this may be the case.

A great example is if the read you are doing happened to start on a region that is already in memory, but continues to a page that is not there yet. The file system will return what it has right now. That can lead to really hard to figure out bugs.

sylvain101010 commented 2 years ago

Hey @ayende,

It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet.

Indeed i'm aware of this warning but in my experience when reading a local file, read never returned less than BUFF_LEN,

Because the docs also say:

While for File, it is possible to reach the end of file and get zero as result

This function does not provide any guarantees about whether it blocks waiting for data, but if an object needs to block for a read and cannot, it will typically signal this via an Err return value.

So I believe it should be safe to assume that when reading a local file read will prefer to block rather than return a misleading result.

But, you also state:

A great example is if the read you are doing happened to start on a region that is already in memory, but continues to a page that is not there yet.

I'm learning this today, Thank you very much. But I also find that very surprising!

Do you have a recommendation to fix this potential issue? Use a BufReader?

ayende commented 2 years ago

You need to keep reading until read() returns a zero, that is the only scenario where you can be sure that the file is done.

Something like this:

 loop {
        let read_count = source_file.read(&mut buffer)?;

        if read_count == 0{
               break;
        } else {
            let ciphertext = stream_encryptor
                .encrypt_last(&buffer[..read_count])
                .map_err(|err| anyhow!("Encrypting large file: {}", err))?;
            dist_file.write(&ciphertext)?;
        }
    }

That work for all scenarios.

ayende commented 2 years ago

I noticed that you have encrypt_next vs. encrypt_last, you should probably just call encrypt_next always, and pass encrypt_last with empty buffer.

sylvain101010 commented 2 years ago

Thnaks! The problem is that it would make the encryption/decryption way more complex: we need to operate on the same chunks when decrypting than when encrypting.

Rust's std File.read uses directly the read of the OS's libc (https://doc.rust-lang.org/src/std/fs.rs.html#618, https://github.com/rust-lang/rust/blob/master/library/std/src/sys/unix/fs.rs#L852).

Where can I find information about the behavior you have described above (that File.read can return less than the requested number of bytes for a local file, so far I can only find information stating that read can return less than the requested number of bytes when reaching end of file)?

ayende commented 2 years ago

If you need to make the distinction between next & last, read to the buffer until it is full, even if you get partial read. And only stop if you get 0 back.

For reference, see: It is not an error if this number is smaller than the number of bytes requested

https://man7.org/linux/man-pages/man2/read.2.html