rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.96k stars 12.53k forks source link

BufReader is broken #81139

Closed Yuri6037 closed 9 months ago

Yuri6037 commented 3 years ago

I have tried to read multiple 20 bytes blocks using a BufReader connected to a File::open. For some reasons BufReader will finish by reading fewer bytes than asked. Removing the BufReader and issuing direct calls to read from the file fixes the problem. Is this expected behavior?

EDIT: After some more testing it seem to happen only on files exceeding 1Kb

Aaron1011 commented 3 years ago

@Yuri6037 Can you post the code you're using?

Yuri6037 commented 3 years ago

The code is pretty big but I'll try to point you to the part that is problematic:

https://gitlab.com/BlockProject3D/eitools/-/blob/master/BPMToOBJ/src/bpm.rs

This link points you to the file where I'm reading data. You can enable the buffered reader on line 176. You will observe that when loading a file with the buffered reader the data is corrupted, however without the data is fine. If you need I can upload a small sphere bpm.

SkiFire13 commented 3 years ago

At line 58 you're using Read::read to read from the reader, however this isn't guaranteed to fill the buffer and will instead return how many bytes it actually read. Use Read::read_exact if you want to completly fill the buffer.

Yuri6037 commented 3 years ago

What is the use case for read then ? If it's not guaranteed to fill the buffer like the system read then why does it even exist?

EDIT: I just read doc for read_exact: I do not want an error on EOF I just want the behavior of the standard system read which is: when EOF return 0 or the number of bytes that could be read. When not EOF return full buffer. So in fact what I need is the exact same behavior as the current read in File but for a buffer.

EDIT2: I think I will just write my own BufferedReader which will fix that defect.

sfackler commented 3 years ago

I just want the behavior of the standard system read which is

That is not the behavior of the system for many things, including TCP sockets.

SkiFire13 commented 3 years ago

Instead of writing your own BufferedReader why not propose to add a new method to the Read trait for your usecase? It doesn't feel that niche to me. For example what about:

trait Read {
    /* Other methods */

    // TODO: Find a better name?
    fn read_fill(&mut self, mut buf: &mut [u8]) -> Result<usize> {
        let mut written = 0;
        loop {
            let len = match self.read(buf) {
                Ok(0) => return Ok(written),
                Ok(len) => len,
                Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };
            written += len;
            buf = &mut buf[len..];
        }
    }
}

This is pretty much equivalent to std::io::copy(&mut reader, &mut buf), except it avoids an useless copy. Maybe we could also specialize it to avoid using an additional buffer when W is &'_ mut [u8]/Cursor<&'_ mut [u8]>/Cursor<Box<[u8]>> and stuff like that?

Yuri6037 commented 3 years ago

I don't know for now I will just write my own BufferedReader so that I don't have to update the read in all of my projects. Ideally I'd like some way to activate a flag in the existing BufReader so that the read behaves like the underlying File.read. However I don't think this kind of change will be accepted...

EDIT: Maybe I can propose a new BufReader with a read function that behaves like the underlying File.read and upload this to crates.io. This could be a nice move. I'll see depending on time and complexity of uploading to crates.io. I'm not even sure if I can do that as I have no money...

EDIT2: I just found https://doc.rust-lang.org/cargo/reference/publishing.html. I may consider publishing my custom buffered reader to crates.io. As you said maybe some more people would be interested in that.

EDIT3: It's uploaded it worked much much faster than any other package manager before! Even NPM is much more painful than Cargo on this one. Here is a link to the official crates.io page: https://crates.io/crates/bufferedreader

Yuri6037 commented 3 years ago

That is not the behavior of the system for many things, including TCP sockets.

I know that. I was speaking about the behavior of read/ReadFile on files not network streams. I expect that the buffer should not alter the behavior of the underlying sys-calls. In the case of a File it clearly has a different behavior as explained in this post.

Anyway, the issue is now completely resolved thanks to a custom implementation of a BufferedReader. I even have published it as a crate so anyone can use it if they fall under the same issue as I did. For information I have tested this new buffered reader on a file with over 20000 vertices. It took quite a long time but absolutely no corruptions, I can finally decompile my own format back into an OBJ file to use in Blender minus a coordinate system issue not that big of a deal for now.

SkiFire13 commented 3 years ago

I expect that the buffer should not alter the behavior of the underlying sys-calls. In the case of a File it clearly has a different behavior as explained in this post.

The buffer altered the observable behaviour, however the behaviour you're speaking of is implementation details. You shouldn't expect them to be the same when changing the implementation...

Yuri6037 commented 3 years ago

I expect that the buffer should not alter the behavior of the underlying sys-calls. In the case of a File it clearly has a different behavior as explained in this post.

The buffer altered the observable behaviour, however the behaviour you're speaking of is implementation details. You shouldn't expect them to be the same when changing the implementation...

I know it is implementation defined but the matter is this behavior is standard on Windows Mac Linux, Free BSD and even Android. And even if this wouldn't be defined on all platforms I still expect a file buffered reader to read more when not enough are available until no more data can be read... When it comes to networking I usually prefer to control buffering on my own especially with UDP.

SkiFire13 commented 3 years ago

I still expect a file buffered reader

But BufReader is not file specific, it's generic. Does it have to preserve only File's implementation details? Or any reader's implementation details?

Yuri6037 commented 3 years ago

I still expect a file buffered reader

But BufReader is not file specific, it's generic. Does it have to preserve only File's implementation details? Or any reader's implementation details?

In general only for files. But after all the behavior that I expect from my buffer is just to retry loading more data if the buffer is not full. If it can't read more cause the underlying syscall said no in that case only should it stop trying.

Yuri6037 commented 2 years ago

I'm posting here again just to add the exact function I need but this time generalized to any io::Read. The basic idea is in 100% of my IO cases I need to read as many bytes as possible. I don't need to skip certain errors at the countrary, I want to keep all possible types of errors (especially with network streams) but I also want to fill as much as possible the buffer.

So here the function does:

use std::io::Result;

trait ReadFill
{
    fn read_fill(&mut self, buf: &mut [u8]) -> Result<usize>;
}

impl<T: std::io::Read> ReadFill for T
{
    fn read_fill(&mut self, buf: &mut [u8]) -> Result<usize>
    {
        let mut bytes = 0;
        let mut len = self.read(buf)?;
        bytes += len;
        while len > 0 && buf.len() - len > 0 {
            len = self.read(&mut buf[len..])?;
            bytes += len;
        }
        return Ok(bytes);
    }
}
Enselic commented 9 months ago

Anyway, the issue is now completely resolved thanks to a custom implementation of a BufferedReader.

Triage: Thank you for the update! Let's close.