rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.66k stars 12.63k forks source link

File::open() on directories does not return Err(), leads to breakage with BufReader #64144

Open kentfredric opened 5 years ago

kentfredric commented 5 years ago

I tripped onto some odd behaviour.

I was under the assumption calling File::open() on a directory would Err(), and that I could use that to handle a user specifying a path that was not a file, ( instead of falling prey to race conditions by stat-ing first and then opening second ).

However, ... bad things happened instead.

use std::path::PathBuf;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;

fn main() {
    let file = File::open(PathBuf::from("/tmp")).unwrap();
    let buf  = BufReader::with_capacity(100, file);
    let _lines: Vec<std::io::Result<String>> = buf.lines().collect();
}

The last of these lines will run forever, with strace reporting:

read(3, 0x564803253b80, 100)            = -1 EISDIR (Is a directory)

Over and over again ad-infinitum with no end in sight.

Somehow I had a variation of this go crazy and eat 200% of my ram, but I'm having a hard time reproducing that exact case (Though it may have been related to the target-directory in question also being massive in my case).

Digging shows related bug #43504

The profound question that remains unanswered is: "Why is calling File::open on a directory fine?"

And the residual question is "How does one invoke File::open in such a way that it refuses to work on directories".

And importantly, how do I achieve that portably?

And if none of these concerns can be mitigated, the very least that could be done is have this giant foot-gun documented somewhere in File::open.

( As far as I can divine, there's no useful behaviour to be found by allowing File::open to work on directories, as standard read() semantics simply don't work on directory filehandles, you have to use readdir() )

sfackler commented 5 years ago

File::open works on directories because the underlying system call open(2) works on directories. You can use O_DIRECTORY to trigger an error if the path is not a directory, but I don't know if there's a way of doing the opposite.

Your program is going crazy because you're collecting a vector of io::Errors.

kentfredric commented 5 years ago

Perhaps, BufRead et. al needs to internally track if the last call returned EISDIR, and then return None() afterwards?

Trying to continue calling read() after EISDIR is kinda nonsensical.

kentfredric commented 5 years ago

( That is, EISDIR should be considered equivalent to "Error + EOF" , because a subsequent read() cannot return any more data )

sfackler commented 5 years ago

I'm not sure we want to try to catalog the list of all possible terminal errors on every platform.

kentfredric commented 5 years ago

Are there platforms where invoking read() on a filehandle opened for a directory does something useful?

There may be other function calls useful on directories opened as filehandles, but read()?

mgorny commented 5 years ago

Yes. NetBSD supports reading directory entries this way.

the8472 commented 5 years ago

As far as I can divine, there's no useful behaviour to be found by allowing File::open to work on directories

On linux there are many uses. It gives you a directory file descriptor (accessible via AsRawFd) which can be used for various other syscalls (not necessarily in the standard library) such as fiemap or the *at syscall family such as renameat.

And the residual question is "How does one invoke File::open in such a way that it refuses to work on directories".

Get the metadata of the file after opening it and check its type. This is often necessary anyway if you only want to operate on regular files and not block devices for example.

jonas-schievink commented 5 years ago

Re-categorizing as a documentation issue. File::open should mention that it (might) work on directories.

tertsdiepraam commented 1 year ago

At the uutils project we just ran into this issue although triggered in a different way. Basically we have a couple instances of this pattern:

reader.lines().filter_map(Result::ok)

This seems quite innocent, but it causes an infinite loop. The correct code would be:

reader.lines().map_while(Result::ok)

I think documentation is not really enough in this case, because it's an easy mistake to make if the programmer is just looking at the types; filter_map(Result::ok) is the "obvious solution" to turn Iterator<Item = Result<T,E>> into Iterator<Item = T> in general.

Perhaps, BufRead et. al needs to internally track if the last call returned EISDIR, and then return None() afterwards?

This seems like a good solution to me. Though I think it would be more consistent if it returned None after any error. I wonder if anyone relies on the current behaviour? Are there plausible use cases of Lines where it might return an error for a while and then then continues with Ok? Or where the error changes on subsequent calls? In any case it would still technically be a breaking change, so I'd like to suggest at least a clippy lint for filter_map(Result::ok) on Lines, along with better documentation.

Relevant issues: https://github.com/uutils/coreutils/issues/4539, https://github.com/uutils/coreutils/issues/4573, https://github.com/uutils/coreutils/issues/4574

sylvestre commented 1 year ago

@samueltardieu maybe it could be add as a clippy check ^^ (see @tertsdiepraam comment)

tertsdiepraam commented 1 year ago

I just found that the case I presented is essentially a duplicate of https://github.com/rust-lang/rust/issues/34703 and https://github.com/rust-lang/rust/issues/43504, which were closed with a reason of "this is expected behavior, you're misusing the API". However, I think that the mistake is too subtle and to easy to make to put the blame entirely on the programmer. Additionally, the fact that this problem showed up in uutils shows that it can cause problems in real code, written by competent Rustaceans.

Edit: For a bit less anecdotal evidence, a search on GitHub also brings up many examples of the same pattern: https://github.com/search?q=.lines()+.filter_map(Result%3A%3Aok)&type=code&ref=advsearch

samueltardieu commented 1 year ago

@samueltardieu maybe it could be add as a clippy check ^^ (see @tertsdiepraam comment)

Good idea: https://github.com/rust-lang/rust-clippy/pull/10534

samueltardieu commented 1 year ago

@sylvestre @tertsdiepraam: https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok

tertsdiepraam commented 1 year ago

@samueltardieu That looks excellent to me. Thank you!

DorianCoding commented 4 months ago

Hello,

Sorry to dig up this issue but it is referenced and for anyone that may fall into it.

If you do want to get the BufReader for instance to count lines like this on a directory :

use std::io::{BufRead,Read};
fn getsizeoffile<T: AsRef<std::path::Path>>(path: T) -> std::io::Result<(usize,usize)> {
    let size2 = std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
    let lines = std::io::BufReader::new(File::open(&path)?).take(1000);
    let size = lines.lines().count(); //WILL CRASH
    Ok((size,size2))
}
#[test]
fn testsize() {
    let (size0, size1) = getsizeoffile("/").unwrap();
    assert_eq!(size0,size1);
}

The first std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count(); would unwrap with IsAdirectory however the second bufreader will consume all CPU. To avoid that, you need to change the line to:

use std::io::{BufRead,Read};
fn getsizeoffile<T: AsRef<std::path::Path>>(path: T) -> std::io::Result<(usize,usize)> {
    let size2 = std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
    let lines = std::io::BufReader::new(File::open(&path)?).take(1000);
    let size = lines.lines().map_while(Result::ok).count(); //Count only if lines is okay (else count lines till there)
    Ok((size,size2))
}
#[test]
fn testsize() {
    let (size0, size1) = getsizeoffile("/").unwrap();
    assert_eq!(size0,size1);
}

It is already stated in the above discussion. As well the take limit does nothing here because it does not read anything.

kaxxa123 commented 1 month ago

I am here after troubleshooting a bug that assumed that opening a directory is cross-platform compatible, but in-fact works on Unix platforms but not on Windows. See here: https://github.com/paradigmxyz/reth/issues/9747

I think this is the fundamental problem i.e. the behavior is platform specific. Yet this is not documented clearly.

I would appreciate if someone corrects me and points me to documentation on this. The only place where I found a clear description of the platform differences was ChatGPT.

the8472 commented 1 month ago

Opening a directory is possible on windows, but you need to pass an extra flag.

The std::fs APIs are rather thin abstractions over platform APIs, so you'll have to consult the platform's documentation for the low level details. The behavior of special files (such as directories, pipes and others) are platform-specific.

ChrisDenton commented 1 month ago

We do sometimes do things to abstract over platforms differences. Though for this change particularly it's arguable which is the "right" default behaviour and changing it on either platform may be unexpected.

I think a way forward might be to add an option to OpenOptions which can override platform defaults.