Open kentfredric opened 5 years ago
File::open
works on directories because the underlying system call open(2)
works on directories. You can use O_DIRECTORY
to trigger an error if the path is not a directory, but I don't know if there's a way of doing the opposite.
Your program is going crazy because you're collecting a vector of io::Error
s.
Perhaps, BufRead et. al needs to internally track if the last call returned EISDIR, and then return None() afterwards?
Trying to continue calling read() after EISDIR is kinda nonsensical.
( That is, EISDIR should be considered equivalent to "Error + EOF" , because a subsequent read() cannot return any more data )
I'm not sure we want to try to catalog the list of all possible terminal errors on every platform.
Are there platforms where invoking read() on a filehandle opened for a directory does something useful?
There may be other function calls useful on directories opened as filehandles, but read()?
Yes. NetBSD supports reading directory entries this way.
As far as I can divine, there's no useful behaviour to be found by allowing File::open to work on directories
On linux there are many uses. It gives you a directory file descriptor (accessible via AsRawFd) which can be used for various other syscalls (not necessarily in the standard library) such as fiemap or the *at syscall family such as renameat.
And the residual question is "How does one invoke File::open in such a way that it refuses to work on directories".
Get the metadata of the file after opening it and check its type. This is often necessary anyway if you only want to operate on regular files and not block devices for example.
Re-categorizing as a documentation issue. File::open
should mention that it (might) work on directories.
At the uutils project we just ran into this issue although triggered in a different way. Basically we have a couple instances of this pattern:
reader.lines().filter_map(Result::ok)
This seems quite innocent, but it causes an infinite loop. The correct code would be:
reader.lines().map_while(Result::ok)
I think documentation is not really enough in this case, because it's an easy mistake to make if the programmer is just looking at the types; filter_map(Result::ok)
is the "obvious solution" to turn Iterator<Item = Result<T,E>>
into Iterator<Item = T>
in general.
Perhaps, BufRead et. al needs to internally track if the last call returned EISDIR, and then return None() afterwards?
This seems like a good solution to me. Though I think it would be more consistent if it returned None
after any error. I wonder if anyone relies on the current behaviour? Are there plausible use cases of Lines
where it might return an error for a while and then then continues with Ok
? Or where the error changes on subsequent calls? In any case it would still technically be a breaking change, so I'd like to suggest at least a clippy lint for filter_map(Result::ok)
on Lines
, along with better documentation.
Relevant issues: https://github.com/uutils/coreutils/issues/4539, https://github.com/uutils/coreutils/issues/4573, https://github.com/uutils/coreutils/issues/4574
@samueltardieu maybe it could be add as a clippy check ^^ (see @tertsdiepraam comment)
I just found that the case I presented is essentially a duplicate of https://github.com/rust-lang/rust/issues/34703 and https://github.com/rust-lang/rust/issues/43504, which were closed with a reason of "this is expected behavior, you're misusing the API". However, I think that the mistake is too subtle and to easy to make to put the blame entirely on the programmer. Additionally, the fact that this problem showed up in uutils shows that it can cause problems in real code, written by competent Rustaceans.
Edit: For a bit less anecdotal evidence, a search on GitHub also brings up many examples of the same pattern: https://github.com/search?q=.lines()+.filter_map(Result%3A%3Aok)&type=code&ref=advsearch
@samueltardieu maybe it could be add as a clippy check ^^ (see @tertsdiepraam comment)
Good idea: https://github.com/rust-lang/rust-clippy/pull/10534
@sylvestre @tertsdiepraam: https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok
@samueltardieu That looks excellent to me. Thank you!
Hello,
Sorry to dig up this issue but it is referenced and for anyone that may fall into it.
If you do want to get the BufReader for instance to count lines like this on a directory :
use std::io::{BufRead,Read};
fn getsizeoffile<T: AsRef<std::path::Path>>(path: T) -> std::io::Result<(usize,usize)> {
let size2 = std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
let lines = std::io::BufReader::new(File::open(&path)?).take(1000);
let size = lines.lines().count(); //WILL CRASH
Ok((size,size2))
}
#[test]
fn testsize() {
let (size0, size1) = getsizeoffile("/").unwrap();
assert_eq!(size0,size1);
}
The first std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
would unwrap with IsAdirectory however the second bufreader will consume all CPU.
To avoid that, you need to change the line to:
use std::io::{BufRead,Read};
fn getsizeoffile<T: AsRef<std::path::Path>>(path: T) -> std::io::Result<(usize,usize)> {
let size2 = std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
let lines = std::io::BufReader::new(File::open(&path)?).take(1000);
let size = lines.lines().map_while(Result::ok).count(); //Count only if lines is okay (else count lines till there)
Ok((size,size2))
}
#[test]
fn testsize() {
let (size0, size1) = getsizeoffile("/").unwrap();
assert_eq!(size0,size1);
}
It is already stated in the above discussion. As well the take limit does nothing here because it does not read anything.
I am here after troubleshooting a bug that assumed that opening a directory is cross-platform compatible, but in-fact works on Unix platforms but not on Windows. See here: https://github.com/paradigmxyz/reth/issues/9747
I think this is the fundamental problem i.e. the behavior is platform specific. Yet this is not documented clearly.
I would appreciate if someone corrects me and points me to documentation on this. The only place where I found a clear description of the platform differences was ChatGPT.
Opening a directory is possible on windows, but you need to pass an extra flag.
The std::fs APIs are rather thin abstractions over platform APIs, so you'll have to consult the platform's documentation for the low level details. The behavior of special files (such as directories, pipes and others) are platform-specific.
We do sometimes do things to abstract over platforms differences. Though for this change particularly it's arguable which is the "right" default behaviour and changing it on either platform may be unexpected.
I think a way forward might be to add an option to OpenOptions
which can override platform defaults.
I tripped onto some odd behaviour.
I was under the assumption calling
File::open()
on a directory would Err(), and that I could use that to handle a user specifying a path that was not a file, ( instead of falling prey to race conditions by stat-ing first and then opening second ).However, ... bad things happened instead.
The last of these lines will run forever, with strace reporting:
Over and over again ad-infinitum with no end in sight.
Somehow I had a variation of this go crazy and eat 200% of my ram, but I'm having a hard time reproducing that exact case (Though it may have been related to the target-directory in question also being massive in my case).
Digging shows related bug #43504
The profound question that remains unanswered is: "Why is calling File::open on a directory fine?"
And the residual question is "How does one invoke File::open in such a way that it refuses to work on directories".
And importantly, how do I achieve that portably?
And if none of these concerns can be mitigated, the very least that could be done is have this giant foot-gun documented somewhere in
File::open
.( As far as I can divine, there's no useful behaviour to be found by allowing File::open to work on directories, as standard read() semantics simply don't work on directory filehandles, you have to use readdir() )