Open safinaskar opened 2 years ago
I m sorry to say but in fact dirent* struct are in fact correct, I invite you to look at the man pages/headers for confirmation. Note that libc is not a "smart wrapper" neither but reflects what the system libc are. What seems more appropriate in your case is the getdents api usage.
@devnexen consider this code:
// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
unsafe {
let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
assert_ne!(dir, std::ptr::null_mut());
loop {
let ent: *mut libc::dirent = libc::readdir(dir);
if ent == std::ptr::null_mut() {
break;
}
let name = CStr::from_ptr((*ent).d_name.as_ptr());
println!("{}", name.to_bytes().len());
println!("{:?}", name);
}
}
}
This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?
getdents
getdents is not wrapped by libc crate, nor by glibc. So I will have to manually wrap it, this is error-prone
you can call it via syscall (the syscalls ids are present in libc) and map the linux_dirent* I think.
I still think that we should research whether accessing dirent
past its formal size is undefined behavior (possibly by reading https://rust-lang.github.io/unsafe-code-guidelines ). If yes, then we should fix or remove readdir
function from libc, because it is impossible to use it correctly
or you can copy the result into a sized buffer. I disagree removing readdir honestly.
or you can copy the result into a sized buffer
You mean read result from dirent
to sized buffer? Okey, but is this reading undefined behavior?
Note that unsafe
really means what it means, inside this block, rust safety and guarantees do not apply, it is the responsibility of code user to handle carefully possible side effects, use after free, stack overflow ... just like with language like C.
If anyone wants to chime in the discussion feel free :-)
@devnexen consider this code:
// This code for x86_64-unknown-linux-gnu use std::ffi::CStr; fn main() { unsafe { let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8); assert_ne!(dir, std::ptr::null_mut()); loop { let ent: *mut libc::dirent = libc::readdir(dir); if ent == std::ptr::null_mut() { break; } let name = CStr::from_ptr((*ent).d_name.as_ptr()); println!("{}", name.to_bytes().len()); println!("{:?}", name); } } }
This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?
I don't have all the context to answer this question definitively, but assuming that readdir
actually returned a contiguous allocation with a null-terminated d_name
field of longer than 256 bytes, your code as written is suspicious and possibly UB (whether it is actually UB is rust-lang/unsafe-code-guidelines#134) due to creating a reference at (*ent).d_name.as_ptr()
, but you can rewrite it to be definitely not UB by using addr_of!((*ent).d_name)
instead.
@digama0 , thanks a lot for answer. This is very big gotcha. This means lib::dirent
should be fixed or at very least properly documented
I just found that actual std
implementation of ReadDir
is even more careful. The authors reject addr_of!((*ent).d_name)
as wrong and instead perform offset computations manually. See code starting from this line: https://github.com/rust-lang/rust/blob/42f28f9eb41adb7a197697e5e2d6535d00fd0f4a/library/std/src/sys/unix/fs.rs#L676
@RalfJung Could you comment on the correctness of the comment there? My take is that the reason addr_of!((*ptr).d_name)
is said to be invalid is because the allocation may not actually be 256 bytes (plus the rest of the struct), and (at least for now) addr_of!((*ptr).d_name)
asserts that the entire d_name: [c_char; 256]
field is within an allocation. So an alternative way to fix the code would be to have a version of dirent
with a d_name: [c_char; 0]
VLA and using that to do the offset computation.
Yeah that comment sounds accurate for the current very conservative semantics we have for *ptr
(where the pointer has to be fully aligned and dereferenceable). That comment is also a good argument for why we'd probably want a more relaxed semantics...
It sounds like we can improve things with a breaking change. If anyone has any proposals, feel free to put up a PR for 1.0.
@tgross35 , obviously, perfect solution would be waiting for "thin cstr" and making thin cstr (not reference to thin cstr, but unsized thin cstr itself) last member of dirent
Yeah, I do wish we had that available. Since we don't though, changing to [c_char; 0]
is probably our best bet.
This issue is about the name being longer than 256 bytes. I'd like to add that the name could even be shorter than 256 bytes! I have observed this in practice and this is corroborated by this comment in the std library:
// The dirent64 struct is a weird imaginary thing that isn't ever supposed
// to be worked with by value. Its trailing d_name field is declared
// variously as [c_char; 256] or [c_char; 1] on different systems but
// either way that size is meaningless; only the offset of d_name is
// meaningful. The dirent64 pointers that libc returns from readdir64 are
// allowed to point to allocations smaller _or_ LARGER than implied by the
// definition of the struct.
I noticed this, because the following code was crashing:
unsafe { *self.entry }.d_type
where self.entry
is libc::dirent64
or libc::dirent
. At least when compiling in debugging mode, the unsafe
block copies the whole structure, and then accesses d_type
. The memcpy
can result in a seg fault if the direct
data structure is smaller, at the end of a page, and the following page is not mapped:
Thread 7 "store::certd::t" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 10589]
memcpy () at src/string/x86_64/memcpy.s:18
warning: 18 src/string/x86_64/memcpy.s: No such file or directory
(gdb) up
#1 0x000055f054f18d34 in openpgp_cert_d::unixdir::DirEntry::file_type (self=0x7fa822fdb780)
at src/unixdir.rs:144
144 FileType(unsafe { *self.entry }.d_type)
Making d_name
a [c_char; 0]
(or even a [c_char; 1]
since the name is NUL terminated) would prevent this issue, I think.
Yeah by wrapping *self.entry
in curly braces you are forcing that value to be loaded and moved. I would recommend you use unsafe { *self.entry.d_type }
to avoid such an unnecessary load + move.
Yeah by wrapping
*self.entry
in curly braces you are forcing that value to be loaded and moved. I would recommend you useunsafe { *self.entry.d_type }
to avoid such an unnecessary load + move.
Thanks a lot for the suggestion!
For the record, unsafe { *self.entry.d_type }
doesn't work, but this does:
unsafe { (&*self.entry).d_type }
unsafe { (*self.entry).d_type }
should also work.
Indeed it does. Thanks!
Your structs
dirent
,dirent64
are wrong at x86_64-unknown-linux-gnu. Their fieldd_name
contains 256 bytes. But names can be bigger than 256 bytes in NTFS (in Linux). I was able to create NTFS partition in my Linux and then create file with name "ыыыы..." (250 times "ы", this is Cyrillic letter). Such name contains more than 256 bytes (if you have difficulties with reproducing, I can provide more details).So, fix structs. I am not sure how exactly fix should be implemented. Maybe
[c_char; 512]
, but in this case we need to double check what is true upper limit. Maybe we can simply put[c_char; 1]
or[c_char; 0]
. Of course, it would be perfect to makedirent
unsized, but with machine word sized reference. Unfortunately, as well as I know, such unsized structs are not possible in stable Rust