rust-lang / libc

Raw bindings to platform APIs for Rust
https://docs.rs/libc
Apache License 2.0
2.06k stars 1.03k forks source link

Wrong structs dirent, dirent64 #2669

Open safinaskar opened 2 years ago

safinaskar commented 2 years ago

Your structs dirent, dirent64 are wrong at x86_64-unknown-linux-gnu. Their field d_name contains 256 bytes. But names can be bigger than 256 bytes in NTFS (in Linux). I was able to create NTFS partition in my Linux and then create file with name "ыыыы..." (250 times "ы", this is Cyrillic letter). Such name contains more than 256 bytes (if you have difficulties with reproducing, I can provide more details).

So, fix structs. I am not sure how exactly fix should be implemented. Maybe [c_char; 512], but in this case we need to double check what is true upper limit. Maybe we can simply put [c_char; 1] or [c_char; 0]. Of course, it would be perfect to make dirent unsized, but with machine word sized reference. Unfortunately, as well as I know, such unsized structs are not possible in stable Rust

devnexen commented 2 years ago

I m sorry to say but in fact dirent* struct are in fact correct, I invite you to look at the man pages/headers for confirmation. Note that libc is not a "smart wrapper" neither but reflects what the system libc are. What seems more appropriate in your case is the getdents api usage.

safinaskar commented 2 years ago

@devnexen consider this code:

// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
    unsafe {
        let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
        assert_ne!(dir, std::ptr::null_mut());
        loop {
            let ent: *mut libc::dirent = libc::readdir(dir);
            if ent == std::ptr::null_mut() {
                break;
            }
            let name = CStr::from_ptr((*ent).d_name.as_ptr());
            println!("{}", name.to_bytes().len());
            println!("{:?}", name);
        }
    }
}

This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?

safinaskar commented 2 years ago

getdents

getdents is not wrapped by libc crate, nor by glibc. So I will have to manually wrap it, this is error-prone

devnexen commented 2 years ago

you can call it via syscall (the syscalls ids are present in libc) and map the linux_dirent* I think.

safinaskar commented 2 years ago

I still think that we should research whether accessing dirent past its formal size is undefined behavior (possibly by reading https://rust-lang.github.io/unsafe-code-guidelines ). If yes, then we should fix or remove readdir function from libc, because it is impossible to use it correctly

devnexen commented 2 years ago

or you can copy the result into a sized buffer. I disagree removing readdir honestly.

safinaskar commented 2 years ago

or you can copy the result into a sized buffer

You mean read result from dirent to sized buffer? Okey, but is this reading undefined behavior?

devnexen commented 2 years ago

Note that unsafe really means what it means, inside this block, rust safety and guarantees do not apply, it is the responsibility of code user to handle carefully possible side effects, use after free, stack overflow ... just like with language like C.

If anyone wants to chime in the discussion feel free :-)

digama0 commented 1 year ago

@devnexen consider this code:

// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
    unsafe {
        let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
        assert_ne!(dir, std::ptr::null_mut());
        loop {
            let ent: *mut libc::dirent = libc::readdir(dir);
            if ent == std::ptr::null_mut() {
                break;
            }
            let name = CStr::from_ptr((*ent).d_name.as_ptr());
            println!("{}", name.to_bytes().len());
            println!("{:?}", name);
        }
    }
}

This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?

I don't have all the context to answer this question definitively, but assuming that readdir actually returned a contiguous allocation with a null-terminated d_name field of longer than 256 bytes, your code as written is suspicious and possibly UB (whether it is actually UB is rust-lang/unsafe-code-guidelines#134) due to creating a reference at (*ent).d_name.as_ptr(), but you can rewrite it to be definitely not UB by using addr_of!((*ent).d_name) instead.

safinaskar commented 1 year ago

@digama0 , thanks a lot for answer. This is very big gotcha. This means lib::dirent should be fixed or at very least properly documented

safinaskar commented 1 year ago

I just found that actual std implementation of ReadDir is even more careful. The authors reject addr_of!((*ent).d_name) as wrong and instead perform offset computations manually. See code starting from this line: https://github.com/rust-lang/rust/blob/42f28f9eb41adb7a197697e5e2d6535d00fd0f4a/library/std/src/sys/unix/fs.rs#L676

digama0 commented 1 year ago

@RalfJung Could you comment on the correctness of the comment there? My take is that the reason addr_of!((*ptr).d_name) is said to be invalid is because the allocation may not actually be 256 bytes (plus the rest of the struct), and (at least for now) addr_of!((*ptr).d_name) asserts that the entire d_name: [c_char; 256] field is within an allocation. So an alternative way to fix the code would be to have a version of dirent with a d_name: [c_char; 0] VLA and using that to do the offset computation.

RalfJung commented 1 year ago

Yeah that comment sounds accurate for the current very conservative semantics we have for *ptr (where the pointer has to be fully aligned and dereferenceable). That comment is also a good argument for why we'd probably want a more relaxed semantics...

tgross35 commented 2 weeks ago

It sounds like we can improve things with a breaking change. If anyone has any proposals, feel free to put up a PR for 1.0.

safinaskar commented 2 weeks ago

@tgross35 , obviously, perfect solution would be waiting for "thin cstr" and making thin cstr (not reference to thin cstr, but unsized thin cstr itself) last member of dirent

tgross35 commented 2 weeks ago

Yeah, I do wish we had that available. Since we don't though, changing to [c_char; 0] is probably our best bet.

nwalfield commented 1 week ago

This issue is about the name being longer than 256 bytes. I'd like to add that the name could even be shorter than 256 bytes! I have observed this in practice and this is corroborated by this comment in the std library:

                // The dirent64 struct is a weird imaginary thing that isn't ever supposed
                // to be worked with by value. Its trailing d_name field is declared
                // variously as [c_char; 256] or [c_char; 1] on different systems but
                // either way that size is meaningless; only the offset of d_name is
                // meaningful. The dirent64 pointers that libc returns from readdir64 are
                // allowed to point to allocations smaller _or_ LARGER than implied by the
                // definition of the struct.

I noticed this, because the following code was crashing:

unsafe { *self.entry }.d_type

where self.entry is libc::dirent64 or libc::dirent. At least when compiling in debugging mode, the unsafe block copies the whole structure, and then accesses d_type. The memcpy can result in a seg fault if the direct data structure is smaller, at the end of a page, and the following page is not mapped:

Thread 7 "store::certd::t" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 10589]
memcpy () at src/string/x86_64/memcpy.s:18
warning: 18     src/string/x86_64/memcpy.s: No such file or directory
(gdb) up
#1  0x000055f054f18d34 in openpgp_cert_d::unixdir::DirEntry::file_type (self=0x7fa822fdb780)
    at src/unixdir.rs:144
144             FileType(unsafe { *self.entry }.d_type)

Making d_name a [c_char; 0] (or even a [c_char; 1] since the name is NUL terminated) would prevent this issue, I think.

RalfJung commented 1 week ago

Yeah by wrapping *self.entry in curly braces you are forcing that value to be loaded and moved. I would recommend you use unsafe { *self.entry.d_type } to avoid such an unnecessary load + move.

nwalfield commented 1 week ago

Yeah by wrapping *self.entry in curly braces you are forcing that value to be loaded and moved. I would recommend you use unsafe { *self.entry.d_type } to avoid such an unnecessary load + move.

Thanks a lot for the suggestion!

nwalfield commented 5 days ago

For the record, unsafe { *self.entry.d_type } doesn't work, but this does:

        unsafe { (&*self.entry).d_type }

(Here's my complete change.)

RalfJung commented 5 days ago

unsafe { (*self.entry).d_type } should also work.

nwalfield commented 5 days ago

Indeed it does. Thanks!