nix-rust / nix

Rust friendly bindings to *nix APIs
MIT License

API Request: expose getpwent and getgrent #1811

Open SteveLauC opened 2 years ago

SteveLauC commented 2 years ago

These two libc functions sequentially scan the /etc/passwd and /etc/group databases and return the entries in those files one by one.

Signature

pub fn getpwent() -> Result<Vec<User>, Errno>;
pub fn getgrent() -> Result<Vec<Group>, Errno>;

Implementation

On some OSes, we have the reentrant functions getpwent_r() and getgrent_r(). If they are available, they will be used. Otherwise, we will use getpwent() and getgrent().

Here is a draft implementation of getgrent on Linux:

pub fn getgrent() -> Result<Vec<Group>, Errno> {
    let mut v: Vec<Group> = Vec::new();

    let mut gr_buf: group = unsafe { zeroed() };
    let mut gr_str_buf: [c_char; 4096] = [0; 4096];
    unsafe { setgrent() };

    loop {
        let mut result: *mut group = null_mut();
        let res: c_int = unsafe {
            getgrent_r(
                &mut gr_buf as *mut group,
                &mut gr_str_buf as *mut c_char,
                4096,
                &mut result as *mut *mut group,
            )
        };

        // no more entries
        if res == ENOENT {
            unsafe { endgrent() };
            break;
        }

        // error: close the database before bailing out
        if res == ERANGE {
            unsafe { endgrent() };
            return Err(Errno::from_i32(ERANGE));
        }

        // man page for `getgrent_r` only lists the two above erroneous cases.
        // In other cases, it should be successful.
        assert_eq!(res, 0);
        v.push(Group::from(&gr_buf));
    }

    Ok(v)
}

Note

  1. The above implementation calls setgrent and endgrent internally, which changes the semantics of the underlying function.
  2. Because of Note 1, perhaps we could give these functions better names, such as:

    impl User {
        pub fn all_users() -> Result<Vec<User>, Errno>;
    }
    impl Group {
        pub fn all_groups() -> Result<Vec<Group>, Errno>;
    }

fogti commented 2 years ago

I think it would be clearer if unsafe { endgrent() }; were moved below the loop.

SteveLauC commented 2 years ago

> I think it would be clearer if unsafe { endgrent() }; were moved below the loop.

Good catch! It should be.

asomers commented 2 years ago

Hm. This would work. But I suggest changing the interface to avoid collecting into a Vec. More like this:

struct GroupIter {...}
impl Iterator for GroupIter {
    type Item = Group;
    fn next(&mut self) -> Option<Self::Item> {...}
}
impl Group {
    fn iter() -> GroupIter {...}
}

fogti commented 2 years ago

@asomers this could easily and accidentally run into problems when multiple parts of the program access the thread-local iterator through multiple iterator instances. Why not use FromIterator instead?

asomers commented 2 years ago

You could mark the GroupIter as !Send and !Sync to prevent multiple threads from accessing it. How do you suggest using FromIterator?
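
For illustration, a minimal sketch of the !Send/!Sync idea. Stable Rust cannot write `impl !Send` directly, so a common workaround is a `PhantomData<*const ()>` field, since raw pointers are neither `Send` nor `Sync`. The `GroupIter` below is hypothetical and iterates over plain strings as a stand-in for real group entries; it is not nix's API.

```rust
use std::marker::PhantomData;

/// Hypothetical iterator type. The raw-pointer marker opts it out of
/// `Send` and `Sync`, so it cannot be moved to or shared with another thread.
pub struct GroupIter {
    // Stand-in for real group entries (assumption for this sketch).
    remaining: std::vec::IntoIter<String>,
    // `*const ()` is neither Send nor Sync, and PhantomData inherits that.
    _not_send_sync: PhantomData<*const ()>,
}

impl GroupIter {
    pub fn new(names: Vec<String>) -> Self {
        GroupIter {
            remaining: names.into_iter(),
            _not_send_sync: PhantomData,
        }
    }
}

impl Iterator for GroupIter {
    type Item = String;

    fn next(&mut self) -> Option<Self::Item> {
        self.remaining.next()
    }
}
```

With this marker in place, passing a `GroupIter` to `std::thread::spawn` is rejected at compile time.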

fogti commented 2 years ago

Oh, not only that. I was talking about creating multiple instances of GroupIter on the same thread and then consuming them in an interleaved fashion, so that at least some of them are not iterated front to back on their own. AFAIK this would result in each of them "receiving" only a partition of the entries, which IMO is bad.
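
The partitioning effect can be reproduced with a small simulation. This sketch does not call libc at all: `fake_setgrent`, `fake_getgrent`, `FakeGroupIter`, and the `ENTRIES` list are made-up stand-ins, and a thread-local counter plays the role of the hidden cursor shared by every iterator instance.

```rust
use std::cell::Cell;

thread_local! {
    // Simulated stand-in for the hidden read position behind setgrent/getgrent.
    static CURSOR: Cell<usize> = Cell::new(0);
}

// Made-up database contents for the simulation.
const ENTRIES: [&str; 4] = ["root", "daemon", "sys", "adm"];

/// Simulated `setgrent`: rewinds the shared cursor.
fn fake_setgrent() {
    CURSOR.with(|c| c.set(0));
}

/// Simulated `getgrent`: returns the next entry and advances the shared cursor.
fn fake_getgrent() -> Option<&'static str> {
    CURSOR.with(|c| {
        let i = c.get();
        let entry = ENTRIES.get(i).copied();
        if entry.is_some() {
            c.set(i + 1);
        }
        entry
    })
}

/// Hypothetical iterator that owns no state of its own.
struct FakeGroupIter;

impl FakeGroupIter {
    fn new() -> Self {
        fake_setgrent();
        FakeGroupIter
    }
}

impl Iterator for FakeGroupIter {
    type Item = &'static str;

    fn next(&mut self) -> Option<Self::Item> {
        fake_getgrent()
    }
}

/// Alternates between two iterators and returns what each one saw.
fn interleave() -> (Vec<&'static str>, Vec<&'static str>) {
    let mut a = FakeGroupIter::new();
    let mut b = FakeGroupIter::new();
    let (mut seen_a, mut seen_b) = (Vec::new(), Vec::new());
    loop {
        match a.next() {
            Some(e) => seen_a.push(e),
            None => break,
        }
        match b.next() {
            Some(e) => seen_b.push(e),
            None => break,
        }
    }
    (seen_a, seen_b)
}
```

Here `interleave()` returns `["root", "sys"]` for the first iterator and `["daemon", "adm"]` for the second, so neither one sees the whole list.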

asomers commented 2 years ago

Well, that wouldn't happen if you use getgrent_r, right?

fogti commented 2 years ago

We still use setgrent and endgrent. They don't take a buffer/context handle, so I assume they wouldn't be safe to use in that case, right?

SteveLauC commented 2 years ago

> Oh, not only that. I was talking about creating multiple instances of GroupIter on the same thread and then consuming them in an interleaved fashion, so that at least some of them are not iterated front to back on their own. AFAIK this would result in each of them "receiving" only a partition of the entries, which IMO is bad.

This is correct. It is safe only if we can guarantee that only one instance is running, because setpwent/endpwent modify global state, which makes instances collide with each other.

rust-users is a crate that does a similar job. It exposes these functions through an iterator interface but marks it as unsafe. Here is the documentation:

pub unsafe fn all_users() -> impl Iterator<Item = User>

And here is a related issue.

BTW, from this perspective, my draft impl is also unsafe if multiple calls run in parallel, right?

fogti commented 2 years ago

> BTW, from this perspective, my draft impl is also unsafe if multiple calls run in parallel, right?

Yes, unless you protect the code section between setgrent and endgrent (including these calls) with a Mutex.
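
A rough sketch of the Mutex approach, under several assumptions: the libc declarations are written by hand so that the example needs no external crate (real code would take them from the `libc` crate), the struct layout is assumed to match Linux, and `RawGroup`, `GROUP_DB_LOCK`, and the `(String, u32)` return type are invented for illustration. It uses the plain getgrent rather than getgrent_r and treats a null return as the end of the database without checking errno. The `unsafe extern` syntax needs Rust 1.82 or newer.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::sync::Mutex;

// Hand-written mirror of C's `struct group` (assumed Linux layout).
#[repr(C)]
struct RawGroup {
    gr_name: *mut c_char,
    gr_passwd: *mut c_char,
    gr_gid: u32,
    gr_mem: *mut *mut c_char,
}

unsafe extern "C" {
    fn setgrent();
    fn getgrent() -> *mut RawGroup;
    fn endgrent();
}

/// Hypothetical process-wide lock covering every setgrent..endgrent section.
static GROUP_DB_LOCK: Mutex<()> = Mutex::new(());

/// Returns (name, gid) for every group entry, holding the lock for the whole scan.
pub fn all_groups() -> Vec<(String, u32)> {
    // A poisoned lock only means another scan panicked; the guard is still usable.
    let _guard = GROUP_DB_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let mut out = Vec::new();
    unsafe {
        setgrent();
        loop {
            let entry = getgrent();
            if entry.is_null() {
                // End of database (errors are not distinguished in this sketch).
                break;
            }
            // Copy the data out before the next call, which may reuse the buffer.
            let name = CStr::from_ptr((*entry).gr_name)
                .to_string_lossy()
                .into_owned();
            out.push((name, (*entry).gr_gid));
        }
        endgrent();
    }
    out
}
```

The lock only helps against callers that go through `all_groups`. Any other code in the process that calls setgrent, getgrent, or endgrent directly can still disturb the shared state.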