rust-lang / git2-rs

libgit2 bindings for Rust
https://docs.rs/git2
Apache License 2.0
1.68k stars 385 forks source link

Problematic raw pointer conversion in buf.rs #760

Open mazze1200 opened 2 years ago

mazze1200 commented 2 years ago

Hello all, I'm not sure if I made a conceptual mistake here and of not if this is actually a problem, but I would like to ask you to have a look at this. Here are the tests that I ran on buf.rs

#[cfg(test)]
mod buf_test {
    use crate::{util::Binding, Buf};
    use libgit2_sys::git_buf;

    #[test]
    #[allow(unused)]
    fn pointer_tests() {
        let buf: Buf = Buf::new();
        let buf_addr: u64 = (&buf) as *const Buf as u64;
        let buf_raw_addr: u64 = (&buf.raw) as *const git_buf as u64;
        /* Since Buf is just a wrapper around raw::git_buf which actually holds the raw::git_buf struct,
        they should refer to the same memory address */
        assert_eq!(buf_addr, buf_raw_addr);

        let git_buf: *mut git_buf = Binding::raw(&buf);
        let git_buf_addr: u64 = git_buf as u64;
        /* It should not make a difference if we retrieve the raw::git_buf from Binding::raw or from struct Buf.raw */
        assert_eq!(buf_raw_addr, git_buf_addr);

        let from_raw_buf: Buf = unsafe { Binding::from_raw(git_buf) };
        let from_raw_buf_addr: u64 = (&from_raw_buf) as *const Buf as u64;
        /* If we recreate Buf from raw pointer to git_buf and Buf actually holds the struct raw::git_buf we should end up with
        the same memory address as to begin with (which is not possible) or we have moved raw::git_buf which could cause trouble
        on the C side since C does not know about our move.   */
        assert_eq!(buf_addr, from_raw_buf_addr);

        let from_raw_buf_raw_addr: u64 = (&from_raw_buf.raw) as *const git_buf as u64;
        /* This actually shows that we moved raw::git_buf here
            impl Binding for Buf {
                type Raw = *mut raw::git_buf;
                unsafe fn from_raw(raw: *mut raw::git_buf) -> Buf {
                    Buf { raw: *raw } // move of raw pointer
                }

        This behavior may be problematic when Rust receives a buffer from C to write into e.g. here
            @param out git_buf to write data into
            @param repo Repository to read prepared message from
            @return 0, GIT_ENOTFOUND if no message exists or an error code
            GIT_EXTERN(int) git_repository_message(git_buf *out, git_repository *repo);
        */
        assert_eq!(buf_raw_addr, from_raw_buf_raw_addr);
    }
} 

So my assumption is, that the struct

pub struct Buf {
    raw: raw::git_buf,
}

should have been

pub struct Buf {
    raw: *mut raw::git_buf,
}

similar to repo.rs

pub struct Repository {
    raw: *mut raw::git_repository,
}
ehuss commented 2 years ago

I think the confusion may be that the contents of git_buf are what matters, not the actual outer struct. You can move and copy and duplicate a git_buf just fine, as long as the ptr field stays the same. So creating a new Buf using from_raw is fine, it creates a new copy, but the contents are the same.

The difference from something like git_repository is that libgit2 allocates the git_repository struct.

I'm not sure what your concern is around git_repository_message. It takes a pointer to a buffer which it will store the message. Can you say more about what the actual problem is?

mazze1200 commented 2 years ago

Hello @ehuss , thanks for your reply. Maybe some background; I'm trying to bring the libgit2 filter framework to rust to implement filters (like git-lfs) in rust. The filter framework does heavily rely on callbacks and, of course, I want to have the callbacks implemented as regular rust closures.

But the git_buf and it's current rust counterpart Buf.rs gives me headache. In the current implementation simple implementations like this

    #[test]
    #[allow(unused)]
    fn drop_test() {
        // this increases the size of the buffer by allocating a new array, storing it in git_buf.ptr and freeing the old array
        extern "C" fn grow(buf: *mut raw::git_buf, size: usize){
            unsafe {raw::git_buf_grow(buf, size);}
        }

        fn dropping_function(buf: *mut git_buf) {
            let _buf: Buf = unsafe { Buf::from_raw(buf) }; 
        } // here _buf will be dropped and git__free(git_buf) will be called

        let mut buf = Buf::new();
        // This would be done on the C-side to increase the size of the buffer
        grow(buf.raw(), 1);

        dropping_function(buf.raw());
    } // here buf will be droppped => double free

would produce a double free.

Furthermore if we have a situation where libgit2 wants to call a callback function like e.g.

extern "C" fn apply_cb(to: *mut raw::git_buf, from: *const raw::git_buf) -> c_int

and expects that the callback copies the content from "from" to "to" we would currently not be able to do so, since as soon as we call Buf::from_raw(to) we would move the content of "to" to a new position in memory and everything we do to that new git_buf cannot be seen by the caller of apply_cb after apply_cb finished (please see the git_buf fields asize and size).

impl Binding for Buf {
    type Raw = *mut raw::git_buf;
    unsafe fn from_raw(raw: *mut raw::git_buf) -> Buf {
        Buf { raw: *raw }
    }
} 

pub struct git_buf {
    pub ptr: *mut c_char,
    pub asize: size_t,
    pub size: size_t,
}

Please let me know if I missed something here.

ehuss commented 2 years ago

It's difficult to say without looking at the actual libgit2 functions you want to wrap. I'm not familiar with the "filter framework", but if you could point to specific libgit2 functions that you want to wrap, we can take a closer look.

If there are functions that consume a Buf, then likely it would need to call something like mem::forget to prevent the double free.

mazze1200 commented 2 years ago

Hello @ehuss so in particular what I want to do is write a rust function that can be used as the function pointer parameter *write_fn in the libgit2 function

extern int git_filter_buffered_stream_new(
    git_writestream **out,
    git_filter *filter,
    int (*write_fn)(git_filter *, void **, git_str *, const git_str *, const git_filter_source *),
    git_str *temp_buf,
    void **payload,
    const git_filter_source *source,
    git_writestream *target);

https://github.com/libgit2/libgit2/blob/9b04a3076d419279ac43f165be89a9893f9f0670/src/filter.h#L76

To do so I have a wrapper function write_fn

extern "C" fn write_fn(
    git_filter: *mut raw::git_filter,
    payload: *mut *mut c_void,
    to: *mut raw::git_buf,
    from: *const raw::git_buf,
    src: *const raw::git_filter_source,
) -> c_int

within that I'm going to convert the pointer types to proper rust types and call my rust closure with references to these types.

But here the trouble begins, "to" and "from" are buffers that are references, that git_filter_buffered_stream_new certainly wants to have back after write_fn returns. This could be handled by mem::forget but for the "to" parameter (the buffer where the clousure shall write the data into) I would also need to manually undo the move from Buf::from_raw so git_filter_buffered_stream_new will be able to see what has happened in the closure.

So the result would need to look something like this

fn closure(to: &mut Buf, from: &Buf){}

extern "C" fn write_fn(
    git_filter: *mut raw::git_filter,
    payload: *mut *mut c_void,
    to: *mut raw::git_buf,
    from: *const raw::git_buf,
    src: *const raw::git_filter_source,
) -> c_int
{
     // create proper rust types
    let to_buf = Buf::from_raw(to);
    let from_buf = Buf::from_raw(from);

    // call closure
    closure(&mut to_buf, &from_buf);

    // undo the move from Buf::from_raw
    *to = *to_buf.raw();

    // prevent double frees and dangling pointers from dropping of still in use buffers
    std::mem::forget(to_buf);
    std::mem::forget(from_buf);
}     

Is that the interface libgit2-rs shall present to the outside world?

ehuss commented 2 years ago

That looks like it would probably work to me.

It is kinda strange that git_filter_buffered_stream_new is undocumented, whereas the rest of the filter api is documented.