rust-lang / wg-allocators

Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
http://bit.ly/hello-wg-allocators

Interop with C code that uses `malloc`/`free` #125

Open umanwizard opened 5 months ago

umanwizard commented 5 months ago

It would be nice to provide safe Rust APIs that use normal Rust types such as Vec<_, std::alloc::System> and Box<_, std::alloc::System> when wrapping C libraries that expect to free pointers allocated by their clients or, conversely, that malloc pointers which their clients later free.

Currently this can't be done, and the only safe way to interact with the pointers is to use libc::malloc and libc::free on the Rust side. This is for at least a few reasons that I can identify:

First, it is explicitly not allowed according to the documentation of std::alloc::System, which says: "it is not valid to mix use of the backing system allocator with System, as this implementation may include extra work, such as to serve alignment requests greater than the alignment provided directly by the backing system allocator."
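To make the quoted caveat concrete, here is a minimal sketch (the function name `system_alloc_is_aligned` is hypothetical) that asks System for an alignment larger than a typical malloc guarantees. How System satisfies such a request is platform-specific; it may, for example, call a dedicated aligned-allocation routine or over-allocate and return an adjusted pointer, which is why the block has to go back to System rather than to free.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Allocates `size` bytes (must be non-zero) at alignment `align` through
/// `System`, reports whether the returned pointer honours that alignment,
/// and releases the block again.
fn system_alloc_is_aligned(size: usize, align: usize) -> bool {
    let layout = Layout::from_size_align(size, align).expect("invalid layout");
    unsafe {
        let ptr = System.alloc(layout);
        if ptr.is_null() {
            // Allocation failure: nothing to check and nothing to release.
            return false;
        }
        let aligned = (ptr as usize) % align == 0;
        // The block must be returned to `System` with the same layout.
        // Passing `ptr` to C's `free` is not guaranteed to be valid, because
        // `ptr` need not be the pointer the backing allocator handed out.
        System.dealloc(ptr, layout);
        aligned
    }
}
```

Calling `system_alloc_is_aligned(64, 4096)` requests a page-aligned 64-byte block, a request that plain malloc has no way to express.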

Second, even if it were allowed, it would require the C API to inform its clients of the capacity and length of allocated pointers. For example, consider these code snippets:

// C code
#include <stdlib.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

typedef struct {
    size_t len;
    unsigned char *buf;
} sized_buf_t;

// clone a zero-terminated string into a length+buffer string
// (handling of a failed realloc is omitted for brevity)
sized_buf_t copy_cstr(const char *cstr) {
    size_t cap = 0;
    size_t len = 0;
    unsigned char *buf = NULL;
    for (const char *p = cstr; *p; ++p) {
        if (len == cap) {
            // double the capacity; the caller never learns this value
            cap = MAX(1, 2 * cap);
            buf = realloc(buf, cap);
        }
        buf[len++] = *(const unsigned char *)p;
    }
    return (sized_buf_t){len, buf};
}
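// Rust code
#![feature(allocator_api)] // Vec<T, A> is nightly-only

use std::alloc::System;
use std::ffi::CStr;

fn copy_cstr(cstr: &CStr) -> Vec<u8, System> {
    unsafe {
        // `c::copy_cstr` is assumed to be the FFI binding to the C function above
        let sb = c::copy_cstr(cstr.as_ptr());
        Vec::from_raw_parts_in(sb.buf, sb.len, todo!("we don't know the capacity..."), System)
    }
}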

This code can't be made to work, because we are not allowed to construct a Vec without knowing the capacity of the underlying allocation. Even though the underlying free doesn't care, the documentation of Vec::from_raw_parts_in prohibits this. See e.g. https://github.com/rust-lang/wg-allocators/issues/99 for further discussion.
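For comparison, here is a rough sketch of the workaround that is possible today: keep the C pointer in a dedicated owner type that only needs the length and releases the buffer with free. The names `MallocBuf` and `demo_c_copy` are hypothetical; `demo_c_copy` merely stands in for a C function such as copy_cstr so that the example is self-contained. malloc and free are declared by hand here; the libc crate exposes the same functions.

```rust
use std::ffi::c_void;
use std::ops::Deref;
use std::{ptr, slice};

// Hand-written bindings to the C allocator.
// `unsafe extern` blocks need Rust 1.82 or newer.
unsafe extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

/// Hypothetical owner of a buffer that C code allocated with malloc/realloc.
/// Only the length is tracked; the capacity is never needed.
pub struct MallocBuf {
    ptr: *mut u8,
    len: usize,
}

impl MallocBuf {
    /// # Safety
    /// `ptr` must be null (with `len == 0`), or point to at least `len`
    /// initialized bytes of a live malloc/realloc allocation that nothing
    /// else will free.
    pub unsafe fn from_raw(ptr: *mut u8, len: usize) -> Self {
        MallocBuf { ptr, len }
    }
}

impl Deref for MallocBuf {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        if self.ptr.is_null() {
            &[]
        } else {
            unsafe { slice::from_raw_parts(self.ptr, self.len) }
        }
    }
}

impl Drop for MallocBuf {
    fn drop(&mut self) {
        // free(NULL) is a no-op, so the empty case needs no special handling.
        unsafe { free(self.ptr.cast::<c_void>()) }
    }
}

/// Stand-in for a C function that returns a malloc'd copy of `bytes`.
pub fn demo_c_copy(bytes: &[u8]) -> MallocBuf {
    unsafe {
        // Request at least one byte so a null result always means failure.
        let p = malloc(bytes.len().max(1)).cast::<u8>();
        assert!(!p.is_null(), "malloc failed");
        ptr::copy_nonoverlapping(bytes.as_ptr(), p, bytes.len());
        MallocBuf::from_raw(p, bytes.len())
    }
}
```

Because `MallocBuf` dereferences to `[u8]`, calling `to_vec()` on it yields an ordinary Vec<u8> at the cost of one copy. That copy, and the need for a separate wrapper type, is the overhead this issue would like to avoid.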

Third, even if the above two issues were solved, the ergonomics are still bad because (AFAIK) there's no automatic way to convert from Vec<_, System> to Vec<_>, even if the global allocator is indeed System. It would be nice if we could write code like this:

let v: Vec<_> = copy_cstr(c"foo");

and have it compile in programs where the global allocator is System, and fail with a type error otherwise. However, even in such programs, Vec<T, System> and Vec<T, Global> are still different types.