yallop / ocaml-ctypes

Library for binding to C libraries using pure OCaml
MIT License
368 stars 95 forks source link

Make management of C-memory allocated from OCaml more explicit #571

Open robhoes opened 6 years ago

robhoes commented 6 years ago

Ctypes has a number of functions that allocate C memory and return an OCaml value containing a pointer to it. Examples are make and CArray.make. This C memory is automatically freed when the associated OCaml value is collected by the GC due to the finaliser that is attached to it.

This is convenient, since it makes it harder for memory leaks to occur due to forgetting to free the memory. On the other hand, to avoid memory corruption, you need to be very careful to ensure that the OCaml value is kept alive (not GC-able) for as long as there is (C) code using the memory. It turns out that it is not always straightforward to do this, and it is easy to make mistakes.

An example is the use of a C function that takes a pointer to memory allocated from OCaml (as above), and returns a pointer to (somewhere in) the memory, perhaps after writing some data to it. The returned pointer will be wrapped in a new OCaml value, distinct from the original one, because ctypes can't tell that they are related. We must now keep the original value alive, along with the returned one.

A real example (thanks @yallop):

let strchr = Foreign.foreign "strchr" (ptr char @-> char @-> returning (ptr char)) in
let p = CArray.of_string "abc" in
let q =  strchr (CArray.start p) 'a' in
let () = Gc.compact () in
Printf.printf "%c\n" !@q

Immediately after strchr returns, value p is at risk of being GC'ed, which would free the array. From then on, q is pointing to invalid memory, and an arbitrary character may be printed in the last line. To avoid this, we would have to actually "use" p somehow after the last use of q, which is awkward and unintuitive, and therefore error prone.

See https://github.com/xapi-project/xen-api/pull/3669 for a real-life example of a fix that was made for such a problem.

Something similar occurs when allocating memory from OCaml, and storing the pointer in a struct, even if the struct is also allocated from OCaml. It is mandatory to keep all OCaml values alive. See https://github.com/xapi-project/ocaml-opasswd/pull/14 for an example of this, which is a fix that still turned out to be incomplete!

There are several other subtleties in the area of memory management, which I have not mentioned above, but which users of ctypes should be aware to avoid corruption. Such mistakes could be avoided by adopting a more explicit memory model, which does not use finalisers, and where it is up to the OCaml program to free the (C) memory using a call to a deallocation function (e.g. Ctypes.deallocate). The trade-off is indeed that doing so introduces a risk of memory leaks, but it may be clearer to see what is going on.

A possible way to do this, while maintaining backwards compatibility, is to add an optional, Boolean argument to all "make" functions, which has the effect that no finaliser is added to the new value.

Fizzixnerd commented 2 months ago

Following the discussion on: https://discuss.ocaml.org/t/blog-ocaml-ffi-sharp-edges-and-how-to-avoid-them/14935 :

@edwintorok made the suggestion of using a construct like ignore (Sys.opaque_identity x) as a method of keeping x alive at least until the execution of that statement with little-to-no runtime overhead.

I would like to open a small PR to add let keep_alive x = ignore (Sys.opaque_identity x), as in my testing it has been extremely effective at solving this problem.

The code above would transform to:

let strchr = Foreign.foreign "strchr" (ptr char @-> char @-> returning (ptr char)) in
let p = CArray.of_string "abc" in
let q =  strchr (CArray.start p) 'a' in
let () = Gc.compact () in
let () = keep_alive (p, q)
Printf.printf "%c\n" !@q

Where I have kept q alive just for simplicity's sake as well, even though it's technically not required. This also demonstrates that you can happily just stick all the variables you need alive in a tuple.

I would suggest the following changes (and am willing to complete the work required for them):

  1. Add keep_alive as defined above as a helper for users of Ctypes.
  2. Add documentation for keep_alive where appropriate, likely somewhere near the top of the .mli or near allocate_n/allocate, as this is a pretty endemic issue it seems.
  3. (Optional) Add an PPX that inserts keep_alive for you where appropriate. That is, the above code would be transformed to something that looks like:
    let strchr = Foreign.foreign "strchr" (ptr char @-> char @-> returning (ptr char)) in
    let%keep_alive p = CArray.of_string "abc" in
    let%keep_alive q =  strchr (CArray.start p) 'a' in
    let () = Gc.compact () in
    Printf.printf "%c\n" !@q

    which would transform roughly to:

    let strchr = Foreign.foreign "strchr" (ptr char @-> char @-> returning (ptr char)) in
    let p = CArray.of_string "abc" in
    let __gensym0 = 
        let q =  strchr (CArray.start p) 'a' in
            let __gensym1 = 
                let () = Gc.compact () in
                Printf.printf "%c\n" !@q
           in Ctypes.keep_alive q; __gensym1
    in Ctypes.keep_alive p; __gensym0

    or similar.

If you don't want to muddy the waters with a PPX, I totally get it, and in that case would probably just create a separate opam package containing just that PPX. I do strongly think that the keep_alive function belongs in Ctypes, however.

yallop commented 2 months ago

Exposing a keep_alive function is a good idea, but I'm not sure that Sys.opaque_identity actually provides the necessary guarantees. So I recommend either

Fizzixnerd commented 2 months ago

See https://github.com/ocaml/ocaml/pull/13288 for updates, but it should work according to the above linked information and ocaml/ocaml#9412 for when the change occured.

yallop commented 2 months ago

Ah, indeed: after https://github.com/ocaml/ocaml/pull/9412 Sys.opaque_identity can be used to guarantee liveness, so a patch implementing keep_alive that way would be welcome. (And if you're so inclined, replacing Ctypes_memory_stubs.use_value with your new function in the ctypes implementation would be welcome, too.)

gasche commented 2 months ago

I don't think you need a ppx for this:

(* library code *)
let alive x f =
  let result = f x in
  Sys.opaque_identity x;
  result

(* generic binding operator *)
let (let@) x f = f x

(* your example *)
let@ p = CArray.of_string "abc" |> alive in
let@ q = strchr (CArray.start p) 'a' |> alive in
let () = Gc.compact () in
Printf.printf "%c\n" !@q

(But: I agree that this programming model looks fairly fragile overall, almost worse than writing the C code directly. I wonder what's the desirable big-picture for Ctypes.)

edwintorok commented 2 months ago

Thanks for the example, indeed OCaml has evolved a lot since this issue was originally opened, and the let syntax is a good idea for this.

Perhaps someone could invent a fully type safe way that requires you to use a function equivalent to alive to call foreign functions (e.g. some kind of phantom parameter that when you call Carray.start it wouldn't be valid to pass it directly to an FFI but you'd have to go through alive which modifies the phantom parameter to be acceptable by the FFI). Although obviously that may introduce some additional overhead due to closure allocations, so perhaps such an interface should be opt-in, where type and memory safety is more important than performance.

(But: I agree that this programming model looks fairly fragile overall, almost worse than writing the C code directly. I wonder what's the desirable big-picture for Ctypes.)

(There are pros and cons of both approaches, Ctypes helps prevent certain kinds of errors, while introducing others. We eventually had to rewrite one Ctype binding in old fashioned C after not being able to figure out a way to make it stop segfaulting, but that was back in 2018, and Sys.opaque_identity didn't have the semantics it has now. We have a lot of other Ctypes bindings and haven't had a problem with them since, although to be fair they rarely use strings, or arrays.) ) )

Fizzixnerd commented 2 months ago

I don't think you need a ppx for this:

(* library code *)
let alive x f =
  let result = f x in
  Sys.opaque_identity x;
  result

(* generic binding operator *)
let (let@) x f = f x

(* your example *)
let@ p = CArray.of_string "abc" |> alive in
let@ q = strchr (CArray.start p) 'a' |> alive in
let () = Gc.compact () in
Printf.printf "%c\n" !@q

(But: I agree that this programming model looks fairly fragile overall, almost worse than writing the C code directly. I wonder what's the desirable big-picture for Ctypes.)

I think this would work as a better binding operator:

let (let@) x f =
  let ret = f x in
  keep_alive x;
  ret

which would transform the example:

let@ p = CArray.of_string "abc" in
let@ q = strchr (CArray.start p) 'a' in
let () = Gc.compact () in
Printf.printf "%c\n" !@q

Is this not better?

edwintorok commented 2 months ago

I'd suggest using an operator other than let@ (or let*, or let+), these usually have specific meanings that don't involve keep alive, but yes other than that I like the idea of putting keepalive into the operator itself.

gasche commented 2 months ago

Personally I prefer to have only a few binding operators that have a well-understood meaning (sometimes, like let@, they have a single definition; sometimes it is ad-hoc polymorphism because they mean "the {map,bind,pair} function currently in scope"), instead of a multiplication of domain-specific operators whose meaning is not at all apparent from the name.

See the discussion in https://github.com/ocaml/ocaml/pull/9887 (inconclusive: my opinion does not appear to be consensual), and the proposals for a better syntax for binding operators in https://github.com/ocaml/ocaml/discussions/12015 (I would like to make (let/<ident>) a reality, but I haven't had time to work on it and no one else has volunteered to do it.)

Fizzixnerd commented 2 months ago

I see, I was aware of let+ and let*, and chose let@ only because @ suggests "alive". I would note, that this is just a slightly modified identity monad from the value level perspective, so let* is perhaps not as inappropriate as it first appears, but I understand that without a bind operator and specific type in scope it would be hard to justify.

That being said, I've just had a sort of random idea. Not sure if it's appropriate for Ctypes necessarily, but imagine something like the following:

module Living = struct
    type _ dep = Dep : 'a -> unit dep

    type 'a t = { value : 'a; dependencies : unit dep list }

    let bind (f : 'a -> 'b t) (x : 'a t) =
        let deps = Dep x :: x.dependencies in
        let y = f x.value in
        {y with dependencies = y.dependencies @ deps}

    let return x = {value = x; dependencies = []}

    let value x = x.value

    module Let_syntax = struct (* standard bind and map operators here *) end
end

The idea here is that a value carries it's dependencies with it as existential items in a heterogeneous list. This way, when you want to access the value of x, you'd access value x, but you'd be asked to never bind it to a variable. Linearity would be nice here, as it would ensure safety, but I still think it's possibly a better situation to what we have now.

This is not quite a monad, since x >>= return is not exactly equal to x. Changing the list to a Set would work, but then it would fail to be a "plain" monad because of the hashing/comparison constraint on x. Anyways, I might take a look at this later as a solution, but for now I'm happy enough in my own library to define let$ as the above keep_aliver and go from there.

Edit: changing the list to a Set, and also putting x in it in return, I should say.