yallop / ocaml-ctypes

Library for binding to C libraries using pure OCaml
MIT License
363 stars 95 forks

Support for gmp? #643

Closed: disteph closed this issue 3 years ago

disteph commented 4 years ago

Feature request: Would it be possible to have ctypes bridge gmp types with zarith types or mlgmpidl types? I.e., in the same way that let ml_function = Foreign.foreign "c_function" Ctypes.(long @-> ...) wraps a C function taking an argument of (C) type long as an ML function of type Signed.long -> ..., let ml_function = Foreign.foreign "c_function" Ctypes.(mpz @-> ...) would wrap a C function taking an argument of (C) type mpz_t as an ML function of type Z.t -> ..., given a type handler val mpz : Z.t Ctypes.typ? Is this something I can do myself as syntactic sugar on top of what exists? My trouble is that the types Z.t/Q.t in Zarith, and their counterparts in mlgmpidl, are abstract, and I don't see how I can represent them in the Ctypes world...

yallop commented 4 years ago

I expect this can be built on top of Ctypes. Since Z.t values are either OCaml integers or custom blocks corresponding to gmp values, it'd be necessary to handle both cases.

gasche commented 4 years ago

I tried to provide a bit more information to @disteph about how to do this, but I realized that this is rather difficult -- I don't know how to do it. (I'm unfamiliar with Ctypes.)

Context:

A first idea (which I believe does not work) is to first bind the mpz/Zarith.t conversion functions into OCaml as externals, and then use them in the view function. Sketch:

(* this type represents mpz_t values from the OCaml side *)
type mpz_t = unit ptr (* inspired from examples/ncurses *)

(* C-type for mpz_t  *)
let mpz_t : mpz_t typ = ptr void (* inspired from examples/ncurses *)

(* bindings to conversion functions implemented in C (zarith.h or other glue) *)
external zarith_of_mpz : mpz_t -> Zarith.t = "ml_z_from_mpz"
external mpz_of_zarith : Zarith.t -> mpz_t = "?"

(* C-type for Zarith.t *)
let zarith = view ~read:zarith_of_mpz ~write:mpz_of_zarith mpz_t

However, I believe that this approach is incorrect: unit ptr is the type of Ctypes representations of C values, not of naked C values (which are strongly discouraged in the OCaml runtime nowadays anyway).

I think that this is a fundamental issue with Ctypes+Zarith: the conversion functions between mpz_t and Zarith.t cannot be used as OCaml externals, as the mpz_t type cannot be safely represented in OCaml. I do not know how to use Ctypes in this situation. Ideally I would want to have a sort of view where the conversion functions are not used on the OCaml side, but on the C side.

disteph commented 4 years ago

I was trying the first way, hoping that the conversion from Ctypes representations of C values to naked C values could simply be done in C with Data_custom_val. That yields the following wrapper around zarith.h, to be used where you say "zarith.h or other glue":

#include <stdlib.h>
#include <stdint.h>
#include <gmp.h>
#include <zarith.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>

/* sets rop to the value in op (limbs are copied) */
CAMLprim value ml_z_mpz_set_z_ml(value rop, value op) {
  CAMLparam2(rop, op);
  mpz_ptr z = (mpz_ptr) (Data_custom_val(rop));
  ml_z_mpz_set_z(z, op);
  CAMLreturn(Val_unit);
}

/* inits and sets rop to the value in op (limbs are copied) */
CAMLprim value ml_z_mpz_init_set_z_ml(value rop, value op) {
  CAMLparam2(rop, op);
  mpz_ptr z = (mpz_ptr) (Data_custom_val(rop));
  ml_z_mpz_init_set_z(z, op);
  CAMLreturn(Val_unit);
}

/* returns a new Z.t value equal to rop (limbs are copied) */
CAMLprim value ml_z_from_mpz_ml(value rop) {
  CAMLparam1(rop);
  mpz_ptr z = (mpz_ptr) (Data_custom_val(rop));
  CAMLreturn(ml_z_from_mpz(z));
}

(using mpz_ptr --a pointer to the mpz struct-- rather than mpz_t --a size 1 array of the mpz struct-- as I can easily convert between the two in C or in Ctypes).
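For readers less familiar with the distinction, here is a minimal C sketch of how an array-of-one typedef and a pointer typedef relate. The toy_* names and the field layout are stand-ins invented for illustration, not GMP's real definitions; only the overall shape (a size-1 array typedef next to a pointer typedef) mirrors gmp.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for GMP's __mpz_struct; these field names are hypothetical. */
typedef struct {
  int alloc;
  int size;
  unsigned long *limbs;
} toy_mpz_struct;

typedef toy_mpz_struct toy_mpz_t[1];  /* like mpz_t: array of one struct */
typedef toy_mpz_struct *toy_mpz_ptr;  /* like mpz_ptr: pointer to the struct */

/* A parameter declared with the array type is adjusted to a pointer,
   so this function really receives a toy_mpz_struct *. */
int toy_size_via_array(toy_mpz_t z) {
  assert(z != NULL);
  return z->size;
}

/* The same function written against the pointer typedef. */
int toy_size_via_ptr(toy_mpz_ptr z) {
  assert(z != NULL);
  return z->size;
}

/* Returns 1 when both spellings see the same object and the same value. */
int toy_same_view(void) {
  toy_mpz_t n = {{ 0, 3, NULL }};
  toy_mpz_ptr p = n;  /* array-to-pointer conversion, nothing is copied */
  return p == &n[0]
      && toy_size_via_array(n) == 3
      && toy_size_via_ptr(p) == 3;
}
```

Calling toy_same_view() exercises both spellings on one object; the point is only that the conversion between the two forms is free on the C side.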

But I feel I'm completely out of my depth here.

disteph commented 4 years ago

The wrapping of zarith.c attempted above was inspired by Zarith's implementation of conversion functions to/from MLGMPIDL, an OCaml wrapper of GMP. I was hoping that Ctypes's representation of an mpz_ptr value is the same as MLGMPIDL's, i.e. that Data_custom_val really gives the expected C mpz_ptr value.

But then I'm still stuck with mundane build/linking problems. And perhaps my assumption above is incorrect, or perhaps Ctypes people would approach the problem in a different way :-) I'd be happy to hear.

gasche commented 4 years ago

@disteph one problem I see with writing your own conversion functions on the C side is that it is not clear, to me, how to get Ctypes to automatically use them when binding third-party C functions to the OCaml world. If you have to write your own C-side wrapper of each third-party function, and then connect this to the OCaml world with ctypes, you lose a part of the value of the tool. (But maybe I misunderstood your approach.)

Another approach I considered recently is: instead of trying to use external to get the conversion functions into the OCaml world, we can use Ctypes's own support for external functions:

type mpz_t = unit ptr
let mpz_t : mpz_t typ = ptr void

let zarith_of_mpz : mpz_t -> Zarith.t = Foreign.foreign "ml_z_from_mpz" (mpz_t @-> returning ??)
let mpz_of_zarith : Zarith.t -> mpz_t = Foreign.foreign "??" (?? @-> returning mpz_t)

let zarith : Zarith.t typ = view mpz_t ~read:zarith_of_mpz ~write:mpz_of_zarith

This solves the problem on the mpz_t (or indeed mpz_ptr) side, as now we properly manipulate them, but it creates a problem on the Zarith side: what is the ctype we should use to represent the result of ml_z_from_mpz? It cannot be val zarith : Zarith.t typ, which is not defined yet and which performs a conversion implicitly. It should be a type of raw OCaml values, as seen from the C side (so some form of primitive val value : Obj.t typ). But I could not find this in the Ctypes interface, although something very similar seems to be used internally to define primitive types.

disteph commented 4 years ago

Well indeed, using Ctypes to bind third-party functions manipulating mpz_t / mpz_ptr would not produce OCaml functions using Zarith's Z.t, but rather functions manipulating the Ctypes representation of mpz_t / mpz_ptr. But if I have OCaml functions converting Z.t to/from the Ctypes representation of mpz_t / mpz_ptr (binding my 3 C functions above via external), I can compose them in the OCaml world, so that I don't have to write my own C-side wrapper of each third-party function (I would wrap them in OCaml). In this approach Ctypes never has to know about Zarith. This was what I was trying to do.

But I agree it'd be much better to have Ctypes representations of Zarith types, with a handler val zarith_z : Z.t typ and have Ctypes automatically lift the third-party functions manipulating C mpz_t / mpz_ptr into OCaml functions manipulating Z.t, the same way it lifts third-party functions manipulating C long into OCaml functions manipulating Signed.Long.t using handler long : Signed.Long.t typ. But in order to do that I assume you have to deal with the internals of Ctypes and release a new version with a zarith dependency, rather than build it on top of Ctypes with its current API. Unless I misunderstand something.

fdopen commented 4 years ago

Ctypes can't deal with custom blocks created by hand-written third party libraries. But you can manually create two stub functions that convert such blocks to something that is understood by Ctypes - for example "raw pointers".

/* from_zarith writes a Z.t value (first parameter) to the location
   pointed to by the second parameter. nativeints are used to
   store pointers. */
CAMLprim value from_zarith(value zt, value np)
{
  CAMLparam2(zt,np);
  /* study zariths source code for details */
  mpz_t mp = zarith_representation_to_plain_c(zt); 
  *((mpz_t *)Nativeint_val(np)) = mp;
  CAMLreturn(Val_unit);
}

/* reverse: *mpz_t to Z.t */
CAMLprim value to_zarith(value v)
{
  CAMLparam1(v);
  CAMLlocal1(res);
  mpz_t m = *((mpz_t *) Nativeint_val(v));
  /* study zariths source code for details again */
  res = convert_to_zarith_representation(m); 
  CAMLreturn(res);
}

Now you can define the type with Ctypes and connect both (memory management and other details ignored):

type mpz_struct
let mpz_struct : mpz_struct Ctypes.structure Ctypes.typ =
  Ctypes.typedef (Ctypes.structure "") "__mpz_struct"
let _x = Ctypes.field mpz_struct "_mp_foo" Ctypes.int
(*  more fields *)
let () = Ctypes.seal mpz_struct

external from_zarith : Z.t -> nativeint -> unit = "from_zarith"
let from_zarith x =
  let res = Ctypes.allocate_n mpz_struct ~count:1 in
  Ctypes.to_voidp res |> Ctypes.raw_address_of_ptr |> from_zarith x;
  res

external to_zarith : nativeint -> Z.t = "to_zarith"
let to_zarith (x:mpz_struct Ctypes.structure Ctypes.ptr) =
  Ctypes.to_voidp x |> Ctypes.raw_address_of_ptr |> to_zarith

(* simplification. mpz_t only decays to a pointer, when used
   as function parameter ... *)
let mpz_t_fparam : Z.t Ctypes.typ =
  Ctypes.view
    ~format_typ:(fun k fmt -> Format.fprintf fmt "mpz_ptr%t" k)
    ~read:to_zarith
    ~write:from_zarith
    (Ctypes.ptr mpz_struct)

let mpz_t_frparam = Ctypes.typedef (Ctypes.ptr mpz_struct) "mpz_ptr"

let add =
  Foreign.foreign "mpz_add"
    Ctypes.(mpz_t_frparam @-> mpz_t_fparam @-> mpz_t_fparam @-> returning void)

let add : Z.t -> Z.t -> Z.t =
  fun a b ->
  let res = Ctypes.allocate_n mpz_struct ~count:1 in
  add res a b;
  to_zarith res

It's of course rather inefficient because of all the allocations and switches between C and the OCaml runtime, which would not be necessary if you wrote your stubs manually.

gasche commented 4 years ago

@fdopen Thanks for the suggestion; I had thought about something along those lines but I discarded it as "not the right way to do it", but thinking about it more it may at least provide a working (if unpleasant) solution to what I was trying to do.

I don't think that we need extra allocations to use the idea of wrapping pointers as native integers. We should be able to use Ctypes' own support for this: we can move from an mpz_ptr typ to a nativeint typ by using view ~read:raw_address_of_ptr ~write:ptr_of_raw_address, and then we can use view again to go from a nativeint typ to a Zarith.t typ, by using external functions that unwrap the nativeint before calling the conversion functions from zarith.h.

This gives the following sketch:

(* C-type for mpz_ptr  *)
type mpz_ptr = unit ptr
let mpz_ptr : mpz_ptr typ = ptr void (* inspired from examples/ncurses *)

(* mpz_ptr wrapped as a nativeint *)
let mpz_nativeint = view mpz_ptr ~read:raw_address_of_ptr ~write:ptr_of_raw_address

(* bindings to conversion functions implemented in C,
   unwrapping the nativeint before calling zarith.h conversions *)
external zarith_of_mpz_nativeint : nativeint -> Zarith.t = "ml_z_from_mpz_nativeint"
external mpz_nativeint_of_zarith : Zarith.t -> nativeint = "?"

(* C-type for Zarith.t *)
let zarith = view mpz_nativeint ~read:zarith_of_mpz_nativeint ~write:mpz_nativeint_of_zarith

I am not completely sure that this use of raw_address_of_ptr is valid (with this unsafe function there is a risk of losing the last reference to the input and then having it garbage-collected). The raw pointer is pointing to a C value that is not managed by the OCaml GC, so this is good, but I have the impression that unit ptr values contain a wrapper on the OCaml side, and I don't know how to reason about liveness of this value during view conversions.

gasche commented 4 years ago

(Note: in practice the API of zarith.h suggests that you don't want to create mpz_ptr values from a Zarith.t value (there is no such function provided, if I understand correctly), but rather write a Zarith.t value to an out-variable pointer passed as an argument to the C function. I am glossing over this difference here; I hope this can be solved with the Ctypes machinery once the fundamental how-to-convert question is resolved.)

fdopen commented 4 years ago

> I don't think that we need extra allocations to use the idea of wrapping pointers as native integers.

It can't be avoided for the reason you've already mentioned: the garbage collector. Just look at the mentioned mpz_add function whose prototype boils down to something like this (with all constness and obfuscation through typedefs removed):

void mpz_add (__mpz_struct *, __mpz_struct *, __mpz_struct *);

The function takes pointers to structures, and it's just not possible to convert Zarith.t values to a usable pointer on the fly. A Z.t value might be a plain integer, passed by value, without an address. And if it is a custom block, you can't point to anything inside it, because it could be moved around by the GC. Another parameter of the function could be created through a Ctypes.view whose functions trigger the GC. So the pointer you've created with your view could already be invalid once it reaches the C function.

Therefore the view must create deep copies. Z.t values are completely stored inside the OCaml heap, whereas mpz_struct Ctypes.structure Ctypes.ptr points to memory inside the C heap.
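To make the deep-copy requirement concrete, here is a small self-contained C sketch. It assumes a toy bignum type (toy_big and the toy_* functions are hypothetical names, not part of GMP or Zarith) and simulates the source storage being overwritten and released, roughly what a moving collector could do to a custom block, to show that a deep copy keeps its value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a bignum: a length plus heap-allocated limbs. */
typedef struct {
  size_t nlimbs;
  unsigned long *limbs;
} toy_big;

/* Deep copy: dst gets its own limb buffer, so it stays valid even if
   src's storage is later moved or freed. Returns 0 on success and -1
   on allocation failure (dst is then left empty). */
int toy_big_init_copy(toy_big *dst, const toy_big *src) {
  assert(dst != NULL && src != NULL);
  dst->nlimbs = 0;
  dst->limbs = NULL;
  if (src->nlimbs == 0) return 0;
  unsigned long *buf = malloc(src->nlimbs * sizeof *buf);
  if (buf == NULL) return -1;
  memcpy(buf, src->limbs, src->nlimbs * sizeof *buf);
  dst->limbs = buf;
  dst->nlimbs = src->nlimbs;
  return 0;
}

/* Releases the limb buffer and resets the value to empty. */
void toy_big_clear(toy_big *b) {
  assert(b != NULL);
  free(b->limbs);
  b->limbs = NULL;
  b->nlimbs = 0;
}

/* Copies a value, then overwrites and frees the original storage.
   Returns 1 if the copy still holds the original limbs. */
int toy_copy_survives_move(void) {
  toy_big src = { 0, NULL };
  toy_big copy;
  src.limbs = malloc(2 * sizeof *src.limbs);
  if (src.limbs == NULL) return 0;
  src.nlimbs = 2;
  src.limbs[0] = 7;
  src.limbs[1] = 9;
  if (toy_big_init_copy(&copy, &src) != 0) {
    toy_big_clear(&src);
    return 0;
  }
  src.limbs[0] = 0;  /* the old storage gets reused ... */
  src.limbs[1] = 0;
  toy_big_clear(&src);  /* ... and then released */
  int ok = copy.nlimbs == 2 && copy.limbs[0] == 7 && copy.limbs[1] == 9;
  toy_big_clear(&copy);
  return ok;
}
```

A shallow copy (copying only the struct) would have left copy.limbs pointing at the released buffer, which is the failure mode described above.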

> (Note: in practice the API of zarith.h suggests that you don't want to create mpz_ptr values from a Zarith.t value (there is no such function provided, if I understand correctly)

I didn't know this header. But now I've read it, and it seems to provide everything necessary to implement the functions I suggested:

CAMLprim value from_zarith(value zt, value np)
{
  CAMLparam2(zt,np);
  mpz_ptr ptr = (mpz_ptr)Nativeint_val(np);
  ml_z_mpz_init_set_z(ptr, zt);
  CAMLreturn(Val_unit);
}

CAMLprim value to_zarith(value v)
{
  CAMLparam1(v);
  CAMLlocal1(res);
  mpz_ptr ptr = (mpz_ptr)Nativeint_val(v);
  res = ml_z_from_mpz(ptr);
  CAMLreturn(res);
}

gasche commented 4 years ago

@fdopen I don't disagree but I think we are talking past each other. What I tried to do above is in fact to find a generic solution for the following problem: binding C functions that use a C type foo, but exposing them to OCaml with an OCaml type bar, provided that I have been given C implementations of conversions between foo and bar.

This sounds like a fairly common pattern to me, and I would have expected Ctypes to support it easily. As my hesitations show, this is not the case, at least for someone inexperienced with Ctypes. I currently think that the best way to do it, at least when foo is a pointer type, is to:

  1. use a Ctypes.view to convert foo into nativeint (note: this part does not need any extra allocation/boxing)
  2. implement my own C-side "wrapped" conversions between nativeint-boxed foo pointers and bar, using the C library functions between foo and bar
  3. then use externals to export these wrapped conversions to OCaml, and define a Ctypes.view to convert from nativeint to bar
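Step 1 relies on a pointer surviving a round trip through an integer. A minimal C-side sketch of that round trip follows, using intptr_t as an analogue of OCaml's nativeint. The type toy_foo and the toy_* functions are hypothetical names chosen to echo the Ctypes functions mentioned earlier; they are not the Ctypes implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C type standing in for "foo". */
typedef struct { int size; } toy_foo;

/* Packs a pointer into an integer wide enough to hold it, the C-side
   analogue of carrying an address in an OCaml nativeint. */
intptr_t toy_raw_address_of_ptr(toy_foo *p) {
  return (intptr_t)(void *)p;
}

/* Recovers the pointer from the integer produced above. */
toy_foo *toy_ptr_of_raw_address(intptr_t raw) {
  return (toy_foo *)(void *)raw;
}

/* Returns 1 if the pointer comes back unchanged and still usable. */
int toy_roundtrip_ok(void) {
  toy_foo x = { 5 };
  intptr_t raw = toy_raw_address_of_ptr(&x);
  toy_foo *back = toy_ptr_of_raw_address(raw);
  assert(back != NULL);
  return back == &x && back->size == 5;
}
```

The round trip itself allocates nothing; whether the object behind the address is still alive when the integer is turned back into a pointer is a separate question, and the sketch does not address it.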

I am not sure if there would be a better way to do this today with Ctypes (an example of this in examples/ would already help), or whether the API could be extended to support this. (For example: I wondered whether it could be done more easily with a view_foreign function that does not expect ('a -> 'b) conversions in OCaml-land, but ('a -> 'b) fn conversions bound from C-land.)

gasche commented 4 years ago

Re. deep vs. shallow copies: thanks for insisting on the fact that the mpz payload in Zarith custom blocks is inline (not a pointer outside the heap, which would have been my initial expectation). I think that it may still be possible to imagine a zero-copy binding done through Ctypes, if there is support for declaring that certain parameters of a bound function should be registered as roots for the duration of the call. (I saw some root-handling logic when looking at the Ctypes internals, but didn't look in depth.)

However, this is not necessary for a first approach to mpz/Zarith binding: the zarith.c functions (which do perform a copy) are the easiest way to proceed. In any case I don't think that copying the bigints is a performance bottleneck for @disteph's applications; hopefully the bigint-taking functions run long enough that this is amortized.

fdopen commented 4 years ago

> binding C functions that use a C type foo, but exposing them to OCaml with an OCaml type bar, provided that I have been given C implementations of conversions between foo and bar.

Ctypes needs to know the internals of any C type: size, etc. This information is necessary for memory management and for code generation (either source code or dynamic through libffi). Therefore the definition of the C type must be repeated through Ctypes primitives. The connection to your own type is then covered by Ctypes.view.

If one also has to write C code, my syntax extension can be helpful. It makes it easy to switch between C and OCaml.

I've uploaded example code: Line 6 defines the primary Ctypes.typ, and Line 15 and Line 28 demonstrate how to convert an OCaml type at the C level to something that was described earlier as a Ctypes.typ.

> use a Ctypes.view to convert foo into nativeint (note: this part does not need any extra allocation/boxing)

Two allocations in C in this case. The first is for the structure that is passed to ml_z_mpz_set_z, which then triggers a second allocation (for the limbs).

> For example: I wondered whether it could be done more easily with a view_foreign function that does not expect ('a -> 'b) conversions in OCaml-land, but ('a -> 'b) fn conversions bound from C-land

What about error handling? Say you've transformed the first two parameters, but the third conversion fails: out of memory. The now-useless resources allocated for the previous parameters must be cleaned up before you can return an error value. How do you encode the necessary logic from OCaml if the transformations are done only, and somewhat implicitly, in C-land?
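The cleanup obligation can be sketched in plain C as follows. Everything here is a toy under stated assumptions: toy_arg, toy_convert and the toy_fail_at failure hook are hypothetical and only simulate an out-of-memory condition on a chosen parameter; the live-buffer counter exists so the absence of leaks can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical converted argument: owns one heap buffer. */
typedef struct { unsigned long *limbs; } toy_arg;

/* Failure hook (an assumption of this sketch): the conversion with this
   index fails, simulating out-of-memory. Negative means never fail. */
int toy_fail_at = -1;
int toy_live = 0;  /* number of buffers currently allocated */

/* Converts one argument; returns 0 on success, -1 on failure. */
int toy_convert(toy_arg *out, int index) {
  assert(out != NULL);
  if (index == toy_fail_at) return -1;
  out->limbs = malloc(sizeof *out->limbs);
  if (out->limbs == NULL) return -1;
  toy_live++;
  return 0;
}

/* Releases one successfully converted argument. */
void toy_release(toy_arg *a) {
  assert(a != NULL);
  free(a->limbs);
  a->limbs = NULL;
  toy_live--;
}

/* Converts n arguments in order. If conversion i fails, releases
   arguments 0..i-1 before reporting the error, so nothing leaks. */
int toy_convert_all(toy_arg *args, int n) {
  assert(args != NULL && n >= 0);
  for (int i = 0; i < n; i++) {
    if (toy_convert(&args[i], i) != 0) {
      while (i-- > 0) toy_release(&args[i]);
      return -1;
    }
  }
  return 0;
}
```

With toy_fail_at set to 2, toy_convert_all on three arguments returns -1 and leaves toy_live at 0. This unwinding is the logic that would have to live somewhere if the conversions ran implicitly on the C side.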

disteph commented 4 years ago

Thanks for the in-depth discussion and the proposed code at https://github.com/fdopen/ctypes-zarith. Yes, ppx_cstubs is neat! Indeed, I realise that there is a copy within ml_z_mpz_set_z, and I'm not worried about that being a bottleneck in my application: the third-party functions manipulating mpz that I want to bind are not functions called over and over. I'm not trying to bind

void mpz_add (__mpz_struct *, __mpz_struct *, __mpz_struct *);

but some API functions of our SMT-solver Yices; the heavy computation is done in C and I just want the input / output of it to be something that the user can then make sense of in the rest of their OCaml program, via Zarith.

bobot commented 2 years ago

Thank you for the discussion and the code example. Still, it seems that having an Obj.t typ available as an argument and return value would simplify things a lot by avoiding the need to write any stubs in C.