disteph closed this issue 3 years ago.
I expect this can be built on top of Ctypes. Since `Z.t` values are either OCaml integers or custom blocks corresponding to GMP values, it would be necessary to handle both cases.
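To make "handle both cases" concrete, here is a minimal plain-C sketch of how an OCaml-style word can be told apart as an immediate integer or a pointer to a block. It assumes the usual OCaml convention that immediate integers carry a 1 in the lowest bit; all `demo_*` names are hypothetical stand-ins and not part of the OCaml runtime, Zarith, or Ctypes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for OCaml's `value`: one machine word. */
typedef intptr_t demo_value;

enum demo_kind { DEMO_IMMEDIATE, DEMO_BLOCK };

/* Immediate integers have the lowest bit set; block pointers are
   word-aligned, so their lowest bit is 0. */
int demo_is_immediate(demo_value v) { return (v & 1) != 0; }

/* Encode n as a tagged immediate: shift left by one and set the tag bit. */
demo_value demo_of_long(intptr_t n)
{
  return (demo_value)(((uintptr_t)n << 1) | 1);
}

/* Decode a tagged immediate. This relies on arithmetic right shift of
   negative numbers, which mainstream compilers provide. */
intptr_t demo_to_long(demo_value v)
{
  assert(demo_is_immediate(v));
  return v >> 1;
}

/* A converter has to branch on the kind before doing anything else. */
enum demo_kind demo_classify(demo_value v)
{
  return demo_is_immediate(v) ? DEMO_IMMEDIATE : DEMO_BLOCK;
}
```

A real conversion function would then read the small integer directly in the first case and go through the custom block's payload in the second.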
I tried to provide a bit more information to @disteph about how to do this, but I realized that this is rather difficult -- I don't know how to do it. (I'm unfamiliar with Ctypes.)
Context:

- [...] (`mpz_t`) directly, and where on the OCaml side we manipulate them as if they were using Zarith values (`Zarith.t`).
- [...] (`Zarith.t`) from another ctype (here for `mpz_t`) and conversion functions back and forth.
- A `Zarith.t` value is represented as an OCaml value (the `value` type in C), either an immediate integer, a boxed integer, or a custom block containing an `mpz_t` payload. In particular, because `Zarith.t` values are custom blocks in the general case, they cannot be created directly from OCaml; the conversion functions to and from `mpz_t` in Zarith exist only as C bindings.

A first idea (which I believe does not work) is to first bind the `mpz`/`Zarith.t` conversion functions into OCaml as externals, and then use them in the `view` function. Sketch:
```ocaml
open Ctypes

(* this type represents mpz_t values from the OCaml side *)
type mpz_t = unit ptr (* inspired from examples/ncurses *)

(* C-type for mpz_t *)
let mpz_t : mpz_t typ = ptr void (* inspired from examples/ncurses *)

(* bindings to conversion functions implemented in C (zarith.h or other glue) *)
external zarith_of_mpz : mpz_t -> Zarith.t = "ml_z_from_mpz"
external mpz_of_zarith : Zarith.t -> mpz_t = "?"

(* C-type for Zarith.t; view takes the underlying ctype as its last argument *)
let zarith = view ~read:zarith_of_mpz ~write:mpz_of_zarith mpz_t
```
However, I believe that this approach is incorrect: `unit ptr` is the type of Ctypes representations of C values, not of naked C values (which are strongly discouraged in the OCaml runtime nowadays anyway).

I think that this is a fundamental issue with Ctypes+Zarith: the conversion functions between `mpz_t` and `Zarith.t` cannot be used as OCaml externals, as the `mpz_t` type cannot be safely represented in OCaml. I do not know how to use Ctypes in this situation. Ideally I would want to have a sort of `view` where the conversion functions are not used on the OCaml side, but on the C side.
I was trying the first way, hoping that the conversion from Ctypes representations of C values to naked C values could be simply done in C with `Data_custom_val`, yielding the following wrapper around `zarith.h`, to be used where you say "zarith.h or other glue", namely:
```c
#include <stdlib.h>
#include <stdint.h>
#include <gmp.h>
#include <zarith.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>

/* sets rop to the value in op (limbs are copied) */
CAMLprim value ml_z_mpz_set_z_ml(value rop, value op) {
  CAMLparam2(rop, op);
  mpz_ptr z = (mpz_ptr) (Data_custom_val(rop));
  ml_z_mpz_set_z(z, op);
  CAMLreturn(Val_unit);
}

/* inits and sets rop to the value in op (limbs are copied) */
CAMLprim value ml_z_mpz_init_set_z_ml(value rop, value op) {
  CAMLparam2(rop, op);
  mpz_ptr z = (mpz_ptr) (Data_custom_val(rop));
  ml_z_mpz_init_set_z(z, op);
  CAMLreturn(Val_unit);
}

/* returns a new z object equal to op (limbs are copied) */
CAMLprim value ml_z_from_mpz_ml(value rop) {
  CAMLparam1(rop);
  mpz_ptr z = (mpz_ptr) (Data_custom_val(rop));
  CAMLreturn(ml_z_from_mpz(z));
}
```
(using `mpz_ptr`, a pointer to the mpz struct, rather than `mpz_t`, a size-1 array of the mpz struct, as I can easily convert between the two in C or in Ctypes).
But I feel I'm completely out of my depth here.
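As a small illustration of the `mpz_t` versus `mpz_ptr` remark, the following plain-C sketch mimics the two typedefs with a stand-in struct. The struct layout and every `demo_*` name are assumptions made for the example (GMP's real definitions live in `gmp.h`); the point is only that a size-1 array type decays to a pointer when passed to a function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GMP's __mpz_struct; field names are illustrative. */
typedef struct {
  int alloc;
  int size;
  unsigned long *limbs;
} demo_struct;

typedef demo_struct demo_t[1];   /* plays the role of mpz_t: a size-1 array */
typedef demo_struct *demo_ptr;   /* plays the role of mpz_ptr: a plain pointer */

/* A parameter declared as demo_t is adjusted to demo_struct*, so the
   callee writes into the caller's object. */
void demo_set_size(demo_t x, int size)
{
  assert(x != NULL);
  x->size = size;
}

/* The same object can be read back through the pointer type. */
int demo_get_size(demo_ptr x)
{
  assert(x != NULL);
  return x->size;
}
```

Declaring `demo_t a;` gives storage for one struct, and `demo_ptr p = a;` needs no cast, which is why switching between the two spellings is cheap on the C side.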
The wrapping of zarith.c attempted above was inspired by Zarith's implementation of conversion functions to/from MLGMPIDL, an OCaml wrapper of GMP, and I was hoping that Ctypes's representation of an `mpz_ptr` value is the same as MLGMPIDL's, i.e. `Data_custom_val` really gives the expected C `mpz_ptr` value.
But then I'm still stuck with mundane build/linking problems. And perhaps my assumption above is incorrect, or perhaps Ctypes people would approach the problem in a different way :-) I'd be happy to hear.
@disteph one problem I see with writing your own conversion functions on the C side is that it is not clear, to me, how to get Ctypes to automatically use them when binding third-party C functions to the OCaml world. If you have to write your own C-side wrapper of each third-party function, and then connect this to the OCaml world through Ctypes, you lose a part of the value of the tool. (But maybe I misunderstood your approach.)
Another approach I considered recently is: instead of trying to use `external` to get the conversion functions into the OCaml world, we can use Ctypes's own support for external functions:
```ocaml
type mpz_t = unit ptr
let mpz_t : mpz_t typ = ptr void

let zarith_of_mpz : mpz_t -> Zarith.t = Cstubs.foreign "ml_z_from_mpz" (mpz_t @-> returning ??)
let mpz_of_zarith : Zarith.t -> mpz_t = Cstubs.foreign "??" (?? @-> returning mpz_t)

let zarith : Zarith.t typ = view mpz_t ~read:zarith_of_mpz ~write:mpz_of_zarith
```
This solves the problem on the `mpz_t` (or `mpz_ptr` indeed) side, as now we properly manipulate them, but it creates a problem on the Zarith side: what is the ctype we should use to represent the result of `ml_z_from_mpz`? It cannot be `val zarith : Zarith.t typ`, which is not defined yet and which performs a conversion implicitly. It should be a type of raw OCaml values, as seen from the C side (so some form of primitive `val value : Obj.t typ`). But I could not find this in the Ctypes interface, although something very similar seems to be used internally to define primitive types.
Well indeed, using Ctypes to bind third-party functions manipulating `mpz_t` / `mpz_ptr` would not make OCaml functions using Zarith's `Z.t` but rather functions manipulating the Ctypes representation of `mpz_t` / `mpz_ptr`. But if I have OCaml functions converting `Z.t` to/from the Ctypes representation of `mpz_t` / `mpz_ptr` (binding my 3 C functions above via `external`), I can compose them in the OCaml world, so that I don't have to write my own C-side wrapper of each third-party function (I would wrap them in OCaml). In this approach Ctypes never has to know about Zarith. This was what I was trying to do.
But I agree it'd be much better to have Ctypes representations of Zarith types, with a handler `val zarith_z : Z.t typ`, and have Ctypes automatically lift the third-party functions manipulating C `mpz_t` / `mpz_ptr` into OCaml functions manipulating `Z.t`, the same way it lifts third-party functions manipulating C `long` into OCaml functions manipulating `Signed.Long.t` using handler `long : Signed.Long.t typ`. But in order to do that I assume you have to deal with the internals of Ctypes and release a new version with a zarith dependency, rather than build it on top of Ctypes with its current API. Unless I misunderstand something.
Ctypes can't deal with custom blocks created by hand-written third party libraries. But you can manually create two stub functions that convert such blocks to something that is understood by Ctypes - for example "raw pointers".
```c
/* from_zarith writes a Z.t value (first parameter) to the location
   pointed to by the second parameter. nativeints are used to
   store pointers. */
CAMLprim value from_zarith(value zt, value np)
{
  CAMLparam2(zt,np);
  /* study Zarith's source code for details */
  mpz_t mp = zarith_representation_to_plain_c(zt);
  *((mpz_t *)Nativeint_val(np)) = mp;
  CAMLreturn(Val_unit);
}

/* reverse: *mpz_t to Z.t */
CAMLprim value to_zarith(value v)
{
  CAMLparam1(v);
  CAMLlocal1(res);
  mpz_t m = *((mpz_t *) Nativeint_val(v));
  /* study Zarith's source code for details again */
  res = convert_to_zarith_representation(m);
  CAMLreturn(res);
}
```
Now you can define the type with Ctypes and connect both (memory management and other details ignored)
```ocaml
type mpz_struct
let mpz_struct : mpz_struct Ctypes.structure Ctypes.typ =
  Ctypes.typedef (Ctypes.structure "") "__mpz_struct"
let _x = Ctypes.field mpz_struct "_mp_foo" Ctypes.int
(* more fields *)
let () = Ctypes.seal mpz_struct

external from_zarith : Z.t -> nativeint -> unit = "from_zarith"
let from_zarith x =
  let res = Ctypes.allocate_n mpz_struct ~count:1 in
  Ctypes.to_voidp res |> Ctypes.raw_address_of_ptr |> from_zarith x;
  res

external to_zarith : nativeint -> Z.t = "to_zarith"
let to_zarith (x:mpz_struct Ctypes.structure Ctypes.ptr) =
  Ctypes.to_voidp x |> Ctypes.raw_address_of_ptr |> to_zarith

(* simplification. mpz_t only decays to a pointer, when used
   as function parameter ... *)
let mpz_t_fparam : Z.t Ctypes.typ =
  Ctypes.view
    ~format_typ:(fun k fmt -> Format.fprintf fmt "mpz_ptr%t" k)
    ~read:to_zarith
    ~write:from_zarith
    (Ctypes.ptr mpz_struct)

let mpz_t_frparam = Ctypes.typedef (Ctypes.ptr mpz_struct) "mpz_ptr"

let add =
  Foreign.foreign "mpz_add"
    Ctypes.(mpz_t_frparam @-> mpz_t_fparam @-> mpz_t_fparam @-> returning void)

let add : Z.t -> Z.t -> Z.t =
  fun a b ->
    let res = Ctypes.allocate_n mpz_struct ~count:1 in
    add res a b;
    to_zarith res
```
It's of course rather inefficient, because of all the allocations and switches between C and the OCaml runtime that would not be necessary if you wrote your stubs manually.
@fdopen Thanks for the suggestion; I had thought about something along those lines but I discarded it as "not the right way to do it", but thinking about it more it may at least provide a working (if unpleasant) solution to what I was trying to do.
I don't think that we need extra allocations to use the idea of wrapping pointers as native integers. We should be able to use Ctypes' own support for this: we can move from a `mpz_ptr typ` to a `nativeint typ` by using `view ~read:raw_address_of_ptr ~write:ptr_of_raw_address`, and then we can use `view` again to go from a `nativeint typ` to a `Zarith.t typ`, by using `external` functions that unwrap the nativeint before calling the conversion functions from `zarith.h`.
This gives the following sketch:
```ocaml
(* C-type for mpz_ptr *)
type mpz_ptr = unit ptr
let mpz_ptr : mpz_ptr typ = ptr void (* inspired from examples/ncurses *)

(* mpz_ptr wrapped as a nativeint *)
let mpz_nativeint = view mpz_ptr ~read:raw_address_of_ptr ~write:ptr_of_raw_address

(* bindings to conversion functions implemented in C,
   unwrapping the nativeint before calling zarith.h conversions *)
external zarith_of_mpz_nativeint : nativeint -> Zarith.t = "ml_z_from_mpz_nativeint"
external mpz_nativeint_of_zarith : Zarith.t -> nativeint = "?"

(* C-type for Zarith.t *)
let zarith = view mpz_nativeint ~read:zarith_of_mpz_nativeint ~write:mpz_nativeint_of_zarith
```
I am not completely sure that this use of `raw_address_of_ptr` is valid (with this unsafe function there is a risk of losing the last reference to the input and then having it garbage-collected). The raw pointer is pointing to a C value that is not managed by the OCaml GC, so this is good, but I have the impression that `unit ptr` values contain a wrapper on the OCaml side, and I don't know how to reason about liveness of this value during `view` conversions.
(Note: in practice the API of `zarith.h` suggests that you don't want to create `mpz_ptr` values from a `Zarith.t` value (there is no such function provided, if I understand correctly), but rather write a `Zarith.t` value to an out-variable pointer passed as an argument to the C function. I am glossing over this difference here; I hope this can be solved with the Ctypes machinery after the fundamental how-to-convert question is resolved.)
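The pointer-as-integer wrapping discussed above can be illustrated in plain C, independently of OCaml. This is only a sketch of the C-side half of the idea: the struct and the `demo_*` names are hypothetical, and a real stub would obtain the integer with `Nativeint_val` rather than take an `intptr_t` directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the C structure behind an mpz_ptr. */
typedef struct {
  long payload;
} demo_big;

/* Wrap a pointer as a pointer-sized integer; no allocation is involved. */
intptr_t demo_wrap(demo_big *p) { return (intptr_t)p; }

/* Recover the pointer from the integer. */
demo_big *demo_unwrap(intptr_t raw) { return (demo_big *)raw; }

/* What a stub does after unwrapping: use the pointer like any other.
   The pointed-to memory must still be alive at this point, which is the
   liveness question raised in the discussion. */
long demo_read_through(intptr_t raw)
{
  demo_big *p = demo_unwrap(raw);
  assert(p != NULL);
  return p->payload;
}
```

The round trip itself is free; what is not guaranteed by the integer is that whoever owns the memory keeps it allocated until the C function has finished with it.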
> I don't think that we need extra allocations to use the idea of wrapping pointers as native integers.
It can't be avoided for the reason you've already mentioned: the garbage collector. Just look at the mentioned mpz_add function whose prototype boils down to something like this (with all constness and obfuscation through typedefs removed):
```c
void mpz_add (__mpz_struct *, __mpz_struct *, __mpz_struct *);
```
The function takes pointers to structures. And it's just not possible to convert `Zarith.t` values to a usable pointer on the fly. A `Z.t` value might be a plain integer, passed by value, without address. And if it is a custom block, you can't point to anything inside it, because it could be shuffled around by the GC. Another parameter of the function could be created through a `Ctypes.view` whose functions trigger the GC. So the pointer you’ve created with your view could already be invalid once it reaches the C function.

Therefore the view must create deep copies. `Z.t` values are completely stored inside the OCaml heap, whereas `mpz_struct Ctypes.structure Ctypes.ptr` points to memory inside the C heap.
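A minimal plain-C sketch of what "deep copy" means here, under the assumption that the number's limbs sit in a buffer that may move or be overwritten later (standing in for the OCaml heap). The `demo_*` names and the struct layout are hypothetical, not GMP's or Zarith's.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C-heap-owned big integer: it owns its limb array. */
typedef struct {
  size_t nlimbs;
  unsigned long *limbs;
} demo_cbig;

/* Copy nlimbs limbs from src into freshly malloc'ed storage owned by dst.
   Returns 0 on success and -1 if the allocation fails. */
int demo_deep_copy(demo_cbig *dst, const unsigned long *src, size_t nlimbs)
{
  assert(dst != NULL);
  dst->nlimbs = 0;
  dst->limbs = NULL;
  if (nlimbs == 0) return 0;
  assert(src != NULL);
  dst->limbs = malloc(nlimbs * sizeof *dst->limbs);
  if (dst->limbs == NULL) return -1;
  memcpy(dst->limbs, src, nlimbs * sizeof *dst->limbs);
  dst->nlimbs = nlimbs;
  return 0;
}

/* Release the storage owned by x. */
void demo_clear(demo_cbig *x)
{
  assert(x != NULL);
  free(x->limbs);
  x->limbs = NULL;
  x->nlimbs = 0;
}
```

After the copy, the source buffer can be moved or reused without affecting `dst`, which is the property the view needs; the price is the extra allocation and the obligation to call the matching clear function.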
> (Note: in practice the API of zarith.h suggests that you don't want to create mpz_ptr values from a Zarith.t value (there is no such function provided, if I understand correctly)
I didn't know this header. But now I've read it, and it seems to provide everything necessary to implement the functions I suggested:
```c
CAMLprim value from_zarith(value zt, value np)
{
  CAMLparam2(zt,np);
  mpz_ptr ptr = (mpz_ptr)Nativeint_val(np);
  ml_z_mpz_init_set_z(ptr, zt);
  CAMLreturn(Val_unit);
}

CAMLprim value to_zarith(value v)
{
  CAMLparam1(v);
  CAMLlocal1(res);
  mpz_ptr ptr = (mpz_ptr)Nativeint_val(v);
  res = ml_z_from_mpz(ptr);
  CAMLreturn(res);
}
```
@fdopen I don't disagree but I think we are talking past each other. What I tried to do above is in fact to find a generic solution for the following problem: binding C functions that use a C type `foo`, but exposing them to OCaml with an OCaml type `bar`, provided that I have been given C implementations of conversions between `foo` and `bar`.
This sounds like a fairly common pattern to me, and I would have expected Ctypes to support it easily. As my hesitations show, this is not the case, at least for someone inexperienced with Ctypes. I currently think that the best way to do it, at least when `foo` is a pointer type, is to:

1. use a `Ctype.view` to convert `foo` into `nativeint` (note: this part does not need any extra allocation/boxing)
2. [...] `foo` pointers and `bar`, using the C library functions between `foo` and `bar`
3. use a `Ctype.view` to convert from `nativeint` to `bar`
I am not sure if there would be a better way to do this today with Ctypes (an example of this in `examples/` would already help), or whether the API could be extended to support this. (For example: I wondered whether it could be done more easily with a `view_foreign` function that does not expect `('a -> 'b)` conversions in OCaml-land, but `('a -> 'b) fn` conversions bound from C-land).
Re. deep vs. shallow copies: thanks for insisting on the fact that the `mpz` payload in Zarith custom blocks is inline (not a pointer outside the heap, which would have been my initial expectation). I think that it may still be possible to imagine a zero-copy binding done through Ctypes, if there is support for declaring that certain parameters of a bound function should be registered as roots for the duration of the call. (I saw some root-handling logic when looking at the Ctype internals, but didn't look in depth.)

However, this is not necessary for a first approach to mpz/Zarith binding; the zarith.c functions (which do perform a copy) are the easiest way to proceed. In any case I don't think that copying the bigints is a performance bottleneck for @disteph's applications: hopefully the bigint-taking functions run long enough that the copy is amortized.
> binding C functions that use a C type foo, but exposing them to OCaml with an OCaml type bar, provided that I have been given C implementations of conversions between foo and bar.
Ctypes needs to know the internals of any C type: size, etc. This information is necessary for memory management and for code generation (either source code or dynamic through libffi). Therefore the definition of the C type must be repeated through Ctypes primitives. The connection to your own type is then covered by `Ctypes.view`.
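To show the kind of layout information meant by "size, etc.", here is a small plain-C sketch that reports the size of a stand-in struct and the offset of one of its fields. The struct is a hypothetical approximation of an mpz-like structure, and the `demo_*` names are invented for the example; a binding has to describe the same facts for the real type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mpz-like structure. */
typedef struct {
  int alloc;
  int size;
  unsigned long *limbs;
} demo_struct;

/* Total size in bytes, including any padding the compiler inserts. */
size_t demo_sizeof(void) { return sizeof(demo_struct); }

/* Byte offset of the limbs field from the start of the structure. */
size_t demo_offset_of_limbs(void) { return offsetof(demo_struct, limbs); }

/* Sanity check that the field lies entirely inside the structure. */
int demo_layout_is_consistent(void)
{
  size_t end = demo_offset_of_limbs() + sizeof(unsigned long *);
  assert(end <= demo_sizeof());
  return 1;
}
```

These numbers depend on the platform's ABI, which is why they have to come from a description of the fields rather than being hard-coded.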
If one also has to write C code, my syntax extension can be helpful. It makes it easy to switch between C and OCaml.

I've uploaded example code: Line 6 defines the primary `Ctypes.typ`, and Line 15 and Line 28 demonstrate how to convert an OCaml type at the C level to something that was described earlier as a `Ctypes.typ`.
> use a Ctype.view to convert foo into nativeint (note: this part does not need any extra allocation/boxing)
Two allocations in C in this case. The first is for the structure that is passed to `ml_z_mpz_set_z`, which then triggers a second allocation (for the "limb").
> For example: I wondered whether it could be done more easily with a view_foreign function that does not expect ('a -> 'b) conversions in OCaml-land, but ('a -> 'b) fn conversions bound from C-land
What about error handling? You've transformed the first two parameters, but it doesn't work for the third one: out of memory. The now useless resources allocated for the previous parameters must be cleaned up before you can return an error value. How do you encode the necessary logic from OCaml if transformations are only, and somehow implicitly, done in C-land?
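The cleanup problem described here can be sketched in plain C. In this hypothetical example a conversion "fails" when given a negative input (a stand-in for running out of memory), and the caller releases whatever was already converted before reporting the error. None of the `demo_*` names come from Ctypes, Zarith, or GMP.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of converted parameters currently alive, to observe leaks. */
static int demo_live = 0;

int demo_live_count(void) { return demo_live; }

/* Convert one parameter into C-heap storage. Returns NULL on failure;
   a negative input simulates an allocation failure. */
long *demo_convert(long in)
{
  if (in < 0) return NULL;
  long *p = malloc(sizeof *p);
  if (p == NULL) return NULL;
  *p = in;
  demo_live++;
  return p;
}

/* Release a converted parameter; NULL is accepted and ignored. */
void demo_release(long *p)
{
  if (p != NULL) {
    free(p);
    demo_live--;
  }
}

/* Convert three parameters, call the "third-party" computation (a sum),
   and clean up on every path. Returns 0 on success, -1 on failure. */
int demo_call3(long a, long b, long c, long *result)
{
  long *pa = NULL, *pb = NULL, *pc = NULL;
  int status = -1;
  assert(result != NULL);
  if ((pa = demo_convert(a)) == NULL) goto done;
  if ((pb = demo_convert(b)) == NULL) goto done;
  if ((pc = demo_convert(c)) == NULL) goto done;
  *result = *pa + *pb + *pc;
  status = 0;
done:
  demo_release(pc);
  demo_release(pb);
  demo_release(pa);
  return status;
}
```

When the third conversion fails, the first two are still released and the caller only sees the error status. This is the logic that has to live somewhere, whichever side of the language boundary performs the conversions.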
Thanks for the in-depth discussion and the proposed code at https://github.com/fdopen/ctypes-zarith. Yes, the ppx_cstubs is neat!
Indeed, I realise that there is a copy within `ml_z_mpz_set_z`, and I'm not worried about that being a bottleneck in my application: the third-party functions manipulating mpz that I want to bind are not functions called over and over; I'm not trying to bind

```c
void mpz_add (__mpz_struct *, __mpz_struct *, __mpz_struct *);
```

but some API functions of our SMT-solver Yices; the heavy computation is done in C and I just want the input / output of it to be something that the user can then make sense of in the rest of their OCaml program, via Zarith.
Thank you for the discussion and the code example. Still, it seems that having an `Obj.t typ` available as argument and return value would simplify things a lot by avoiding the need to write any stub in C.
Feature request: Would it be possible to have Ctypes bridge GMP types with Zarith types or mlgmpidl types? I.e., in the same way that `let ml_function = Foreign.foreign "c_function" Ctypes.(long @-> ...)` wraps a C function taking an argument of (C) type `long` as an ML function of type `Signed.long -> ...`, `let ml_function = Foreign.foreign "c_function" Ctypes.(mpz @-> ...)` would wrap a C function taking an argument of (C) type `mpz_t` as an ML function of type `Z.t -> ...` for a type handler `val mpz : Z.t Ctypes.typ`? Is this something I can do myself as syntactic sugar on top of what exists? My trouble was that the types `Z.t`/`Q.t` in Zarith or their counterparts in mlgmpidl are abstract and I don't see how I can represent them in the Ctypes world...