avoid running a full GC cycle on every FFI closure allocation

This PR makes make_function_pointer faster by avoiding a full GC cycle on every invocation.

The documentation of caml_alloc_custom (section 9.2 of the manual) says:

> Another way to describe the effect of the used and max parameters is in terms of full GC cycles. If you allocate many custom blocks with used / max = 1 / N, the GC will then do one full cycle every N allocations.

So with used = 1 and max = 1 (N = 1), we run a full GC cycle on every call (or within a small constant factor of that). This is clearly excessive and can be very slow.
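To make the arithmetic concrete, here is a toy model in C of the rule quoted from the manual. It is only a sketch of that description, not the runtime's actual accounting: `full_cycles` and its `pending` counter are hypothetical names made up for illustration, and the real GC pacing is more involved.

```c
/* Toy model of the manual's rule, NOT the OCaml runtime's real logic:
   every custom-block allocation contributes `used` units, and one
   full GC cycle is counted each time the total reaches `max`. */
static long full_cycles(long allocations, long used, long max)
{
    long pending = 0;  /* units accumulated since the last counted cycle */
    long cycles = 0;

    /* In this model, used = 0 never forces a cycle by itself. */
    if (used <= 0 || max <= 0)
        return 0;

    for (long i = 0; i < allocations; i++) {
        pending += used;
        while (pending >= max) {
            cycles++;
            pending -= max;
        }
    }
    return cycles;
}
```

In this model, 1000 allocations with used = 1, max = 1 count 1000 full cycles, while used = 1, max = 100 counts 10. With used = 0 the model counts none, so collection would be left to the ordinary heap-usage accounting.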

I think caml_alloc_custom_mem could be used to specify the object size more precisely, but I don't know the code well enough to make that change. In any case, I expect reporting a size of 0 to be at most a constant-factor error: the custom block itself still counts toward heap usage, so allocating these in a loop will eventually trigger a GC.

We at Jane Street have used the patched version of the code for a long time and it works well, so the code is tested by production use.

cc @tiash in case you'd like to add any background on what motivated the change (e.g. how bad the slowness was)

yallop / ocaml-ctypes #694