Open jemc opened 7 years ago
We discussed this on the sync call.
The best approach we could think of is to declare a normal struct with normal fields, but use AST annotations (#64) to link the Pony struct to a specific C struct, and to link each Pony field to a specific C field.
The struct would be treated as normal throughout the compiler and the type system, and the annotations would only matter when reaching the LLVM code generation pass, which would use libclang to read the appropriate C header and influence the memory layout of the Pony struct to match the C struct.
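As a rough illustration of the layout facts the code generation pass would need, here is a minimal C sketch. The struct example_conn_t, the field_layout_t record, and the FIELD macro are hypothetical names invented for this example; they do not come from any real header or from ponyc. The sketch only reports what the local compiler chose, whereas the proposal would obtain the same offsets and sizes from the header via libclang at Pony compile time.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical C struct standing in for a library type (not from a real header). */
typedef struct example_conn_t {
  char  tag;      /* 1 byte; the compiler usually inserts padding after it */
  long  timeout;  /* 4 or 8 bytes depending on the platform ABI */
  void* handle;   /* 4 or 8 bytes depending on pointer width */
} example_conn_t;

/* Per-field layout facts that a matching Pony struct would have to reproduce. */
typedef struct field_layout_t {
  const char* name;
  size_t offset;
  size_t size;
} field_layout_t;

/* Build one layout row from a struct type and a member name. */
#define FIELD(type, member) \
  { #member, offsetof(type, member), sizeof(((type*)0)->member) }

static const field_layout_t example_conn_fields[] = {
  FIELD(example_conn_t, tag),
  FIELD(example_conn_t, timeout),
  FIELD(example_conn_t, handle),
};

#define EXAMPLE_CONN_FIELD_COUNT \
  (sizeof(example_conn_fields) / sizeof(example_conn_fields[0]))

/* Print the layout as this compiler and target see it. */
void print_example_conn_layout(void) {
  printf("sizeof(example_conn_t) = %zu\n", sizeof(example_conn_t));
  for (size_t i = 0; i < EXAMPLE_CONN_FIELD_COUNT; i++) {
    const field_layout_t* f = &example_conn_fields[i];
    printf("  %-8s offset=%zu size=%zu\n", f->name, f->offset, f->size);
  }
}
```

Calling print_example_conn_layout() from a main function on two different targets (for example 32-bit and 64-bit) shows different offsets for the same declaration, which is why the Pony side cannot hard-code them.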
That's a great start to an RFC; we just need someone to draw up a detailed plan that works through the specifics.
There was some discussion in https://github.com/ponylang/ponyc/issues/1552 about the prospect of fully-automating the process of FFI-wrapping C libraries.
Specifically, @agarman said:
Best possible FFI solution would be direct interop with C header files. There's a lot of boilerplate code written to wrap a C library. It's work that can and should be automated as doing it manually is error prone.
If I correctly understand what's being proposed, then I strongly disagree with this sentiment. We discussed these ideas in detail on the sync call from this date (you can listen to the audio here: https://pony.groups.io/g/dev/files/Pony%20Sync/January%2018,%202017), but I will try to summarize some of my main points below to facilitate easier discussion.
For background, I've wrapped a lot of different FFI libraries, in several different languages (Python, Ruby, Pony), and I've even led a team that successfully created an open-source solution that does fully-automate the FFI-wrapping process (see https://github.com/zeromq/zproject), and through those experiences, I have come to believe in the following claim:
In general (that is, for the set of useful C libraries in the world that people might want to use via FFI), from the code found in C headers alone, it is not possible to derive a correct and useful FFI wrapper in an object-oriented language. There is simply not enough information in a C header to tell us everything we need to know about the functions in the library, to map to the concepts that are required for correct and useful usage of the library. Much of this information is in documentation or implicit convention (or, for a bad library which uses neither, left up to guessing), none of which can be parsed by an automated process. Problems compound when the host language uses garbage collection, or has features that other languages don't normally express (like reference capabilities in Pony).
Examples of information that is missing from the header:
- How should a pointer returned by the library be released? With free? Is there a special pool allocator free function associated with this library? Is there a special destroy or decrement function in this library associated with this particular type of object?
- When a function accepts a my_struct_t*, is it meant to accept the address of a local struct value to be filled (like "another return value"), or is it meant to accept an already-filled struct to do something with (like a "true argument")? Something else?
- When a function accepts a my_struct_t**, is it meant to accept the address of a pointer to a local struct, for the purposes of making it point to another struct pointer? Or maybe to set it to NULL, so that the reference is invalidated (the zproject libraries use this style)? Or maybe it's meant to accept a list of pointers to structs? If a list, how is it terminated: by a size argument, or does it need a NULL item added to the end, or maybe it is a fixed-size list?
- What should happen on a NULL return value from a function that was supposed to return a struct pointer? Setting the error value in some global "context" object? Errno? Calling pony_throw?
As an FFI-wrapper developer, you can come up with answers to these questions for the library/objects that you're working with. It's not always an easy process, but you're the only one who can find the answers.
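To make these questions concrete, here is a small C sketch in which identical parameter types carry different meanings. Every function below is hypothetical and invented for this illustration; only the type name my_struct_t is borrowed from the discussion. A tool that reads just the prototypes cannot tell these cases apart.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library type and functions, invented for illustration only. */
typedef struct my_struct_t {
  int id;
  char label[16];
} my_struct_t;

/* my_struct_t* as an out-parameter: the callee fills the caller's struct. */
void my_struct_load_default(my_struct_t* out) {
  out->id = 1;
  strcpy(out->label, "default");
}

/* Same parameter type, but here the struct must already be filled. */
int my_struct_checksum(my_struct_t* in) {
  int sum = in->id;
  for (size_t i = 0; in->label[i] != '\0'; i++) {
    sum += (unsigned char)in->label[i];
  }
  return sum;
}

/* Returns a heap object owned by the caller; the signature does not say so.
   Returns NULL if allocation fails. */
my_struct_t* my_struct_new(int id) {
  my_struct_t* self = calloc(1, sizeof *self);
  if (self != NULL) self->id = id;
  return self;
}

/* my_struct_t** in the zproject style: free the object, then set the
   caller's pointer to NULL so the reference is invalidated. */
void my_struct_destroy(my_struct_t** self_p) {
  if (self_p != NULL && *self_p != NULL) {
    free(*self_p);
    *self_p = NULL;
  }
}

/* Same my_struct_t** type, but here it is a NULL-terminated list of pointers. */
size_t my_struct_count(my_struct_t** list) {
  size_t n = 0;
  while (list[n] != NULL) n++;
  return n;
}
```

Passing a list to my_struct_destroy, or an unfilled struct to my_struct_checksum, compiles without complaint, which is the kind of mistake only a wrapper author who knows the library's conventions can rule out.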
I agree there is a lot of boilerplate in FFI-wrapping, but this boilerplate is not lifeless or universal - the type of boilerplate you use is encoding the answers to these questions. You can't make a universal boilerplate because the answers are not universal for the general case - you can only do it by holding to specific assumptions about the type of library being wrapped, which in turn limit which libraries your boilerplate works with.
If you're wondering about my earlier mention of a project where we successfully automated the entire process of wrapping a C library, this is exactly what we did - we made assumptions that all libraries being wrapped would use the CZMQ "CLASS" style, and moreover, we required that all libraries provide an XML API description, which we used to generate the FFI wrappers, and the C header itself. The XML API description was designed to contain answers to all the questions we needed to know about within the confines of the CZMQ "CLASS" style of library, and with that information we finally had all we needed to automate the FFI-wrapping process.
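In short, fully-automated FFI-wrapping from C to an object-oriented language simply doesn't work, especially for a garbage collected one, unless you constrain it to only handle libraries written in one specific style - the C header alone does not contain enough information.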
In light of that diagnosis, I suggest that we stick to methods of assisting with FFI-wrapping. "Partial automation" instead of "full automation" if you will.
An ideal FFI-wrapping system would give you ways to answer those questions about all objects/functions involved, and do the rest of the work for you. We can probably get partway to that ideal, at least. But we need to keep in mind the complexity and profound differences among the set of all useful C libraries we want to wrap - they are not all designed in the same way - not by a long shot.
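One way to picture "partial automation" is a small table of author-supplied answers that a generator could consult. The following C sketch is only an assumption about what such metadata might look like: ffi_answer_t, the enums, and the widget_* symbols are all hypothetical and are not part of ponyc, zproject, or any existing tool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Every name here is hypothetical: this sketches the kind of answers a
   wrapper author could record per function. It is not a ponyc API. */
typedef enum { ARG_NONE, ARG_IN, ARG_OUT, ARG_DESTROY_AND_NULLIFY } arg_role_t;
typedef enum { RET_NONE, RET_BORROWED, RET_CALLER_OWNS } ret_ownership_t;
typedef enum { NULL_NOT_APPLICABLE, NULL_IS_ERROR, NULL_IS_ABSENT } null_meaning_t;

typedef struct ffi_answer_t {
  const char*     function;      /* C symbol being described */
  arg_role_t      first_arg;     /* how the first pointer argument is used */
  ret_ownership_t ret;           /* who owns the returned pointer */
  null_meaning_t  null_return;   /* what a NULL return means */
  const char*     release_with;  /* symbol that releases an owned return, or NULL */
} ffi_answer_t;

/* Answers written by a human who has read the library's documentation. */
static const ffi_answer_t widget_answers[] = {
  { "widget_new",     ARG_NONE, RET_CALLER_OWNS, NULL_IS_ERROR, "widget_destroy" },
  { "widget_name",    ARG_IN,   RET_BORROWED,    NULL_IS_ABSENT, NULL },
  { "widget_destroy", ARG_DESTROY_AND_NULLIFY, RET_NONE, NULL_NOT_APPLICABLE, NULL },
};

/* Look up the recorded answers for a symbol; returns NULL if none exist. */
const ffi_answer_t* find_answer(const char* function) {
  size_t count = sizeof(widget_answers) / sizeof(widget_answers[0]);
  for (size_t i = 0; i < count; i++) {
    if (strcmp(widget_answers[i].function, function) == 0) {
      return &widget_answers[i];
    }
  }
  return NULL;
}

/* A generated wrapper needs a finalizer only when it owns what was returned. */
bool wrapper_needs_finalizer(const ffi_answer_t* answer) {
  return answer != NULL
    && answer->ret == RET_CALLER_OWNS
    && answer->release_with != NULL;
}
```

The point of the design is that the table holds exactly the information the header lacks; the generator stays mechanical, and a missing entry (find_answer returning NULL) is a signal that a human still has to answer the question.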
@jemc by direct interop with C headers, I meant only that structs wouldn't have to be re-typed in pony & that pony would provide compiler checks for unambiguously incorrect @ invocations.
Any details & semantics associated with usage of a C library are left to the user of that library to get correct.
We discussed on a recent sync call the idea of being able to load struct definitions from a C header, so that Pony code could potentially depend on platform dependent struct definitions.
This came up in discussion of https://github.com/ponylang/ponyc/issues/1513, in which an openssl dependency was added to the pony runtime in order to put ponyint functions that use the SSL_CTX type there. We discussed that it would be better if we could avoid that by writing those accessors in Pony. But to do that, we'd need Pony to be able to load the struct defs from a header when compiling.
This would probably require a libclang dependency for ponyc to be able to read C header files.
This idea needs more discussion to flesh out the details and any feasibility issues.