ponylang / rfcs

RFCs for changes to Pony
https://ponylang.io/

Load struct definitions from C header. #75

Open jemc opened 7 years ago

jemc commented 7 years ago

On a recent sync call we discussed the idea of loading struct definitions from a C header, so that Pony code could depend on platform-dependent struct definitions.

This came up in discussion of https://github.com/ponylang/ponyc/issues/1513, in which an openssl dependency was added to the pony runtime in order to put ponyint functions that use the SSL_CTX type there. We discussed that it would be better if we could avoid that by writing those accessors in Pony. But to do that, we'd need Pony to be able to load the struct defs from a header when compiling.

This would probably require a libclang dependency for ponyc to be able to read C header files.

This idea needs more discussion to flesh out the details and any feasibility issues.

jemc commented 7 years ago

We discussed this on the sync call.

The best approach we could think of is to declare a normal struct with normal fields, but use AST annotations (#64) to link the Pony struct to a specific C struct, and to link each Pony field to a specific C field.

The struct would be treated as normal throughout the compiler and the type system, and the annotations would only matter when reaching the LLVM code generation pass, which would use libclang to read the appropriate C header and influence the memory layout of the Pony struct to match the C struct.
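To illustrate what "match the C struct" would involve, here is a minimal C sketch of the layout facts a code generator would need per field: name, offset, and size. `struct example_ctx`, `struct field_layout`, and the `FIELD` macro are hypothetical names made up for this example; they are not part of ponyc, libclang, or OpenSSL. The sketch asks the C compiler itself for the layout, whereas the proposal above would obtain the same facts by reading the header through libclang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C struct standing in for a platform-dependent definition.
 * The width of `long` and of pointers, and the padding between fields,
 * all vary by platform, so the offsets below are not fixed numbers. */
struct example_ctx {
  char tag;
  long counter;
  void *user_data;
};

/* One record per field: what a code generator would need to know. */
struct field_layout {
  const char *name;
  size_t offset;
  size_t size;
};

/* sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced here. */
#define FIELD(type, member) \
  { #member, offsetof(type, member), sizeof(((type *)0)->member) }

static const struct field_layout example_ctx_layout[] = {
  FIELD(struct example_ctx, tag),
  FIELD(struct example_ctx, counter),
  FIELD(struct example_ctx, user_data),
};

size_t example_ctx_field_count(void) {
  return sizeof example_ctx_layout / sizeof example_ctx_layout[0];
}

const struct field_layout *example_ctx_field(size_t i) {
  assert(i < example_ctx_field_count());
  return &example_ctx_layout[i];
}

size_t example_ctx_size(void) {
  return sizeof(struct example_ctx);
}
```

A Pony struct that declared the same three fields with types of fixed width could match this layout on one platform and silently mismatch on another, which is the problem the annotation approach is meant to address.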

That's a great start to an RFC; we just need someone to draw up a detailed plan that works through the specifics.

jemc commented 7 years ago

There was some discussion in https://github.com/ponylang/ponyc/issues/1552 about the prospect of fully-automating the process of FFI-wrapping C libraries.

Specifically, @agarman said:

Best possible FFI solution would be direct interop with C header files. There's a lot of boilerplate code written to wrap a C library. It's work that can and should be automated as doing it manually is error prone.

If I correctly understand what's being proposed, then I strongly disagree with this sentiment. We discussed these ideas in detail on the sync call from this date (you can listen to the audio here: https://pony.groups.io/g/dev/files/Pony%20Sync/January%2018,%202017), but I will try to summarize some of my main points below to facilitate easier discussion.

For background, I've wrapped a lot of different FFI libraries, in several different languages (Python, Ruby, Pony), and I've even led a team that successfully created an open-source solution that does fully-automate the FFI-wrapping process (see https://github.com/zeromq/zproject), and through those experiences, I have come to believe in the following claim:

In general (that is, for the set of useful C libraries in the world that people might want to use via FFI), from the code found in C headers alone, it is not possible to derive a correct and useful FFI wrapper in an object-oriented language. There is simply not enough information in a C header to tell us everything we need to know about the functions in the library, to map to the concepts that are required for correct and useful usage of the library. Much of this information is in documentation, or implicit convention, (or for a bad library which uses neither, left up to guessing), neither of which can be parsed by an automated process. Problems compound when the host language uses garbage collection, or has features that other languages don't normally express (like reference capabilities in Pony).

Examples of information that is missing from the header:

As an FFI-wrapper developer, you can come up with answers to these questions for the library/objects that you're working with. It's not always an easy process, but you're the only one who can find the answers.
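To make one such question concrete, consider ownership of a returned pointer. The following C sketch is a hypothetical illustration, not taken from any real library: `greeting_new` and `greeting_shared` would both appear in a header as `char *f(void);`, yet one result must be freed by the caller and the other must not. Only the comments, which stand in for documentation, tell the two apart.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns freshly allocated memory: the caller must free() it.
 * Returns NULL if allocation fails. */
char *greeting_new(void) {
  const char *text = "hello";
  char *copy = malloc(strlen(text) + 1);
  if (copy != NULL) {
    strcpy(copy, text);
  }
  return copy;
}

/* Returns a pointer to static storage: the caller must NOT free() it. */
char *greeting_shared(void) {
  static char text[] = "hello";
  return text;
}

/* A hand-written wrapper encodes the ownership answer for each call. */
size_t greeting_lengths(void) {
  char *owned = greeting_new();
  assert(owned != NULL); /* sketch only: real code would handle failure */
  size_t total = strlen(owned);
  free(owned);                        /* owned result: release it */
  total += strlen(greeting_shared()); /* borrowed result: leave it alone */
  return total;
}
```

Swapping the two policies would compile without complaint and then either leak memory or free static storage, which is why a generator working from prototypes alone cannot pick the right one.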

I agree there is a lot of boilerplate in FFI-wrapping, but this boilerplate is not lifeless or universal - the type of boilerplate you use is encoding the answers to these questions. You can't make a universal boilerplate because the answers are not universal for the general case - you can only do it by holding to specific assumptions about the type of library being wrapped, which in turn limit which libraries your boilerplate works with.

If you're wondering about my earlier mention of a project where we successfully automated the entire process of wrapping a C library, this is exactly what we did - we made assumptions that all libraries being wrapped would use the CZMQ "CLASS" style, and moreover, we required that all libraries provide an XML API description, which we used to generate the FFI wrappers, and the C header itself. The XML API description was designed to contain answers to all the questions we needed to know about within the confines of the CZMQ "CLASS" style of library, and with that information we finally had all we needed to automate the FFI-wrapping process.
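As a rough sketch of why an explicit API description helps, the C fragment below models a single entry that carries an ownership answer alongside the function name, and a generator step that turns that answer into a wrapper policy. The `api_function` struct, its `returns_owned` flag, and `describe_wrapper` are all hypothetical simplifications invented for this example; they are not the actual zproject schema or tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified API description entry. */
struct api_function {
  const char *name;
  int returns_owned; /* 1: caller must free the result; 0: result is borrowed */
};

/* Writes a one-line wrapper policy for `fn` into `out`.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the text did not fit or formatting failed. */
int describe_wrapper(const struct api_function *fn, char *out, size_t out_size) {
  assert(fn != NULL && out != NULL);
  int n = snprintf(out, out_size, "%s: %s", fn->name,
                   fn->returns_owned ? "free result after copying"
                                     : "copy result, do not free");
  if (n < 0 || (size_t)n >= out_size) {
    return -1;
  }
  return n;
}
```

The point of the design is that the generator never guesses: every decision it makes is read from a field that a person filled in, which only works when every wrapped library agrees to supply that description.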

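In summary, fully automated FFI-wrapping from C to an object-oriented language simply doesn't work, especially for a garbage-collected one, unless you constrain it to only handle a specific style of library - the C header alone does not contain enough information.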

jemc commented 7 years ago

In light of that diagnosis, I suggest that we stick to methods of assisting with FFI-wrapping. "Partial automation" instead of "full automation" if you will.

An ideal FFI-wrapping system would give you ways to answer those questions about all objects/functions involved, and do the rest of the work for you. We can probably get partway to that ideal, at least. But we need to keep in mind the complexity and profound differences among the set of all useful C libraries we want to wrap - they are not all designed in the same way, not by a long shot.

agarman commented 7 years ago

@jemc by direct interop with C headers, I meant only that structs wouldn't have to be re-typed in pony & that pony would provide compiler checks for unambiguously incorrect @ invocations.

Any details & semantics associated with usage of a C library are left to the user of that library to get correct.