CAPI that is as low-overhead as possible

wrtobin commented 6 years ago

So having a CAPI and a Fortran interface would be good, though not 100% required for this codebase.

However, since my intent is to replace the MSI linear algebra capabilities with las, the most efficient way to use MSI from a C/Fortran environment needs to be considered.

I don't believe there is a way to inline a CAPI call when the underlying function is C++, as that would mean inserting C++ code into a C context. Inlining takes place at compile time, not link time, so the source language matters.

I think the most efficient implementation would therefore be a single function call of overhead for each API function (for the call from FORTRAN/C to the function under the CAPI which will be C++). Anything we can do with LTO to get these functions to 'inline' during linkage would provide the thinnest API.

wrtobin commented 6 years ago

So the API is zero-overhead by virtue of the fact that the functions are inlined at compile-time using CRTP and mandatory backend inlining directives. When a project using C++ uses this library everything is hunky-dory and the functions can be properly inlined and compiled in the downstream project (though more test cases are needed in las to evaluate the correctness of the implementation).
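As a rough illustration of the CRTP-plus-forced-inlining pattern described above, here is a minimal sketch. The names `LAS_INLINE`, `Mat`, `LasOps`, `DenseOps`, `zero_impl`, and `reset` are all hypothetical and are not taken from the las codebase; the dense "backend" is a stand-in for a real one.

```cpp
#include <vector>

// Hypothetical spelling of a "mandatory inline" directive (not the las macro).
#if defined(__GNUC__) || defined(__clang__)
#define LAS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define LAS_INLINE __forceinline
#else
#define LAS_INLINE inline
#endif

// Hypothetical matrix type, used only for this sketch.
struct Mat { std::vector<double> vals; };

// CRTP base: the public interface forwards to the backend, and the
// target of the call is resolved at compile time (no virtual dispatch).
template <class Backend>
class LasOps {
public:
  LAS_INLINE void zero(Mat& m) { static_cast<Backend*>(this)->zero_impl(m); }
};

// Hypothetical backend; a real one would call into PETSc, CUDA, etc.
class DenseOps : public LasOps<DenseOps> {
  friend class LasOps<DenseOps>;
  LAS_INLINE void zero_impl(Mat& m) { for (double& v : m.vals) v = 0.0; }
};

// Downstream C++ code is written against the interface only.
template <class Backend>
void reset(LasOps<Backend>& ops, Mat& m) { ops.zero(m); }
```

Because both `zero` and `zero_impl` are visible in the header and marked for inlining, a downstream C++ translation unit can collapse `reset(ops, m)` down to the backend loop itself.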

When a project wants to access the LAS abstraction layer through a C-compatible interface, that presents a problem, since the method we use to get zero-overhead API calls is inherently dependent on C++ features. Since the CAPI must be callable by C (and FORTRAN), the downstream project cannot be responsible for compiling an inline, header-only function, since that would require cross-language compilation. Thus a CAPI requires (at minimum, ignoring LTO) the overhead of a single function call for each function in the API. However, we still want to get compile-time polymorphism in deciding the backend.
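To make the "single function call of overhead" concrete, here is a sketch of what such a wrapper could look like. The type `las_mat`, the namespace `las_sketch`, and the unsuffixed function name `las_zero_matrix` are assumptions for illustration only, not the actual las CAPI.

```cpp
#include <cstddef>

// Hypothetical C-compatible matrix handle (plain struct, no C++ features).
struct las_mat { double* vals; std::size_t n; };

namespace las_sketch {
// Hypothetical C++ backend whose operation is inlinable by the C++ compiler.
struct DenseOps {
  inline void zero(las_mat* m) {
    for (std::size_t i = 0; i < m->n; ++i) m->vals[i] = 0.0;
  }
};
}  // namespace las_sketch

// C-callable entry point. It is compiled once, as C++, inside the library,
// so the backend call can be inlined into this function body. A C or
// Fortran caller then pays for exactly one out-of-line call.
extern "C" void las_zero_matrix(las_mat* m) {
  las_sketch::DenseOps ops;
  ops.zero(m);
}
```

A C client would only ever see a declaration such as `void las_zero_matrix(struct las_mat* m);` in a header, which is why the call cannot be inlined on the client side without LTO.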

The solution is to compile multiple CAPI libraries based on the different backends, and determine which to include in the downstream project at configuration time (slightly ahead of compile time, but essentially analogous for our purposes). However, if two LAS CAPI libraries needed to be used in the same project, this would cause a symbol conflict since the two libraries would expose the same interface.

The way to get around that is to change the interface for each CAPI library based on the backend. However, this changes the interface (necessarily), and we want to be able to code to a common API. Accomplishing this in C requires preprocessor macros, so the goal will be to call las functions like:

las(zero_matrix)(mat)

Where las() is a macro and zero_matrix is the function being called. The macro will expand to las_zero_matrix_petscOps in source files including the PETSc backend headers. This allows a single downstream project to compile against multiple LAS backends each exposing the "same" interface and moves the decision of which backend to use from configuration time back to compile-time.
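One way such a macro could be written is with the usual two-level token-pasting idiom, sketched below. The helper macros `LAS_BACKEND`, `LAS_CAT`, and `LAS_CAT_`, the `las_mat` type, and the body of the stand-in function are assumptions; only the call form `las(zero_matrix)(mat)` and the expanded name `las_zero_matrix_petscOps` come from the description above.

```cpp
#include <cstddef>

// Hypothetical C-compatible matrix handle for this sketch.
struct las_mat { double* vals; std::size_t n; };

// Stand-in for the symbol a PETSc-backed CAPI library would export.
// The body here is plain C++ so the sketch compiles without PETSc.
extern "C" void las_zero_matrix_petscOps(las_mat* m) {
  for (std::size_t i = 0; i < m->n; ++i) m->vals[i] = 0.0;
}

// What a backend-specific header might define (hypothetical names).
#define LAS_BACKEND petscOps
// Two levels are needed so LAS_BACKEND is expanded before pasting.
#define LAS_CAT_(name, backend) las_##name##_##backend
#define LAS_CAT(name, backend) LAS_CAT_(name, backend)
#define las(name) LAS_CAT(name, LAS_BACKEND)

// Downstream code is written against the common spelling.
void clear(las_mat* mat) {
  las(zero_matrix)(mat);  // expands to las_zero_matrix_petscOps(mat)
}
```

The extra `LAS_CAT` layer matters: pasting directly in `las(name)` would produce `las_zero_matrix_LAS_BACKEND`, because operands of `##` are not macro-expanded first.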

However the user in this case must not include multiple LAS backend libraries in the same compilation unit, as the preprocessor macros would clash.

There are ways to deal with this, probably by using #undef to remove the backend-specifying macro and then redefining it, but that seems clumsy. A slightly more elegant solution may be possible but requires some thinking. Of course, calling the desired backend's functions explicitly is possible, but that locks the user into a specific backend, and would require that the primary LAS CAPI header not provide the preprocessor macro, but instead provide it in a separate header. This may be fine, since a user should rarely (hopefully) have to use the CAPI with multiple backends in the same compilation unit.
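For reference, the #undef workaround mentioned above might look roughly like this. Everything here is hypothetical: the backend suffix `cudaOps`, the call counters, the `las_mat` type, and the macro helpers are invented for the sketch, and the stand-in functions do not touch PETSc or CUDA.

```cpp
#include <cstddef>

// Hypothetical C-compatible matrix handle for this sketch.
struct las_mat { double* vals; std::size_t n; };

// Call counters, present only so the sketch can show which symbol ran.
static int petsc_calls = 0;
static int cuda_calls = 0;

// Stand-ins for the symbols two different CAPI libraries would export.
extern "C" void las_zero_matrix_petscOps(las_mat* m) {
  for (std::size_t i = 0; i < m->n; ++i) m->vals[i] = 0.0;
  ++petsc_calls;
}
extern "C" void las_zero_matrix_cudaOps(las_mat* m) {
  for (std::size_t i = 0; i < m->n; ++i) m->vals[i] = 0.0;
  ++cuda_calls;
}

// Hypothetical common macro; LAS_BACKEND is looked up at each use site.
#define LAS_CAT_(name, backend) las_##name##_##backend
#define LAS_CAT(name, backend) LAS_CAT_(name, backend)
#define las(name) LAS_CAT(name, LAS_BACKEND)

#define LAS_BACKEND petscOps
void zero_with_petsc(las_mat* m) { las(zero_matrix)(m); }
#undef LAS_BACKEND

#define LAS_BACKEND cudaOps
void zero_with_cuda(las_mat* m) { las(zero_matrix)(m); }
#undef LAS_BACKEND
```

This works because `las(...)` reads `LAS_BACKEND` at the point of use, but every region of code has to be bracketed by hand, which is the clumsiness noted above. Calling `las_zero_matrix_petscOps(m)` directly avoids the macro juggling at the cost of hard-coding the backend.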

wrtobin commented 6 years ago

Implemented in 15c5abb0983a882d9fcfc4227265c61efedae4e4

Having multiple backends in the same compilation unit isn't possible (neatly) yet. I prefer to work around the issue for the moment. If it becomes a hassle I can revisit this.

Still need to build with the cuda backend enabled, but lack of a reliable linux-based cuda dev environment is holding that back at the moment.

Also, it would be technically possible to get a zero-overhead CAPI, but doing so would require some substantial restructuring, since the zero-overhead API can currently only be compiled as C++. We would need to (1) guarantee the LAS backends are C code and (2) restructure the codebase around the current CAPI model, making that the primary interface and providing the C++ CRTP+inline interface as a secondary interface. This would take a few days (3-5 most likely) of dev time, so it doesn't seem worth it, since we couldn't get the zero-overhead call in FORTRAN anyway (though with LTO maybe some of these will get 'inlined' at link-time).

wrtobin / las

CAPI that is as low-overhead as possible #2