python-cffi / cffi

A Foreign Function Interface package for calling C libraries from Python.
https://cffi.readthedocs.io/en/latest/

Getting function types without loading the corresponding library #70

Closed MasonMcGill closed 5 months ago

MasonMcGill commented 5 months ago

Hi, is there a way to get the type of a function declared in an FFI.cdef call without compiling and loading the implementation?

I'm looking into using cffi to generate ctypes bindings for a shared library based on its header file, and I want to avoid having to load the library once with cffi and then again with ctypes. (I'm opting to use ctypes over cffi because ctypes structure types can be used as data types for NumPy arrays.)

import cffi

ffi = cffi.FFI()
ffi.cdef(c_header_text)  # c_header_text: the library's header file, as a string
lib = ffi.dlopen(None)   # None: no path to the library itself is given

ffi.list_types()   # Requesting type information works
dir(lib)           # Listing macro and function names works
lib.SOME_MACRO     # Requesting macro values works
lib.some_function  # Requesting functions results in an undefined symbol error

arigo commented 5 months ago

Hi Mason,

On Fri, 12 Apr 2024 at 00:53, MasonMcGill wrote:

Hi, Is there a way to get the type of a function declared in an FFI.cdef call without compiling and loading the implementation?

I'm looking into using cffi to generate ctypes bindings for a shared library based on its header file, and I want to avoid having to load the library once with cffi and then again with ctypes. (I'm opting to use ctypes over cffi because ctypes structure types can be used as data types for NumPy arrays.)

ffi = cffi.FFI()
ffi.cdef(c_header_text)
lib = ffi.dlopen(None)

ffi.list_types() # Requesting type information works
dir(lib) # Listing macro and function names works
lib.SOME_MACRO # Requesting macro values works
lib.some_function # Requesting functions results in an undefined symbol error

This is a rather niche usage for cffi, but... The real question is: is it a real performance issue to have to do all that at load time?

  1. Loading the same library twice instead of once isn't really going to make much of a difference, given the general overhead of doing all the other stuff. The library isn't actually loaded twice anyway: you just get two pointers to the same library loaded by the C-level dlopen() logic (or LoadLibrary() on Windows), once wrapped as a ctypes object, the other time as a cffi object.

  2. If you have a real performance issue, then you need another solution altogether. You could write a separate script that uses cffi like you did above, including loading the correct lib to get at the functions, and emits ctypes data in some format: maybe as generated code, like a .py file containing ctypes declarations; maybe as a pickle, if ctypes objects can be pickled (I don't know); or else in some custom format. You run this separate script once and produce the code/data. Then the actual "production" code loads that code/data without using cffi at all.

(Or, 3.: I don't really know numpy, but maybe there is a ctypes-free approach to your original numpy problem. Maybe others have comments about that.)
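
As a rough illustration of the "generated code" option in point 2, here is a minimal sketch of the emitting half only. It assumes the one-off script has already collected each function's signature (for example from cffi, after loading the library) into a plain dictionary. The names SIGNATURES, C_TO_CTYPES, emit_bindings, and bind, as well as the three example functions, are hypothetical and not part of cffi or ctypes.

```python
# Hypothetical intermediate description: function name -> (result type, argument types),
# written as C type names. In the workflow described above, the one-off script would
# fill this in from cffi; here it is typed by hand to keep the sketch self-contained.
SIGNATURES = {
    "add_ints": ("int", ["int", "int"]),
    "name_length": ("size_t", ["const char *"]),
    "reset": ("void", []),
}

# Deliberately tiny mapping from C type names to ctypes expressions (an assumption;
# a real generator would also have to handle structs, other pointers, typedefs, ...).
C_TO_CTYPES = {
    "void": "None",
    "int": "ctypes.c_int",
    "double": "ctypes.c_double",
    "size_t": "ctypes.c_size_t",
    "const char *": "ctypes.c_char_p",
}


def emit_bindings(signatures):
    """Return Python source defining bind(lib), which sets argtypes/restype."""
    lines = ["import ctypes", "", "", "def bind(lib):"]
    for name, (result, args) in sorted(signatures.items()):
        arg_list = ", ".join(C_TO_CTYPES[a] for a in args)
        lines.append(f"    lib.{name}.argtypes = [{arg_list}]")
        lines.append(f"    lib.{name}.restype = {C_TO_CTYPES[result]}")
    lines.append("    return lib")
    return "\n".join(lines) + "\n"


# The generator script would write this text to a .py file once.
source = emit_bindings(SIGNATURES)
print(source)
```

The production code would then import the generated file and call bind() on a library opened with ctypes, without importing cffi at all.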

See you soon,

Armin Rigo

MasonMcGill commented 5 months ago

Thanks for the response! It's good to know that the library won't be loaded twice.

I'm not particularly worried about performance, but it would still be nice to know if there were a way to use cffi such that the code generating the bindings didn't need to be passed all the details of how to compile the library. It's maybe not the highest-stakes issue, but I like the conceptual clarity of "header text in -> wrapper objects out". Also, if I spin the little binding generation module I wrote this afternoon out into a script, it would be nice to auto-run it whenever the header file is edited, without having to recompile the library (which can take a little while).

arigo commented 5 months ago

There is no built-in way to do that. You can write a Makefile, or integrate the "run the binding generation script" step into some other build system you might already have. You could even write some hack in your main program that first checks whether any header files were modified, automatically executes the binding generation script if so, and only then loads the generated code/data.
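
The last option, a check in the main program, could look roughly like the sketch below. It compares file modification times, which is only one possible staleness test, and it takes the regeneration step as a callback. The function names is_stale and ensure_bindings and the file names mylib.h and _bindings.py are hypothetical.

```python
import os
import tempfile


def is_stale(header_paths, generated_path):
    """True if generated_path is missing or older than any of the headers."""
    if not os.path.exists(generated_path):
        return True
    generated_mtime = os.path.getmtime(generated_path)
    return any(os.path.getmtime(h) > generated_mtime for h in header_paths)


def ensure_bindings(header_paths, generated_path, regenerate):
    """Call regenerate() only when the generated file is out of date.

    Returns True if regeneration ran.
    """
    if is_stale(header_paths, generated_path):
        regenerate()
        return True
    return False


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    header = os.path.join(tmp, "mylib.h")          # hypothetical header name
    generated = os.path.join(tmp, "_bindings.py")  # hypothetical output name
    with open(header, "w") as f:
        f.write("int add_ints(int, int);\n")

    def regenerate():
        # Stand-in for running the real binding generation script.
        with open(generated, "w") as f:
            f.write("# generated bindings would go here\n")

    first = ensure_bindings([header], generated, regenerate)   # output missing
    second = ensure_bindings([header], generated, regenerate)  # nothing changed

    # Simulate a later edit by moving the header's mtime past the output's.
    later = os.path.getmtime(generated) + 10
    os.utime(header, (later, later))
    third = ensure_bindings([header], generated, regenerate)

print(first, second, third)
```

A Makefile rule with the header as a prerequisite of the generated file expresses the same timestamp comparison; the in-program check is only useful when no build system is in the loop.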