rust-lang / rust-bindgen

Automatically generates Rust FFI bindings to C (and some C++) libraries.
https://rust-lang.github.io/rust-bindgen/
BSD 3-Clause "New" or "Revised" License
4.4k stars 691 forks source link

Functional C macros are not expanded. #753

Open nicokoch opened 7 years ago

nicokoch commented 7 years ago

So I'm generating bindings to <linux/uinput>. This works pretty well, but two c macros in particular do not produce any bindings in the output. Reduced input below:

Input C/C++ Header

#define UI_DEV_CREATE       _IO(UINPUT_IOCTL_BASE, 1)
#define UI_DEV_DESTROY      _IO(UINPUT_IOCTL_BASE, 2)

Bindgen Invocation

let bindings = bindgen::Builder::default()
        .no_unstable_rust()
        .header("wrapper/wrapper.h")
        .expect("Unable to generate bindings").

wrapper.h looks like this:

#include <linux/uinput.h>

Actual Results

No constant UI_DEV_CREATE or UI_DEV_DESTROY generated in output.

Expected Results

Constants UI_DEV_CREATE and UI_DEV_DESTROY generated in output.

I'm not really that familiar with the kernel macro _IO and not sure if this is technically even possible, but I figured to report just to get a more professional opinion. If it's not possible, workarounds welcome.

emilio commented 7 years ago

How is _IO defined? Grepping define _IO in my /usr/include/linux dir didn't yield anything.

emilio commented 7 years ago

As in... If we don't know how _IO is defined, how could we generate a sensible constant?

emilio commented 7 years ago

That being said, even with that it doesn't work (huh, I'd swear we knew how to deal with macro expansions):

#define _IO(a_) (a_)

#define UI_DEV_CREATE _IO(1)
#define UI_DEV_DESTROY _IO(2)
emilio commented 7 years ago

This is probably a bug for rust-cexpr. Indeed, it's a dupe of https://github.com/jethrogb/rust-cexpr/issues/3.

emilio commented 7 years ago

I guess I can try to add to libclang a way to get whether the macro definition belongs to a functional macro...

nicokoch commented 7 years ago

If it helps any: _IO is defined in uapi/asm-generic/ioctl.h as: #define _IO(type,nr) _IOC(_IOC_NONE,(type),(nr),0) and _IOC in turn is defined as:

#define _IOC(dir,type,nr,size) \
      (((dir)  << _IOC_DIRSHIFT) | \
      ((type) << _IOC_TYPESHIFT) | \
      ((nr)   << _IOC_NRSHIFT) | \
      ((size) << _IOC_SIZESHIFT))
emilio commented 7 years ago

Ok, so libclang provides a clang_Cursor_isMacroFunctionLike API. Perhaps we could use that.

emilio commented 7 years ago

I've asked @jethrogb about what the best API for this would be. It shouldn't be a big deal to support it in bindgen once cexpr support is there.

nicokoch commented 7 years ago

Meanwhile, I used the following in my wrapper.h to workaround the issue:

#include <linux/uinput.h>

const __u64 _UI_DEV_CREATE = UI_DEV_CREATE;
#undef UI_DEV_CREATE
const __u64 UI_DEV_CREATE = _UI_DEV_CREATE;

const __u64 _UI_DEV_DESTROY = UI_DEV_DESTROY;
#undef UI_DEV_DESTROY
const __u64 UI_DEV_DESTROY = _UI_DEV_DESTROY;
emilio commented 7 years ago

Yeah, note that that would only work in libclang 3.9+ though.

jethrogb commented 7 years ago

Current state: C preprocessor needed. @emilio can you add "help wanted" label?

clia commented 5 years ago

How about the progress of this? I'm generating the PostgreSQL header files using bindgen, but in the output files there're a lot of macros being lost.

jethrogb commented 5 years ago

@clia The state is unchanged as of my latest comment https://github.com/rust-lang-nursery/rust-bindgen/issues/753#issuecomment-318214311. This issue is marked "help wanted" and waiting for someone (anyone) to implement this functionality.

iDawer commented 5 years ago

Another workaround preserving original names:

wrapper.h
#include <linux/ioctl.h>
#include <linux/usbdevice_fs.h>

#define MARK_FIX_753(req_name) const unsigned long int Fix753_##req_name = req_name;

MARK_FIX_753(USBDEVFS_CONTROL);
MARK_FIX_753(USBDEVFS_BULK);
build.rs
...
#[derive(Debug)]
pub struct Fix753 {}
impl bindgen::callbacks::ParseCallbacks for Fix753 {
    fn item_name(&self, original_item_name: &str) -> Option<String> {
        Some(original_item_name.trim_start_matches("Fix753_").to_owned())
    }
}

fn main() {
    let bindings = bindgen::Builder::default()
        .header("wrapper.h")
        .parse_callbacks(Box::new(Fix753 {}))
        .generate()
        .expect("Unable to generate bindings");
    ...
}
output bindings.rs
...
pub const USBDEVFS_CONTROL: ::std::os::raw::c_ulong = 3222820096;
pub const USBDEVFS_BULK: ::std::os::raw::c_ulong = 3222820098;

Seems like this workaround shouldn't conflict with future implementation.

vext01 commented 4 years ago

Not sure if this is the same problem, but I'm having issues getting at RTLD_DEFAULT from dlfcn.h.

It's defined thusly:

# define RTLD_DEFAULT   ((void *) 0)

Without hacks, it's totally absent from bindings.rs. Not sure why.

I've tried a few workarounds, including @iDawer's

wrapper.h:

#define _GNU_SOURCE                                                                                
#define __USE_GNU                                                                                  
#include <dlfcn.h>                                                                                 
#include <link.h>                                                                                  

// Import using another name, doesn't work.                                      
//const void *MY_RTLD_DEFAULT = RTLD_DEFAULT;                                                      

#define MARK_FIX_753(req_name) const void *Fix753_##req_name = req_name;                           
MARK_FIX_753(RTLD_DEFAULT);                                                                        

// Direct definition, works.                                                                       
//#define MY_RTLD_DEFAULT 0

build.rs:

extern crate bindgen;                                                                                

use std::{env, path::PathBuf};                                                                       

const BINDINGS_FILE: &'static str = "bindings.rs";                                                   
const WRAPPER_HEADER: &'static str = "wrapper.h";                                                    

#[derive(Debug)]                                                                                     
pub struct Fix753 {}                                                                                 
impl bindgen::callbacks::ParseCallbacks for Fix753 {                                                 
    fn item_name(&self, original_item_name: &str) -> Option<String> {                                
        Some(original_item_name.trim_start_matches("Fix753_").to_owned())                            
    }                                                                                                
}                                                                                                    

fn main() {                                                                                          
    // Rust target spec is needed for now so that auto-generated tests pass.                         
    // https://github.com/rust-lang-nursery/rust-bindgen/issues/1370#issuecomment-426597356          
    let bindings = bindgen::Builder::default()                                                     
        .header(WRAPPER_HEADER)                                                                    
        //.rust_target(bindgen::RustTarget::Stable_1_26)                                           
        .parse_callbacks(Box::new(Fix753 {}))                                                      
        .generate()                                                                                
        .expect("bindgen failed");                                                                 

    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());                                    
    bindings                                                                                       
        .write_to_file(out_path.join(BINDINGS_FILE))                                               
        .expect("Couldn't write bindings!");                                                       
}

bindings.rs:

extern "C" {                                                                                            
    #[link_name = "\u{1}Fix753_RTLD_DEFAULT"]                                                           
    pub static mut RTLD_DEFAULT: *const ::std::os::raw::c_void;                                         
}

Notice the value is not expanded. It's referenced as an extern :\

Needless to say, trying to use RTLD_DEFAULT gives a linker error:

...
  = note: /usr/bin/ld: /home/vext01/research/yorick/phdrs/target/debug/deps/phdrs-51a0fdcb223a7a9a.1sf375tf7jcp7kb.rcgu.o: in function `phdrs::find_symbol':
          /home/vext01/research/yorick/phdrs/src/lib.rs:(.text._ZN5phdrs11find_symbol17h0db50af030d9c91cE+0x6c): undefined reference to `Fix753_RTLD_DEFAULT'                                                    
          collect2: error: ld returned 1 exit status

Questions:

This is bindgen-0.52.

diwic commented 3 years ago

I encountered both of these problems, which I believe to be two different ones, neither

#define SNDRV_PCM_STATE_OPEN ((snd_pcm_state_t) 0)

nor

#define SNDRV_CTL_IOCTL_ELEM_LIST   _IOWR('U', 0x10, struct snd_ctl_elem_list)

...shows up in the output at all. Errors from cexpr are swallowed by bindgen and results in no output, and that's what happening here. Anyhow, what I ended up with as a workaround was a regex to cover both these cases, as can be seen here.

The regexes are r"#define\s+(\w+)\s+\(\((\w+)\)\s*(\d+)\)" and r"#define\s+(\w+)\s+_IO([WR]*)\(([^,)]+),\s*([^,)]+),?\s*([^,)]*)\)" and the code handling these are good enough for me, but there might be special cases for other headers that it does not handle.

jbaublitz commented 2 years ago

@jethrogb @emilio I now have two projects that I'm a part of that depend on this so I'd willing to help. Where do you want me to start? Does the preprocessor need to be added first at the cexpr level?

eliaxelang007 commented 1 year ago

Any updates to this issue?

I'm trying to create rust bindings for raylib, but bindgen isn't generating rust bindings for the color macros:

/* raylib.h */

/* ... */

/* These aren't being generated in the bindgen output. */
#define WHITE      CLITERAL(Color){ 255, 255, 255, 255 }   // White
#define BLACK      CLITERAL(Color){ 0, 0, 0, 255 }         // Black
#define BLANK      CLITERAL(Color){ 0, 0, 0, 0 }           // Blank (Transparent)
#define MAGENTA    CLITERAL(Color){ 255, 0, 255, 255 }     // Magenta
#define RAYWHITE   CLITERAL(Color){ 245, 245, 245, 255 }   // My own White (raylib logo)

/* ... */
pustekuchen91 commented 1 year ago

Is there a possible solution in the near feature?

I'm trying to convert macros for pin definitions of the harmony framework

#define MyGPIO_Get()               (((PORT_REGS->GROUP[2].PORT_IN >> 0U)) & 0x01U)
#define MyGPIO_PIN                  PORT_PIN_PC00
jbaublitz commented 1 year ago

@emilio I'd like to propose a temporary and longer term solution that will take a while due to it requiring LLVM.

In the short term, @ojeda found a workaround with clang where we should be able to use temporary intermediate files with the macros surrounded by parentheses to evaluate these sorts of expressions to a constant. This may slow down compile time, but it should have the benefit of allowing this particular case to be handled using current clang.

In the longer term, @ojeda talked with the clang maintainers and they are open to having someone add functionality to LLVM and clang to essentially keep C preprocessor macro information available for a given parsed file. As you probably already know, currently, a lot of the cpp information is dropped by the time you've parsed the file. The clang maintainer is open to changes that would effectively allow using a header file as a context from which you could evaluate arbitrary macro expressions as I understand it (correct me if I'm wrong, @ojeda, since I wasn't part of this conversation). This would provide bindgen with the ability to have much more robust macro evaluate through clang.

For now, would you be willing to accept the shorter term solution if I put up a PR and would you be interested in something like the longer term solution later on?

pvdrz commented 1 year ago

@jbaublitz could you expand a bit on how this workaround with intermediate files would work?

jbaublitz commented 1 year ago

@pvdrz Sure! Just as a heads up, I think @ojeda and I have come up with two similar but slightly different solutions so we're emailing back and forth to get on the same page. However, I've had success in our test case around macros in cryptsetup that use function-like macros in their expansion to define constants using my solution.

I've looked at the bindgen code and I feel like we kind of have two routes here. Currently, it looks like we have cexpr doing a lot of the heavy lifting in the case of parsing #defines. I'll outline the two possibilities I see with my solution and the tradeoffs and hopefully one of them will be acceptable for bindgen while we work on the longer term work of LLVM API changes.

The first option is a more localized change and could possibly be used as a backup for cexpr if it fails even, but involves more temporary files and therefore potential impacts to compile time for a large number of function-like macro definitions. In this case, if cexpr fails to parse the expansion of the macro (for example if it uses function-like macros in the expansion), it is possible to write out a temporary file containing a use of the macro. Clang can read in this temporary file and parse the macro invocation and provide an integer literal as the output in cases like the one I liked above. The drawback is that there would like need to be a temporary file for each failure to parse the macro expansion. This may not be acceptable but I wanted to put it out there as an option.

The other option would be a larger change but would only require one temporary file per header being passed to bindgen. I've had success with the clang crate generating a list of macro definitions in the header file, putting them all in a temporary file, and then evaluating each one, mapping the evaluation to the original macro definition name, returning all of that information in a HashMap. I'm not entirely sure how this would work with cexpr, but it is definitely the more performant option, and I have example code using the clang crate that demonstrates that it would work.

Do either of these sound acceptable or at least like something we can iterate on to find a solution that would be acceptable for bindgen? I understand that both of them are definitely workarounds and not the ideal solution we hope to get to with LLVM changes, but I think it could provide some quality of life improvements for right now. For example, when we built our Rust bindings for libcryptsetup against a new version that used function-like macros in their #defines after previously not doing so, we had a build failure of our high level bindings because those constants suddenly disappeared from libcryptsetup-rs-sys. Either of these solutions would at least prevent that from happening in the future.

ojeda commented 1 year ago

In the short term, @ojeda found a workaround with clang where we should be able to use temporary intermediate files with the macros surrounded by parentheses to evaluate these sorts of expressions to a constant. This may slow down compile time, but it should have the benefit of allowing this particular case to be handled using current clang.

Yeah, to give an example:

void f(void) {
    (M);
}

Then one can query with a cursor placed at (, and Clang will resolve the expression. It seems to always give Kind: Int though, at least through c-index-test -- we should test corner cases up to 64-bit and see if it works for our purposes.

Another way is initializing a variable:

int m = M;

If one requests m, that seems to return the value after type conversion.

The clang maintainer is open to changes that would effectively allow using a header file as a context from which you could evaluate arbitrary macro expressions as I understand it (correct me if I'm wrong, @ojeda, since I wasn't part of this conversation).

@AaronBallman told me that the C indexing APIs (include/clang-c/) work by building an AST, but that AST does not keep the macro information. If we implemented the change, then Aaron offered to review the changes and so on.

The change would involve keeping enough information to be able to evaluate the macros (or perhaps directly saving a map of evaluations, similar to what you suggest later, but on Clang's side).

(Thanks a lot @jbaublitz for looking into this)

AaronBallman commented 1 year ago

FWIW, Clang's C indexing APIs have a TU-level flag named CXTranslationUnit_DetailedPreprocessingRecord. Passing that causes the indexing APIs to behave as if -Xclang -detailed-preprocessing-record was passed to the compiler which might get you what you need already. If it doesn't get you what you need, then it's quite likely in the correct area for where you'd make changes to expose the macro information you're after. (I'm not intimately familiar with this part of the code, but you may need to call clang_annotateTokens to get cursors that visit macros.)

ojeda commented 1 year ago

Yeah, in my case I just tested via c-index-test, which from a quick look it seems to set CXTranslationUnit_DetailedPreprocessingRecord for -evaluate-cursor-at (and no reparsing). But if it happens to e.g. work via the API without changes to libclang, that would be great. Thanks a lot @AaronBallman for taking a look!

jbaublitz commented 1 year ago

@pvdrz Just a few more considerations:

jbaublitz commented 10 months ago

@pvdrz Would you prefer to just have the support in LLVM as opposed to the workaround I proposed?