rust-lang / rust-bindgen

Automatically generates Rust FFI bindings to C (and some C++) libraries.
https://rust-lang.github.io/rust-bindgen/
BSD 3-Clause "New" or "Revised" License
4.45k stars, 695 forks

Option to use *mut objc::runtime::Object #1884

Open LoganDark opened 4 years ago

LoganDark commented 4 years ago

Basically, I have this header file that relies on some Objective-C types (NSString and stuff), but I don't want bindgen to spend 30 seconds generating hundreds of thousands of lines of useless bindings for them. I just want them all to be like type NSString = *mut objc::runtime::Object. Is it possible to do this?

simlay commented 4 years ago

Hmm. Not really. You could try to blacklist some things but it's unclear how well that works and is liable to break some of the generated bindings. As of #1847, there are return types based on the structured objective-c names. Also, the NSString is generated as a #[repr(transparent)] struct NSString(pub id) since #1722.

Depending on what you're doing, you could put the generated bindings in a sub-crate or another workspace (usually suffixed with -sys). It will still do the initial annoying 30 seconds of generation, but after the first build the result should be cached.

LoganDark commented 4 years ago

Yeah I soon found out that objects are not always pointers and sometimes they are actually objects inside of the struct, which is like... really annoying. Might have to treat them as opaque, then, instead of trying to replace them with pointers

Never mind. Bindgen on Objective-C files generates newtype structs containing pointers. Guess my first guess was correct.
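For readers following along, here is a minimal sketch of why such a newtype is interchangeable with a bare pointer as far as layout goes. `Object` is a hypothetical stand-in for `objc::runtime::Object` so the snippet builds without the objc crate, and the helper functions are made up for illustration; this is not bindgen's actual output.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for `objc::runtime::Object` (assumption: the real
/// type is only ever used behind a pointer, so its layout is irrelevant here).
#[repr(C)]
pub struct Object {
    _private: [u8; 0],
}

#[allow(non_camel_case_types)]
pub type id = *mut Object;

/// Same shape as the newtype described above: a transparent wrapper around `id`.
#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct NSString(pub id);

/// True if the newtype has exactly the size and alignment of the raw pointer.
pub fn newtype_matches_pointer_layout() -> bool {
    size_of::<NSString>() == size_of::<id>() && align_of::<NSString>() == align_of::<id>()
}

/// Wrapping and unwrapping is a plain field access; the pointer value is unchanged.
pub fn round_trip(raw: id) -> id {
    NSString(raw).0
}
```

Because of `#[repr(transparent)]`, a struct field declared as `NSString` occupies the same space as a field declared as `id`.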

emilio commented 4 years ago

> Basically, I have this header file that relies on some Objective-C types (NSString and stuff), but I don't want bindgen to spend 30 seconds generating hundreds of thousands of lines of useless bindings for them. I just want them all to be like type NSString = *mut objc::runtime::Object. Is it possible to do this?

Where is this time spent? Can you use the whitelist to avoid generating most of the stuff you don't need?

LoganDark commented 4 years ago

> Where is this time spent?

Probably parsing the entirety of Cocoa/Cocoa.h.

I timed it, this wrapper.m:

#include <Cocoa/Cocoa.h>

struct BindgenHatesOpaque {
    NSString* nsstring;
};

with NSString set as opaque and the whitelist only containing BindgenHatesOpaque:

    Finished dev [unoptimized + debuginfo] target(s) in 34.21s

The project only contained an empty lib.rs.

Additionally, this does not look opaque to me:

https://hastebin.com/ihuqusawoh.rs

This is okay:

pub type id = *mut objc::runtime::Object;

#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct NSString(pub id);

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct BindgenHatesOpaque {
    pub nsstring: NSString,
}

Since I asked for opaque, I don't need anything more. But bindgen decides to pull in all sorts of stuff, including stuff like NSCoder, NSData, NSZone etc...

> Can you use the whitelist to avoid generating most of the stuff you don't need?

Not really, because some struct pulls in NSView which inherits from something else which pulls in ... ... which has a method that returns ... ... ... ... 400,000 lines later ...

As you can see above, specifying something as opaque has no effect and you could easily pull in hundreds of objects by accident.

Sure, turning off recursion would almost work. That would require a manual typedef. Speaking of typedef, however, and luckily, the header file I'm generating bindings for doesn't actually have any Objective-C syntax in it.

That brings me to my solution that doesn't even give bindgen any fluff to chew on...

That's right. Void pointer to the rescue.

  1. Typedef NSString to void*. We don't need Cocoa anymore, we don't even need the wrapper to be Objective-C anymore.

  2. Include the header.

  3. Strip out your C typedef with the blacklist.

  4. Then add a Rust one for the bindings.

Ta-da, your C NSString * becomes Rust *mut NSString which then becomes *mut objc::runtime::Object! And all alignment is still satisfied because a pointer is still pointer sized no matter where it points (uhhhh... on macOS at least... I heard rumors about architectures where different kinds of pointers are different sizes o.O).
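A rough sketch of where the steps above end up on the Rust side, using only the standard library. `Object` is again a hypothetical stand-in for `objc::runtime::Object`, the struct reuses the `BindgenHatesOpaque` example from earlier in the thread rather than the real header, and the check function is invented for illustration.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_void;

/// Hypothetical stand-in for `objc::runtime::Object`, so this builds
/// without the objc crate.
#[repr(C)]
pub struct Object {
    _private: [u8; 0],
}

/// Step 4: the hand-written Rust alias that takes the place of the
/// blacklisted C typedef.
pub type NSString = Object;

/// Roughly what a generated struct looks like once the C `NSString *`
/// field is emitted as `*mut NSString` (an assumption, not real output).
#[repr(C)]
pub struct BindgenHatesOpaque {
    pub nsstring: *mut NSString,
}

/// True if the struct has the same size and alignment as a C `void *`,
/// i.e. swapping the pointee type did not change the layout.
pub fn layout_matches_void_pointer() -> bool {
    size_of::<BindgenHatesOpaque>() == size_of::<*mut c_void>()
        && align_of::<BindgenHatesOpaque>() == align_of::<*mut c_void>()
}
```

On the targets Rust supports, pointers to sized types all share one size, which is why the alias can point at a different type than the C side without disturbing the struct layout.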

emilio commented 4 years ago

> Probably parsing the entirety of Cocoa/Cocoa.h.

That's still terrible. We parse lots of C++ and never had such an issue where bindgen would take so long, there's likely a bad algorithm in the Objective-C specific code... Can you post the output of running bindgen with time_phases?

Ideally I'd take a perf profile of that but not sure how to do that with objective-c.

emilio commented 4 years ago

So I think I figured out how to profile this from Linux, but this code crashes bindgen for me:

@interface NSArray<__covariant ObjectType>

@property (readonly) unsigned count;
- (instancetype)initWithObjects:(const ObjectType [])objects count:(unsigned)cnt;
@end

(We don't properly resolve the generic). cc @simlay, not sure if this crashes on mac and if not why not.

emilio commented 4 years ago

Ok, so I worked around that for now...

For reference, the way I'm profiling it is:

./target/release/bindgen wrapper.mm -- --target=x86_64-apple-darwin -isysroot /home/emilio/.mozbuild/osx-cross/MacOSX-SDKs/MacOSX10.11.sdk

That takes about 6.933s on my machine. Which is not amazing. It seems like about 10% of the time is visiting clang cursors, so I guess there's a bunch of overhead from that which we may not be able to cut off. Then there's also rustfmt and the amount of items we generate.

With --no-rustfmt-bindings I take that down to 4.5s. With --whitelist-type BindgenHatesOpaque it is down to 1.64s.

Profiling only with --no-rustfmt-bindings I get this: https://share.firefox.dev/2YzWaTx

About 12% of the time is spent in ObjCInterface::from_ty, a bunch of it just iterating over something. I suspect it's doing the protocol lookup introduced here: bd6fe14d6. That's not great; it's O(N^2) and something we should avoid.
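To illustrate the general pattern being described (not bindgen's actual code), here is a small sketch contrasting a per-lookup linear scan with a name index built once. The `Protocol` type and all function names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified item: just a name, unlike bindgen's real items.
pub struct Protocol {
    pub name: String,
}

/// Linear scan: each call is O(N), so resolving N interfaces this way
/// costs O(N^2) in total.
pub fn find_linear<'a>(protocols: &'a [Protocol], name: &str) -> Option<&'a Protocol> {
    protocols.iter().find(|p| p.name == name)
}

/// Build a name -> position map once (O(N)); each later lookup is then
/// O(1) on average.
pub fn build_index(protocols: &[Protocol]) -> HashMap<&str, usize> {
    protocols
        .iter()
        .enumerate()
        .map(|(i, p)| (p.name.as_str(), i))
        .collect()
}

/// Lookup through the prebuilt index instead of scanning the slice.
pub fn find_indexed<'a>(
    protocols: &'a [Protocol],
    index: &HashMap<&str, usize>,
    name: &str,
) -> Option<&'a Protocol> {
    index.get(name).map(|&i| &protocols[i])
}
```

Both lookups return the same result; the indexed version trades a one-time pass and some memory for constant-time queries afterwards.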

There's a bunch of other stuff to optimize... compute_path shows up, compute_whitelisted_and_codegen_items spends a bunch of time doing lookups that for this particular configuration are moot (we are not whitelisting / blacklisting anything!). A bunch of the other phases could be parallel...

ItemSet lookups also show up, and maybe they can be cheaper than a BTreeSet. We could really have a bitfield of ctx.items.len() bits or something...
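A minimal sketch of the bitfield idea, assuming item ids are dense indices in `0..len`. `ItemBitSet` is a made-up name for illustration and is not a type that exists in bindgen.

```rust
/// Hypothetical dense set of item ids in `0..len`, stored as one bit per id.
pub struct ItemBitSet {
    words: Vec<u64>,
    len: usize,
}

impl ItemBitSet {
    /// Creates an empty set able to hold ids `0..len`.
    pub fn new(len: usize) -> Self {
        ItemBitSet {
            words: vec![0; (len + 63) / 64],
            len,
        }
    }

    /// Inserts `id`; returns true if it was not already present.
    pub fn insert(&mut self, id: usize) -> bool {
        assert!(id < self.len, "item id out of range");
        let word = id / 64;
        let bit = 1u64 << (id % 64);
        let was_absent = (self.words[word] & bit) == 0;
        self.words[word] |= bit;
        was_absent
    }

    /// Membership test: one index computation and one mask, no tree walk.
    pub fn contains(&self, id: usize) -> bool {
        id < self.len && (self.words[id / 64] & (1u64 << (id % 64))) != 0
    }
}
```

Compared with a `BTreeSet`, lookups and inserts here are constant time, at the cost of allocating one bit for every possible id up front and losing cheap ordered iteration over only the present ids.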

Anyhow, all in all, nice test-case :)

simlay commented 4 years ago

> So I think I figured out how to profile this from Linux, but this code crashes bindgen for me:
>
> @interface NSArray<__covariant ObjectType>
>
> @property (readonly) unsigned count;
> - (instancetype)initWithObjects:(const ObjectType [])objects count:(unsigned)cnt;
> @end
>
> (We don't properly resolve the generic). cc @simlay, not sure if this crashes on mac and if not why not.

What's weird about this bug to me is that I've tried this code (and yes, it crashes on macOS), but this case has not shown up as a bug in any of the bindgen stuff I've tested (including the Cocoa bindings above). I like to think I've tested the Objective-C bindgen work pretty well, but I'd be lying if I said I was infallible.

As a side note, would there be some interest in some kind of bindgen "test suite" for objective-c that looks at a lot of the objective-c frameworks? I've got a repo that builds all the bindings for all the frameworks. Testing that repo against every PR on my personal system is frequently burdensome and it's not transparent on builds that a given thing compiles (or is actually runtime correct). Is there interest in adding something like this to CI?