mozilla / cbindgen

A project for generating C bindings from Rust code
Mozilla Public License 2.0

Support &str in parameters? #245

Closed cgwalters closed 5 years ago

cgwalters commented 5 years ago

I may be missing something fundamental, but is there a reason it isn't supported[1] to use &str in e.g. parameters, rendering it as const char * on the C side? And as a bonus, with __attribute__((nonnull))?

See discussion in e.g. https://github.com/projectatomic/rpm-ostree/pull/1655#discussion_r230750771

In general...when dealing with FFI I find myself writing a "translation layer API". I looked at some of the reverse dependencies of this crate, and there's a lot of helpers for things like this.

For example:

Now, different projects may want different policies for handling things like "what if the C side passes NULL or invalid UTF-8", or whether or not to assume valid UTF-8 from the start, etc.

But I think we could figure out how to make this configurable?

[1] By "not supported" I really mean "emits 'str' literally to C which is obviously nonsense"

eqrion commented 5 years ago

I believe the main issue is that &str is a string slice, meaning it's not null-terminated and is a 'fat pointer' (a pointer plus a length). This is not equivalent to const char*, as that's just a single pointer to a (generally null-terminated) array. Additionally, it's undefined behavior for a &str to contain invalid UTF-8.
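To make the fat-pointer point concrete, here is a minimal Rust sketch. The helper names `pointer_sizes` and `raw_parts` are made up for illustration and are not part of cbindgen. It compares the size of `&str` with the size of a C-style `*const c_char`, and shows that a subslice is only a different (pointer, length) pair into the same buffer, so nothing guarantees a NUL byte after it.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Hypothetical helper: (size of `&str`, size of `*const c_char`) in bytes.
/// On current compilers the first is twice the second (pointer + length).
pub fn pointer_sizes() -> (usize, usize) {
    (size_of::<&str>(), size_of::<*const c_char>())
}

/// Hypothetical helper: splits a string slice into the two halves of its
/// fat pointer. No terminator is stored; the length is the only bound.
pub fn raw_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}
```

For example, `raw_parts(&"hello"[1..4])` yields a pointer one byte into the original buffer and a length of 3; the byte that follows the slice is `'o'`, not NUL, which is why the slice cannot simply be handed to C as a `const char *`.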

This means that client code should either be converting to &str using a safe method, or explicit unsafe code.
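A rough sketch of what that client-side conversion might look like, assuming the C caller passes a NUL-terminated `const char *`. The names `str_from_c`, `str_from_c_unchecked` and `example_name_len` are hypothetical; the NULL and UTF-8 policy shown (return `None` / `-1`) is just one of the possible choices discussed above. A real exported function would also carry a `no_mangle` attribute, omitted here to keep the sketch edition-independent.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Checked conversion: `None` for NULL or invalid UTF-8.
///
/// # Safety
/// `ptr` must be NULL or point to a NUL-terminated string that stays
/// valid and unmodified for the lifetime `'a`.
pub unsafe fn str_from_c<'a>(ptr: *const c_char) -> Option<&'a str> {
    if ptr.is_null() {
        return None;
    }
    // Borrows the C buffer (no copy), then validates UTF-8.
    unsafe { CStr::from_ptr(ptr) }.to_str().ok()
}

/// Explicitly unsafe variant: skips the UTF-8 check entirely.
///
/// # Safety
/// Same as `str_from_c`, and the bytes must already be valid UTF-8.
pub unsafe fn str_from_c_unchecked<'a>(ptr: *const c_char) -> &'a str {
    unsafe { std::str::from_utf8_unchecked(CStr::from_ptr(ptr).to_bytes()) }
}

/// Hypothetical entry point: byte length of `name`, or -1 on NULL or
/// invalid UTF-8. cbindgen would see only the `*const c_char` parameter.
pub unsafe extern "C" fn example_name_len(name: *const c_char) -> isize {
    match unsafe { str_from_c(name) } {
        Some(s) => s.len() as isize,
        None => -1,
    }
}
```

Keeping the raw pointer in the signature and the conversion in the body means the header cbindgen generates stays honest about what C must pass, while the project decides its own error policy.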

cgwalters commented 5 years ago

> This is not equivalent to const char* as that's just a single pointer to a (generally null terminated) array.

Right. To support this, cbindgen would probably have to grow support for injecting e.g. static inline wrapper functions into the C header that performed conversions.

RReverser commented 5 years ago

It can't inject them into C because the conversion has to happen on the Rust side (C doesn't and must not know the layout of a slice). Also, such a conversion on the way out is expensive: it requires a heap allocation plus copying the data. So if one wants to convert from a C null-terminated string, it's better not to hide that complexity by doing it implicitly, but rather to require the developer to perform it explicitly with CString in their own code, with the proper corresponding type in the function signature.
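As an illustration of the cost on the way out, here is a hedged sketch of handing a Rust `&str` to C as a NUL-terminated string. `str_to_c` and `example_string_free` are hypothetical names; returning NULL on an interior NUL byte is an assumed policy, not something the thread prescribes.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Copies `s` into a new heap buffer with a trailing NUL and transfers
/// ownership to the caller. Returns NULL if `s` contains an interior NUL.
pub fn str_to_c(s: &str) -> *mut c_char {
    match CString::new(s) {
        // Allocation + copy happen inside `CString::new`.
        Ok(owned) => owned.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Gives the buffer back to Rust so it is freed by the allocator that
/// created it.
///
/// # Safety
/// `ptr` must be NULL or a pointer previously returned by `str_to_c`
/// that has not been freed yet.
pub unsafe extern "C" fn example_string_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

The explicit `CString` makes both the allocation and the ownership hand-off visible in the Rust source, and the matching free function is something the C caller has to know about, which is part of why doing this implicitly would hide real complexity.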

cgwalters commented 5 years ago

OK so I admit when I filed this issue I didn't think through fully how it would need to work. You are both raising valid concerns and points.

However: I still find my "FFI translation layer" to be a super dangerous minefield - mostly so far around strings. Which I guess is probably the biggest special case?

I may experiment with macros to handle this more nicely.

Feel free to close this, but...as I noted initially there are a lot of projects doing this and it seems to me that cbindgen is in a position to help, though it would require generating Rust code too as noted. (Or maybe rather than generating Rust, the cbindgen user has to supply C entry points and the Rust &str is returned as a void* in the static inline or so and then passed to the real C entrypoint)

RReverser commented 5 years ago

> However: I still find my "FFI translation layer" to be a super dangerous minefield

I can totally relate to that and actually was playing with a bunch of traits to mostly automate this for an internal project. I think we can just open-source it, but ideally I'd also want us to have a proc-macro that would automate these wrappers and cheap conversions too.

That said, I believe this sort of task is outside the scope of cbindgen, since it's mostly agnostic to Rust built-in type representations, and there are more than one or two ways to expose them in FFI (data/size struct, size/data struct, null-terminated string, opaque pointer, ...).
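One of the options listed above, the data/size struct, could be sketched as follows. `StrRef` and its methods are hypothetical names chosen for this example; the field order (pointer first, then length) is exactly the kind of per-project decision the comment refers to.

```rust
/// Hypothetical FFI-safe view of a string slice: data pointer, then size.
/// `#[repr(C)]` fixes the field order so a C struct can mirror it.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrRef {
    pub ptr: *const u8,
    pub len: usize,
}

impl StrRef {
    /// Borrows `s` without copying; the result must not outlive `s`.
    pub fn new(s: &str) -> Self {
        StrRef { ptr: s.as_ptr(), len: s.len() }
    }

    /// Rebuilds a `&str`, validating UTF-8 (`None` if invalid).
    ///
    /// # Safety
    /// `ptr` must be non-null and point to `len` readable bytes that
    /// stay valid for the lifetime `'a`.
    pub unsafe fn as_str<'a>(&self) -> Option<&'a str> {
        let bytes = unsafe { std::slice::from_raw_parts(self.ptr, self.len) };
        std::str::from_utf8(bytes).ok()
    }
}
```

Unlike the null-terminated variant, this needs no allocation or copy in either direction, but the C side has to carry the length around and cannot pass the pointer to functions that expect a terminated string.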

eqrion commented 5 years ago

> OK so I admit when I filed this issue I didn't think through fully how it would need to work. You are both raising valid concerns and points.
>
> However: I still find my "FFI translation layer" to be a super dangerous minefield - mostly so far around strings. Which I guess is probably the biggest special case?
>
> I may experiment with macros to handle this more nicely.
>
> Feel free to close this, but...as I noted initially there are a lot of projects doing this and it seems to me that cbindgen is in a position to help, though it would require generating Rust code too as noted. (Or maybe rather than generating Rust, the cbindgen user has to supply C entry points and the Rust &str is returned as a void* in the static inline or so and then passed to the real C entrypoint)

I also agree that this is a tough area. I'd love to have this be easier but I agree with @RReverser that this is probably best solved by a different tool.