Closed cgwalters closed 5 years ago
I believe the main issue is that &str is a string slice, meaning it's not null-terminated and is a 'fat pointer' (a tuple of pointer and length). This is not equivalent to const char*, as that's just a single pointer to a (generally null-terminated) array. Additionally, it's undefined behavior for a &str to contain invalid UTF-8.
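To make the layout difference concrete, here is a minimal sketch. The helper names `pointer_sizes` and `str_parts` are made up for illustration, and the two-word size of `&str` is what current compilers produce rather than a documented ABI guarantee.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Returns (size of `&str`, size of `*const c_char`) in bytes.
/// On current compilers the first is two machine words, the second one.
pub fn pointer_sizes() -> (usize, usize) {
    (size_of::<&str>(), size_of::<*const c_char>())
}

/// Splits a `&str` into the two values it carries: data pointer and byte length.
/// Note there is no trailing NUL byte after the data.
pub fn str_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}
```

So a C function declared as taking `const char*` would receive only half of what Rust passes for a `&str`, and would then read past the end of the data looking for a terminator.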
This means that client code should either be converting to &str using a safe method, or explicit unsafe code.
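A sketch of what the "safe method" route could look like on the Rust side, assuming the C caller passes an ordinary NUL-terminated string. `str_from_c` is a hypothetical helper name; `CStr::from_ptr` and `CStr::to_str` are the standard library calls doing the work.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Borrows a C string as `&str`, returning `None` for NULL or invalid UTF-8.
///
/// # Safety
/// If non-null, `ptr` must point to a NUL-terminated buffer that stays
/// valid and unmodified for the lifetime `'a`.
pub unsafe fn str_from_c<'a>(ptr: *const c_char) -> Option<&'a str> {
    if ptr.is_null() {
        return None;
    }
    // from_ptr scans for the terminating NUL; to_str validates UTF-8.
    unsafe { CStr::from_ptr(ptr) }.to_str().ok()
}
```

The function is still `unsafe` to call because the pointer itself can't be checked, but the UTF-8 requirement is verified instead of assumed. The unchecked alternative would be `std::str::from_utf8_unchecked`, which is the "explicit unsafe code" option.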
This is not equivalent to const char* as that's just a single pointer to a (generally null terminated) array.
Right. To support this bindgen would probably have to grow support for injecting e.g. static inline wrapper functions into the C header that performed conversions.
It can't inject them into C because the conversion has to happen on the Rust side (C doesn't and must not know the layout of a slice). Also, such a conversion on the way out is expensive and requires a heap allocation plus copying the data, so if one wants to convert to or from a C null-terminated string, it's better not to hide that complexity by doing it implicitly, but rather to require the developer to perform it explicitly with CString in their code and the proper corresponding type in the function signature.
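For the "way out" direction, an explicit version might look roughly like this. `string_to_c` and `string_free` are hypothetical names; the allocation and copy happen inside `CString::new`, and ownership is handed to C with `into_raw` and taken back with `from_raw`.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Copies `s` into a new heap-allocated, NUL-terminated C string.
/// Returns NULL if `s` contains an interior NUL byte.
pub fn string_to_c(s: &str) -> *mut c_char {
    match CString::new(s) {
        // into_raw transfers ownership to the caller; nothing is freed here.
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Frees a string previously returned by `string_to_c`.
///
/// # Safety
/// `ptr` must be NULL or a pointer obtained from `string_to_c`, and must
/// not be used again after this call.
pub unsafe fn string_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        // from_raw retakes ownership so the allocation is dropped by Rust.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

The C side has to call the matching free function rather than `free()`, which is one more reason to keep this visible in the signature instead of generating it silently.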
OK so I admit when I filed this issue I didn't think through fully how it would need to work. You are both raising valid concerns and points.
However: I still find my "FFI translation layer" to be a super dangerous minefield - mostly so far around strings. Which I guess is probably the biggest special case?
I may experiment with macros to handle this more nicely.
Feel free to close this, but...as I noted initially there are a lot of projects doing this and it seems to me that cbindgen is in a position to help, though it would require generating Rust code too as noted. (Or maybe rather than generating Rust, the cbindgen user has to supply C entry points and the Rust &str is returned as a void* in the static inline or so and then passed to the real C entrypoint)
However: I still find my "FFI translation layer" to be a super dangerous minefield
I can totally relate to that and actually was playing with a bunch of traits to mostly automate this for an internal project. I think we can just open-source it, but ideally I'd also want us to have a proc-macro that would automate these wrappers and cheap conversions too.
That said, I believe this sort of task is outside of scope of cbindgen, since it's mostly agnostic to Rust built-in type representations, and there is more than one or two ways to expose them in FFI (data/size struct, size/data struct, null-terminated string, opaque pointer, ...).
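As an illustration of one of those options (the data/size struct), here is a rough sketch. `StrRef` and its methods are hypothetical names, not something cbindgen provides; the point is only that a `#[repr(C)]` struct has a layout C can describe, unlike `&str` itself.

```rust
/// Hypothetical C-compatible view of a string slice: pointer plus byte length.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrRef {
    pub data: *const u8,
    pub len: usize,
}

impl StrRef {
    /// Borrows `s`. The result must not outlive `s`; the type does not enforce this.
    pub fn new(s: &str) -> Self {
        StrRef { data: s.as_ptr(), len: s.len() }
    }

    /// Rebuilds a `&str`, validating UTF-8 along the way.
    ///
    /// # Safety
    /// `data` must be non-null and point to `len` readable bytes that stay
    /// valid for the lifetime `'a`.
    pub unsafe fn as_str<'a>(self) -> Option<&'a str> {
        let bytes = unsafe { std::slice::from_raw_parts(self.data, self.len) };
        std::str::from_utf8(bytes).ok()
    }
}
```

Swapping the field order, or using a null-terminated pointer instead, gives a different but equally valid C API, which is the ambiguity being pointed out here.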
I also agree that this is a tough area. I'd love to have this be easier but I agree with @RReverser that this is probably best solved by a different tool.
I may be missing something fundamental, but is there a reason it isn't supported[1] to use &str in e.g. parameters, rendering it as const char * on the C side? And as a bonus, with __attribute__((nonnull))?
See discussion in e.g. https://github.com/projectatomic/rpm-ostree/pull/1655#discussion_r230750771
In general...when dealing with FFI I find myself writing a "translation layer API". I looked at some of the reverse dependencies of this crate, and there's a lot of helpers for things like this.
For example:
Now, different projects may have different policies they want for handling things like "what if the C passes NULL or invalid UTF-8", or whether or not to assume valid UTF-8 from the start, etc.
But I think we could figure out how to make this configurable?
[1] By "not supported" I really mean "emits 'str' literally to C which is obviously nonsense"
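To give a feel for what "configurable" could mean for the NULL and invalid UTF-8 cases, here is a minimal sketch. `Utf8Policy` and `decode` are hypothetical names for illustration only and are not part of cbindgen or any existing crate.

```rust
use std::borrow::Cow;
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical policy for input that is not valid UTF-8.
#[derive(Clone, Copy)]
pub enum Utf8Policy {
    /// Reject the string.
    Strict,
    /// Replace invalid sequences with U+FFFD.
    Lossy,
}

/// Decodes a C string according to `policy`; NULL always yields `None`.
///
/// # Safety
/// If non-null, `ptr` must point to a NUL-terminated buffer valid for `'a`.
pub unsafe fn decode<'a>(ptr: *const c_char, policy: Utf8Policy) -> Option<Cow<'a, str>> {
    if ptr.is_null() {
        return None;
    }
    let c = unsafe { CStr::from_ptr(ptr) };
    match policy {
        // Borrow the bytes directly when they are valid UTF-8.
        Utf8Policy::Strict => c.to_str().ok().map(Cow::Borrowed),
        // Allocates only if a replacement was actually needed.
        Utf8Policy::Lossy => Some(c.to_string_lossy()),
    }
}
```

A project that prefers to assume valid UTF-8 from the start would need a third, unchecked variant, which is where the safety burden moves back to the caller.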