ralfbiedert / interoptopus

The polyglot bindings generator for your library (C#, C, Python, …) 🐙
MIT License

C backend 'fallback_type' CChar to int8_t is problematic. #105

Open All8Up opened 4 months ago

All8Up commented 4 months ago

I ran into this today while compiling on Arm processors. Unfortunately, c_char can't simply be mapped to int8_t, because the signedness of char differs by architecture (and sometimes by OS). On Arm, the ABI deliberately makes plain char unsigned (i.e. it behaves like uint8_t), because legacy Arm chips before the v4 architecture could not load signed bytes.
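A quick way to check which flavour of char a given target and compiler use is to look at CHAR_MIN from limits.h, which is 0 when plain char is unsigned. This is a minimal sketch; the function names are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Compile-time answer: CHAR_MIN is 0 when plain char is unsigned
   (e.g. typical Linux on Arm) and SCHAR_MIN when it is signed
   (e.g. typical x86 targets). Hypothetical helper name. */
bool plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Same question at run time: (char)-1 becomes 255 if char is unsigned,
   so the comparison is only true when char is signed. */
bool plain_char_is_signed_runtime(void) {
    return (char)-1 < 0;
}
```

Both functions should agree on any given build. Flags such as -funsigned-char will flip the answer without changing the target.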

So the problem in a nutshell:

int main(int argc, char** argv) { char* blargo = "foo.bar"; my_ffi_func(blargo); return 0; }

This compiles on Windows and Linux x86 platforms, but on Linux Arm (or any other target where plain char is unsigned) it fails, because char there behaves like uint8_t rather than int8_t. This is a very loose area of the specification, and some compilers even let you configure it (for example -funsigned-char / -fsigned-char in GCC and Clang, or /J in MSVC), so it's problematic.
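Until the generator emits plain char, one possible caller-side workaround is an explicit cast inside a small wrapper. This sketch assumes the generated prototype takes const int8_t* and returns int32_t; that return type and the wrapper name my_ffi_func_str are hypothetical, and the body of my_ffi_func is a C stub standing in for the Rust export so the example compiles on its own.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the generated prototype, where c_char falls back to int8_t.
   Stub body (assumption): returns the string length instead of calling Rust. */
int32_t my_ffi_func(const int8_t* s) {
    return (int32_t)strlen((const char*)s);
}

/* Hypothetical wrapper: converting between pointers to character types is
   allowed and leaves the bytes untouched, so this compiles the same way
   whether plain char is signed or unsigned on the target. */
static inline int32_t my_ffi_func_str(const char* s) {
    return my_ffi_func((const int8_t*)s);
}

int32_t call_site(void) {
    const char* blargo = "foo.bar";
    return my_ffi_func_str(blargo); /* 7 with the stub above */
}
```

This only papers over the mismatch at each call site; emitting char in the header would remove the need for the cast entirely.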

It looks like the solution would be to handle CChar the same way as the Ascii pattern in the fallback function, and then have the writer emit 'char' instead of int8_t. Does that seem reasonable? My main concern is whether this would cause side effects I'm not aware of.

ralfbiedert commented 4 months ago

Thanks! I'd have to read up a bit more on the spec there, and I think adding a custom type could work. An alternative could be using a type alias, but I haven't thought that through.

All8Up commented 4 months ago

I'll give it a quick hacky try and see what happens. Hopefully that unblocks the Arm support side of things, and we can discuss better options in the PR.

The two most useful links on this that I found:

Obviously, the Rust docs themselves, which specify that c_char can be either i8 or u8: https://doc.rust-lang.org/core/ffi/type.c_char.html

And this section on how these types vary between platforms: https://anssi-fr.github.io/rust-guide/07_ffi.html#platform-dependent-types

Unfortunately, the copy of the C standard I found didn't have linkable sections, but basically, when it gets to char, it explicitly calls out that unlike 'int', which is always equivalent to 'signed int', char, signed char, and unsigned char are three distinct types, with no guarantee as to which of the other two plain char behaves like. As in the case of x86 versus Arm and certain compilers, the choice is left to the vendor for optimization reasons.
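The "three distinct types" point can be demonstrated with C11 _Generic, which rejects two associations naming compatible types. The sketch below (hypothetical names) only compiles because char, signed char, and unsigned char are distinct. On mainstream toolchains such as glibc and MSVC, int8_t is a typedef for signed char, which is why a char* never matches an int8_t* parameter exactly, even on x86 where char happens to be signed.

```c
#include <stdint.h>

/* Classification codes for the three character types (hypothetical). */
enum char_kind { KIND_PLAIN_CHAR, KIND_SIGNED_CHAR, KIND_UNSIGNED_CHAR, KIND_OTHER };

/* Listing all three types side by side is only legal because they are
   mutually incompatible, regardless of the signedness of plain char. */
#define CHAR_KIND(x) _Generic((x),           \
    char: KIND_PLAIN_CHAR,                   \
    signed char: KIND_SIGNED_CHAR,           \
    unsigned char: KIND_UNSIGNED_CHAR,       \
    default: KIND_OTHER)

enum char_kind kind_of_char(void)  { char c = 'a';    return CHAR_KIND(c); }
enum char_kind kind_of_int8(void)  { int8_t c = 0;    return CHAR_KIND(c); }
enum char_kind kind_of_uint8(void) { uint8_t c = 0;   return CHAR_KIND(c); }
```

kind_of_char always reports plain char; the int8_t and uint8_t results depend on how the C library defines those typedefs.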

All8Up commented 4 months ago

Unfortunately, I started poking at this and couldn't find where the conversion from "c_char" to "int8" is implemented. I kept tracing it backwards, but got hit with some priority work before I could finish the spelunking. If you remember where the conversion happens and could point it out, that would help a lot now that I can get back to this. I got pretty confused, since all the obvious places already seemed to have it converted to int8, and I couldn't pin down any location where it was still the original c_char.