ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
35.19k stars 2.56k forks source link

translate-c: support C bitfields #1499

Open ghost opened 6 years ago

ghost commented 6 years ago
struct Test {
    unsigned foobar : 1;
};
pub const struct_Test = @OpaqueType();
d18g commented 6 years ago

Hi, Is there a design for fixing this bug? I would like to help if possible.

andrewrk commented 6 years ago

Hi @d18g. This is tricky because although Zig has bit fields in the form of packed struct, they are more well-defined and thus do not map directly to C's bit fields.

In theory we should be able to support this by emitting code that checks the target and emits different packed structs depending on the target, to match what C does on that target.

So in order to solve this issue, one must do the following:

The other option, which I think would also be OK is to say that translate-c only emits code for the target specified, and so translate-c only would emit a packed struct that matches the target given to translate-c.

Both options are roughly the same amount of work.

Even just the work of creating documentation for what the C ABI for bit fields is for the various compilers would be 60%+ of the work of solving this issue.

I had this issue marked as a bug, but that was incorrect; this is a missing feature. translate-c downgrades structs to opaque types when it cannot understand all the fields.

d18g commented 6 years ago

Hi @andrewk, Thanks for the detailed answer. Just to be sure I am not expecting any struct to be convertible only for "packed" C structs, while this will narrows the problem the issues you laid out are still standing. My motive here is for protocols or HW registers where packed structs are must and bitfields might play a role.

So I suggest multi-step plan (only for packed structs):

  1. Check that we can find out if a struct is packed, alternatively it can be a flag in translate-c which is ugly IMHO.
  2. Set the desired test cases.
  3. Select several C(/C++?) compilers for the initial research.
  4. Document the bitfield ABI behavior in all supported platforms for the selected compilers.
  5. Analyze the findings.

Next steps might be:

If this is acceptable then we can start addressing steps 1-3. After that a deep dive is needed in steps 4-5.

andrewrk commented 6 years ago

I think your plan makes sense. Step 1 is already solved:

https://github.com/ziglang/zig/blob/d40c4e7c896c5dfed9dd35e8c0d2117f7cf5c53c/src/translate_c.cpp#L4009-L4014

d18g commented 6 years ago

This is bitfield detection which is great. I meant packed struct as in #pragma pack or __attribute__((packed)).

andrewrk commented 6 years ago

Ah I see. Clang provides that information as well. It can be tricky to find out where in the API, but it's there. Using a packed struct in Zig we can always emulate whatever struct layout we need to.

Related: #1512

tadeokondrak commented 4 years ago

Clang's logic for this seems to be in MicrosoftRecordLayoutBuilder::layoutBitField and ItaniumRecordLayoutBuilder::LayoutBitField in lib/AST/RecordLayoutBuilder.cpp, for reference.

(Also, it actually exposes this information as part of the AST, so there should be no need to reimplement it.)

jvns commented 3 years ago

I ran into this problem today so out of curiosity I looked into how bindgen does this. It looks like their implemention is in the bitfields_to_allocation_units function.

Their implementation seems to be pretty simple -- as far as I can tell the only 2 cases it deals with are packed vs unpacked structs and named vs unnamed bitfields. Right now they seem to just be ignoring the Microsoft case (there's an if is_ms_struct but the bool is_ms_struct is hardcoded to false).

mahkoh commented 3 years ago

A few weeks ago I created library in Rust that computes the layouts of C types. I've created this library for two purposes:

My library is heavily tested against the layouts produced by the C compilers by generating C code, compiling it with the C compilers, and then extracting the type layouts from the debug information. I'm therefore confident that it has a high degree of accuracy. In the process I was also able to find various bugs in Clang's MicrosoftRecordLayoutBuilder implementation.

The code is heavily documented and it should not be hard to adapt it for use in your compiler. You can find it here: https://github.com/mahkoh/repr-c/tree/master/repc/impl/src/builder

mahkoh commented 3 years ago

I ran into this problem today so out of curiosity I looked into how bindgen does this. It looks like their implemention is in the bitfields_to_allocation_units function.

This implementation is incorrect.

aka-mj commented 1 year ago

It appears the current plan for getting support for C bitfields is via a switch in the translate-c backend from clang to arocc, is this correct? I was looking to see what it'd take to write a Linux kernel module in zig but without this I may hold off.

mgord9518 commented 1 year ago

Are there any hacks to get around this current issue? This prevents from being able to use libfuse with Musl libc due to its timespec implementation

Update for anyone else having issues with this: not sure why I hadn't thought of this before, but simply not importing the problematic header is a possible solution. You'll just have to carefully re-implement the type/function definitions in Zig. Not a perfect solution and may be tedious, but at least it can get you something working. For large header files, you should be able to automatically convert most of it with translate-c

Radvendii commented 6 months ago

but simply not importing the problematic header is a possible solution.

You can import the problematic header and redefine the problematic functions, right?

You'll just have to carefully re-implement the type/function definitions in Zig.

Would this not involve the same work that people are talking about putting into the compiler?

mgord9518 commented 6 months ago

@Radvendii

You can import the problematic header and redefine the problematic functions, right?

I don't see why not, but my issue was small enough that I felt re-defining everything I used wasn't a big deal

Would this not involve the same work that people are talking about putting into the compiler?

No, because the compiler issue would require a lot of testing to ensure it doesn't cause issues in edge cases. Parsing something manually is much, much easier than getting it to work reliably in code.

mlugg commented 3 months ago

https://jkz.wtf/bit-field-packing-in-gcc-and-clang seems like a good reference for the behavior on GCC and Clang, although I've not verified the behavior yet. (Also, bear in mind I think the author has written their binary strings backwards, with the LSB on the left; either that or their rule is incorrectly written.)