zyantific / zydis

Fast and lightweight x86/x86-64 disassembler and code generation library
https://zydis.re
MIT License
3.44k stars 438 forks source link

Discussion: Feature `IFDEF`s and unions/bitfields in public API structs #280

Open flobernd opened 2 years ago

flobernd commented 2 years ago

Our internal structs are using IFDEFs to enable/disable feature specific fields on deman (e.g. AVX-512 related fiels). As well, bitfields are used to optimize struct sizes.

While bindings prevents us to do the same optimizations for the public API structs, we could introduce a new ZYDIS_SIZE_OPTIMIZED_UNSAFE_FOR_BINDINGS CMake option (naming open for discussion as well).

This option would indicate that Zydis is used in a pure C/C++ project and always built on demand (-> we know the feature defines of headers and code files are the same; this is the essential part!). This would allow us to e.g. change the ZydisDecodedOperand struct to union or to introduce bitfields in the public API.

athre0z commented 2 years ago

Hmmm, these compile-time settings are kind of hard to get to work when Zydis is installed from e.g. OS repositories. When building the binaries for Zydis in e.g. Debian or vcpkg repos, we'd have to make a decision. Since we'd also want bindings to work with these binaries, we'd have to disable these optimizations. Since vcpkg seems to become increasingly popular amongst our users, only a fraction of users would thus be able to really make use of this. I'm not opposed to having it as optional switch, but we can't rely on this to shrink our struct in the general case.

With regards to unions, I think we should just use them and make it the binding's problem on how to deal with that. In retrospective, I think restricting our API to the smallest common denominator of all languages that we want bindings to was a mistake. If most languages can support a feature, then we should use it.

Bitfields are still quite problematic, because you can't really support them in bindings even if you're willing to do it all manually, simply because C doesn't provide enough ABI guarantees to ever get that to work. This would thus be one of the candidates for being behind ZYDIS_SIZE_OPTIMIZED_UNSAFE_FOR_BINDINGS (we'd really need a better name of this though :p).

flobernd commented 2 years ago

That's fine! Was always meant to be an optional switch (default off).

Can we make a list of languages that support unions? Just a few I know they support them:

bjorn3 commented 2 years ago

Rust has native support for unions. (https://doc.rust-lang.org/reference/items/unions.html)

athre0z commented 2 years ago

My idea for Rust was to reorder our structs slightly so that the tag (e.g. type field for operands) immediately precedes the union, allowing us to use the really_tagged_unions feature to model them as safe enums. This is more convenient than using Rust's language feature for unions because they require unsafe on every use, so we'd have to wrap them with safe getters. While that wouldn't be a big deal either (and might be required for the union in the raw portion of the instruction struct, since it doesn't really have a dedicated tag), it's nice to have proper enums that you can match on for the operands.

For Python bindings, we use Cython and thus support this with no additional effort (since it's essentially like writing C in a weird Python-esque syntax).

bjorn3 commented 2 years ago

My idea for Rust was to reorder our structs slightly so that the tag (e.g. type field for operands) immediately precedes the union, allowing us to use the really_tagged_unions feature to model them as safe enums.

I thought about that feature too, but couldn't find if it had already been implemented yet or not. Just got confirmation that it was already effectively implemented when the rfc got merged.