Open edsko opened 3 months ago
The cross-compilation story here is more complicated, if we want to be able to generate bindings for target B but run the test suite on target A; probably we simply don't want to support that, at least not initially.
I am working with the simple_structs
example, and the fields offsets are not what I expect.
I expect S2
to have a size of 12 bytes with an alignment of 4 bytes, with offsets 0
, 4
, and 8
for the respective fields.
clang_Cursor_getOffsetOfField
is returning offsets of 0
, 32
, and 64
(generated code) even though the size and alignment of the structure are as expected.
hsc2hs
, the offsets are as I expect.Is this an issue, or am I doing something wrong?
The Rust bindgen
tutorial is out of date. It no longer generates tests, but it runs "build-time tests" instead. Doing the tutorial, the tests are in the generated bindings.rs
file and look like this:
const _: () = {
["Size of bz_stream"][::std::mem::size_of::<bz_stream>() - 80usize];
["Alignment of bz_stream"][::std::mem::align_of::<bz_stream>() - 8usize];
["Offset of field: bz_stream::next_in"][::std::mem::offset_of!(bz_stream, next_in) - 0usize];
...
};
The tests are put inside of const _: () = { ... };
blocks to ensure that they do not pollute the outer scope. There are separate blocks per struct
.
Each test looks like ["test description"][expression];
, where the test passes when the expression evaluates to zero. If an expressions does not evaluate to zero, it is an invalid index of the array and results in a build-time failure.
The following things are tested:
struct
sstruct
sstruct
fieldsBased on https://juliainterop.github.io/Clang.jl/stable/api/ getOffsetOfField returns offset in bits, not bytes (and that makes sense when you consider packed structs)
Based on https://juliainterop.github.io/Clang.jl/stable/api/ getOffsetOfField returns offset in bits, not bytes (and that makes sense when you consider packed structs)
I see. I wonder if we should make this more explicit in our Hs implementation (data Offset = Bytes .. | Bits ..
)? And then leave the Bits
case unsupported for now..?
Great find, @phadej! Thanks!
To confirm, I tested the following structure and indeed see the expected offsets of 0, 8, and 16 bits.
struct foo {
char a;
char b;
char c;
};
I tested the following bit-field structure and get -1
field offsets for all fields. The libclang
documentation says that this is returned when the cursor kind is not a field declaration, but clang-ast-dump
confirms that the cursor kind is CXCursor_FieldDecl
for all fields.
#include <stdbool.h>
#include <stdint.h>
struct datetime {
unsigned _BitInt(6) dt_seconds :6;
unsigned _BitInt(6) dt_minutes :6;
unsigned _BitInt(5) dt_hour :5;
unsigned _BitInt(5) dt_day :5;
unsigned _BitInt(4) dt_month :4;
bool dt_is_dst :1;
uint16_t dt_year;
};
This seems to be a C23 support issue. I tested the following C89 version and get the expected results.
struct datetime {
unsigned dt_seconds :6;
unsigned dt_minutes :6;
unsigned dt_hour :5;
unsigned dt_day :5;
unsigned dt_month :4;
unsigned dt_is_dst :1;
unsigned dt_year;
};
In addition to documenting the bits unit of clang_Cursor_getOffsetOfField
, I added clang_Cursor_isBitField
and clang_getFieldDeclBitWidth
so that I can confirm the behavior in clang-ast-dump
.
I imagine that we will need to handle the Storable
instances for records with bit fields specially. Perhaps we need to use peekByteOff
and pokeByteOff
, parsing/constructing the packed bytes manually... If so, we likely need to deal with the endianness of the platform.
I am not sure if we should add an Offset
type or keep the unit as bits like libclang
. This would be good to consider when implement support for bit-fields.
For now, I just did the following:
Added documentation to HsBindgen.C.AST.Type.StructField
Added assertions to HsBindgen.C.Fold.Type.mkStructField
unless (fieldOffset `mod` 8 == 0) $ error "bit-fields not supported yet"
Converted from bits to bytes in HsBindgen.Translation.LowLevel.structDecs
PR #275
I manually wrote tests to get a feel for the factors that we need to consider in the design.
I used QuickCheck for this experiment. It has few dependencies, and there are (already) Arbitrary
instances for C types. We need to be able to create Abitrary
instances for any user types that we would like to test. I wonder if QuickCheck sounds good, or if a different library (such as hedgehog) would be preferred.
I did not use pkgconfig
for this experiment. The .cabal
file explicitly specifies C headers and sources. My understanding is that using pkgconfig
is very beneficial when linking to external packages that may have dependencies on other packages, but it does not work on Windows. Using pkgconfig
for local C source requires exporting environment variables while building as well as during execution (of tests). Explicitly specifying the C header and sources like in this experiment works well for local C source. I have not tried mixing the two: using pkgconfig
to configure external dependencies and explicitly specifying C headers and sources of generated tests. This is something that is probably worth trying out.
Allocation and deallocation of memory in tests is all done on the Haskell side, using alloca
.
The C->Haskell test, given an arbitrary value, calls a C function with the values of the fields and a pointer to a structure value. The C function populates the structure value. The QuickCheck property is equality of the value populated in C with the original arbitrary value.
The Haskell->C test, given an arbitrary value, calls a C function with a pointer to the structure value and the expected values of the fields. The C function checks that each field is equal to the expected value.
I wish that I could output detailed error messages on failure, but QuickCheck does not support that AFAIK.
On the C side, we need equality for each field type for any type that we would like to test. This is trivial in this experiment because only primitive types are used, but it complicates things when supporting other types.
When we generate tests, perhaps the goal is to generate the whole test-hs-bindgen
tree on the filesystem? Perhaps we should first recursively delete the directory if it already exists? Users would be able to explicitly manage the content using revision control (one of the major benefits of the preprocessor backend). Perhaps we should avoid touching the user's .cabal
file? We could create a test-hs-bindgen/README.md
file with a bit of documentation, including a suggested test-suite
stanza.
Thoughts? Is this moving in the right direction? Please feel free to share any corrections and suggestions.
I added unit tests that check the size and alignment.
Test suite test-hs-bindgen: RUNNING...
test-hs-bindgen
Acme
CFoo
sizeof: OK
alignof: OK
C->Haskell: OK
+++ OK, passed 100 tests.
Haskell->C: OK
+++ OK, passed 100 tests.
All 4 tests passed (0.00s)
Test suite test-hs-bindgen: PASS
The alignment check uses stdalign.h
, added in C11. I wonder what minimum standard of C we should target in the generated code...
I created a hedgehog version of the manual tests.
I implemented some generators for the C types used in the test. The generator for CFloat
includes special IEEE values: negative zero, infinity, and negative infinity. It does not include NaN. The tests compare values to ensure that they match on the Haskell and C sides, but all NaN comparisons are false. We could use isnan
on the C side and isNaN
on the Haskell side to implement equality that would work for us. On the Haskell side, we probably do not want to do that in the Eq
instances for user types, but we could write separate equality functions for use in tests. For now, I simply do not generate NaN.
The QuickCheck instance does not include special IEEE values at all (which is somewhat disappointing), so Nan is not an issue there.
FWIW, I prefer hedgehog over QuickCheck.
I updated my hedgehog tests to allow NaN, in case we want to do that.
On the Haskell side, it uses a RepEq
(representational equality) type class. Floating values have special implementations, but instances for other primitives just use the default (==
). I wrote the CFoo
instance by hand for this experiment, but we can use (sop) generics to generate instances in such cases.
On the C side, two library functions are added for checking floats and doubles for representational equality. The generated test code just needs to use those functions where appropriate. We likely should create a rep_eq_
function for each user type (structure) instead of inlining it. Such functions compare values of structure fields with the expected values; they do not compare two structures. Even if we do not allow NaN, we should create such (eq_
) functions, so it allowing NaN does not add much code on the C side.
We will likely have library functions that are needed by the generated tests. We could create a package for the Haskell side, but the test generator needs to create the C headers and source files anyway, so perhaps it is best to do the same for Haskell modules.
The current experiment tests a single "generated" module, but one of our goals is to support generation of multiple modules. If we need to generate instances (such as Arbitrary
instances if we use QuickCheck and RepEq
instances if we want to test NaN), it may be easiest to put all orphan instances in a single module even if the modules are tested separately.
When we generate low-level data types with
Storable
instances, we should generate a test suite that verifies that does a "self-test": generate some C code that accepts a pointer to memory, fills it with some values, then on the Haskell side use theStorable
instance to verify that all values are in the right place, and construct an error message if not.Note that such a test failure would be a bug in
hs-bindgen
, so users would not be able to do much with it other than report it (or try to fix it), but at least we'd catch the problem.(Rust
bindgen
generates tests also; it might be useful to take a closer look at exactly they are testing, so see if we can/should do the same.)