Open edsko opened 6 days ago
We are not really implementing (1). We hard code offsets and alignment for structs for example. For (1) one would really need to generate hsc2hs code or "formulas" for size and alignment.
I don't think that (2) even makes sense for TH generation: TH is run on the target architecture anyway, so doing something else feels unnecessary. TH could assume host = target, but I guess true (1) would be better, at least from a testing perspective.
- We could query @libclang@ to what choice it makes for the selected target platform, and use 'CShort' or 'CLong' (or something else again).
This doesn't feel right. I think that if we do (2) and libclang says "unsigned 4 byte integer" then we should use Word32 and not try to figure out which C-type on the host is the same size as the target type. In other words, be as explicit as possible.
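To make "be as explicit as possible" concrete: under (2), choosing the Haskell type can be a plain lookup on the signedness and byte size that the C front end reports. A minimal sketch follows; `Signedness` and `fixedWidthType` are names made up for illustration and are not part of hs-bindgen or libclang.

```haskell
-- Hypothetical description of an integer type as reported for the target.
data Signedness = Signed | Unsigned
  deriving (Eq, Show)

-- | Name of the fixed-width Haskell type (from Data.Int / Data.Word) for an
-- integer of the given signedness and size in bytes, if one exists.
fixedWidthType :: Signedness -> Int -> Maybe String
fixedWidthType Signed   1 = Just "Int8"
fixedWidthType Signed   2 = Just "Int16"
fixedWidthType Signed   4 = Just "Int32"
fixedWidthType Signed   8 = Just "Int64"
fixedWidthType Unsigned 1 = Just "Word8"
fixedWidthType Unsigned 2 = Just "Word16"
fixedWidthType Unsigned 4 = Just "Word32"
fixedWidthType Unsigned 8 = Just "Word64"
fixedWidthType _        _ = Nothing  -- no fixed-width analogue for this size
```

With this, "unsigned 4 byte integer" maps to `Word32` regardless of which C type on the host happens to have the same size.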
For (1) we also need https://github.com/well-typed/hs-bindgen/issues/134: if the header has struct foo { uint64_t bar };, the field is uint64_t on all machines, though libclang will tell us some "primitive" C type (at least after we look through typedefs). Similarly for something like uint_fast32_t, which is actually quite tricky to represent other than as uint_fast32_t. (Foreign.C.Types doesn't have analogues for these, so they are already a challenge for the FFI.)
We are not really implementing (1). We hard code offsets and alignment for structs for example. For (1) one would really need to generate hsc2hs code or "formulas" for size and alignment.
Yes, that's why I emphasize "API" (as opposed to implementation). Perhaps I should be clearer about that.
My main thinking why this is OK is that if we say CInt, then this type itself provides the same amount of ambiguity: you don't know what its size is. Therefore picking a specific implementation (in terms of Storable instances, for example) is compatible with that: it's "implementation defined" after all. This way we keep the types the same, but the implementation differs. I think this is what most users would expect anyway?
I don't think that (2) even makes sense for TH generation, TH is run on the target architecture anyway
"TH is run on the target architecture anyway" -- that would be nice, but not actually the case with current ghc is it?
This doesn't feel right. I think that if we do (2) and libclang says "unsigned 4 byte integer" then we should use Word32 and not try to figure out which C-type on the host is the same size as the target type. In other words, be as explicit as possible.
Yes, that's fair enough, if we do do 2, then indeed, we should be explicit.
For (1) we also need https://github.com/well-typed/hs-bindgen/issues/134, (..)
Yes, we should definitely not always look through typedefs; I'm currently working on that.
think this is what most users would anyway expect?
I don't understand, so are you saying that
data StructFoo = MkStructFoo
  { field1 :: CInt
  , field2 :: CInt -- ^ system independent types
  }
instance Storable StructFoo where
  sizeOf _ = 64 -- 32 bit system specific value
  ...
is fine?
IMO it isn't. Either it's
(1)
data StructFoo = MkStructFoo
  { field1 :: CInt
  , field2 :: CInt
  }
instance Storable StructFoo where
  sizeOf _ = #{size struct foo}
    -- or some formula, like sizeof_ @CInt + sizeof_ @CInt,
    -- but also taking alignment into account.
or
(2)
data StructFoo = MkStructFoo
  { field1 :: Word32
  , field2 :: Word32
  }
instance Storable StructFoo where
  sizeOf _ = 64
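For reference, the "formula" variant of (1) could look roughly like the following. This is only a sketch: `FieldLayout`, `layoutOf`, `alignUp` and `structLayout` are hypothetical helpers, and it assumes the plain C layout rule (each field placed at the next offset aligned for it, total size padded to the largest field alignment), with no bit-fields and no packed or aligned attributes.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Storable (Storable, alignment, sizeOf)

-- | Size and alignment of one field, in bytes.
data FieldLayout = FieldLayout
  { fieldSize  :: Int
  , fieldAlign :: Int
  }

-- | Layout of a field of any Storable type, as reported by its instance
-- on the platform this code is compiled for.
layoutOf :: Storable a => a -> FieldLayout
layoutOf x = FieldLayout (sizeOf x) (alignment x)

-- | Round @n@ up to the next multiple of @a@ (assumes @a > 0@).
alignUp :: Int -> Int -> Int
alignUp a n = ((n + a - 1) `div` a) * a

-- | Field offsets, total size and alignment of a struct with these fields.
structLayout :: [FieldLayout] -> ([Int], Int, Int)
structLayout fields = (reverse offsets, alignUp structAlign end, structAlign)
  where
    structAlign    = maximum (1 : map fieldAlign fields)
    (offsets, end) = foldl place ([], 0) fields
    -- Place one field at the next suitably aligned offset.
    place (offs, cur) f =
      let off = alignUp (fieldAlign f) cur
      in  (off : offs, off + fieldSize f)

-- | Layout of @struct foo { int field1; int field2; }@, derived from CInt
-- rather than hard-coded.
structFooLayout :: ([Int], Int, Int)
structFooLayout = structLayout [layoutOf (0 :: CInt), layoutOf (0 :: CInt)]
```

The point is that nothing here mentions a concrete number for `CInt`: the sizes come from whatever `Storable CInt` says where the code is compiled, which is what makes the generated module itself system independent.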
- that would be nice, but not actually the case with current ghc is it?
It is. TH is always run on the target, in one way or another (even GHCJS etc). The current multi-staged setup won't work otherwise. (TL;DR the Int is the same on all stages, there are no host Int and target Int.) (EDIT: Maybe there are some cross-compilation workarounds in use where people run TH code on host, pretending it's on target, but that can easily cause problems. I can share examples privately.)
Why isn't it fine? CInt is defined to be implementation defined; here, there is one such implementation. I don't see a conflict here.
Option (1), generating essentially .hsc code, is explicitly not what the client wants: hs-bindgen should do the resolution, and it should not depend on invoking a C compiler. We could maybe offer this as a choice, but it would strictly be an enhancement that we choose to implement.
I don't understand re TH, but yes, we don't need to discuss this in this ticket.
The
data StructFoo = MkStructFoo
  { field1 :: CInt
  , field2 :: CInt -- ^ system independent types
  }
instance Storable StructFoo where
  sizeOf _ = 64 -- 32 bit system specific value
  ...
is still system specific. Then there is really no difference between (1) and (2) if (1) means doing the above.
I couldn't do
hs-bindgen alib.h > ALib.hs
and commit that file to the repository (i.e. run hs-bindgen before code distribution). ALib.hs will be system specific.
I think I understand: the interface of the module will appear to be system-independent in (1), but the implementation will always be specific to the target, whether it's (1) or (2).
In other words, hs-bindgen will not generate system independent code; it's out of scope?
Yeah, I take your point. I think this needs some discussion with the client.
So I guess we have three modes:
1. The implementation (Storable instances) is specific to a specific platform (we know the size of CInt), but the API is system independent (we use CInt rather than Word64).
2. Both the implementation and the API are platform specific (we use Word64 instead of CInt).
3. Both the implementation and the API are system independent (we generate .hsc instead of .hs files).
The mode that I had somewhat implicitly assumed we'd focus on first is (1), but indeed all three are valid. I thought from previous discussions with the client that (3) was not considered too desirable, but perhaps we should revisit this question. One difficulty with (3) is CPP: if the header uses CPP to make machine dependent choices, then it becomes unclear what we should do; in a way, option (3) implies #72, at least to some degree.
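As an illustration of mode (1), here is roughly what a complete generated binding for `struct foo { int field1; int field2; };` might look like. This is a sketch and not actual hs-bindgen output: the layout numbers are hard-coded under the assumption of a target where `int` is 4 bytes with 4-byte alignment, and `roundTrip` is a hypothetical helper added only to exercise the instance.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

-- | API stays in terms of CInt, so it reads the same on every platform.
data StructFoo = MkStructFoo
  { field1 :: CInt
  , field2 :: CInt
  }
  deriving (Eq, Show)

-- ASSUMPTION: offsets, size and alignment below are valid only for a target
-- where @int@ is 4 bytes with 4-byte alignment.
instance Storable StructFoo where
  sizeOf    _ = 8
  alignment _ = 4
  peek ptr = MkStructFoo <$> peekByteOff ptr 0 <*> peekByteOff ptr 4
  poke ptr (MkStructFoo a b) = pokeByteOff ptr 0 a >> pokeByteOff ptr 4 b

-- | Write a value into temporary memory and read it back.
roundTrip :: StructFoo -> IO StructFoo
roundTrip x = alloca $ \ptr -> poke ptr x >> peek ptr
```

Mode (2) would keep the same instance body but change the field types to fixed-width ones, and mode (3) would replace the literal numbers with hsc2hs directives resolved by a C compiler at build time.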
Explained in this comment of PrimType: We are defaulting to option (1) currently, but as the comment says, we may wish to give users a choice.