haskell / c2hs

c2hs is a pre-processor for Haskell FFI bindings to C libraries
http://hackage.haskell.org/package/c2hs

Marshalling nested structs #146

Closed ejenk closed 8 years ago

ejenk commented 9 years ago

I'm having trouble getting c2hs to generate the right code for marshalling a struct containing members whose type is another struct. Consider the following C header file, in which a variable of type bar_t contains two members of type foo_t.

typedef struct foo {
  int x;
  int y;
  int z;
} foo_t;

typedef struct bar {
  foo_t a;
  foo_t b;
} bar_t;

I would like to create Storable instances for analogous Haskell types Foo and Bar, where the instance for Bar is defined using the instance for Foo. For example, Bar's peek should be given as

peek p =
  Bar <$> (peek =<< ((\ptr -> do {return $ ptr `plusPtr` 0 :: IO (Ptr Foo)}) p))
      <*> (peek =<< ((\ptr -> do {return $ ptr `plusPtr` 12 :: IO (Ptr Foo)}) p))

However, there doesn't appear to be any way for c2hs to generate a pointer to a struct member; the get directive can dereference, but it can't reference. For instance, the c2hs code

 peek p =
   Bar <$> (peek =<< ({#get bar_t.a#} p))
       <*> (peek =<< ({#get bar_t.b#} p))

generates the incorrect

peek p =
  Bar <$> (peek =<< ((\ptr -> do {peekByteOff ptr 0 :: IO (Ptr ())}) p))
      <*> (peek =<< ((\ptr -> do {peekByteOff ptr 12 :: IO (Ptr ())}) p))

I suppose what I'd like to do is write something like

 peek p =
   Bar <$> (peek =<< ({#get &bar_t.a#} p))
       <*> (peek =<< ({#get &bar_t.b#} p))

but such syntax is not supported. Is there any way to do what I'm trying to do?
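For comparison, what a hypothetical `{#get &bar_t.a#}` would have to compute is plain pointer arithmetic on the C side: the base address plus the member's offset, with no load. A minimal sketch using the struct definitions above; the helper names `bar_a_ref` and `bar_b_ref` are made up for illustration and are not part of c2hs.

```c
#include <stddef.h>

typedef struct foo { int x; int y; int z; } foo_t;
typedef struct bar { foo_t a; foo_t b; } bar_t;

/* Hypothetical helpers: return a pointer to a nested member without
   reading it. Base address plus offsetof is the same address as &p->a. */
static foo_t *bar_a_ref(bar_t *p) {
    return (foo_t *)((char *)p + offsetof(bar_t, a));
}

static foo_t *bar_b_ref(bar_t *p) {
    return (foo_t *)((char *)p + offsetof(bar_t, b));
}
```

This mirrors the `plusPtr` version of `peek` shown earlier: the only inputs are the struct pointer and the member offsets.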

EDIT:

I believe I've found a more serious problem: c2hs calculates sizes and offsets incorrectly in the presence of nested structs. For example, consider the following structures.

typedef struct foo {
  long x[11];
  long y[11];
  int z[11];
} foo_t;

typedef struct bar {
  foo_t a;
  foo_t b;
  int c;
} bar_t;

Processing {#get bar_t.c#} yields code accessing an offset of 444, and {#sizeof bar_t#} yields 448. However, running offsetof(bar_t, c) yields 448, and sizeof(bar_t) yields 456.
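These figures can be checked against the compiler directly. The following is a sketch assuming a C11 compiler; `print_bar_layout` is an illustrative name. The printed numbers depend on the platform (448 and 456 are what is reported above for gcc on x86-64), and the second and third static assertions reflect what common ABIs do rather than a guarantee of the C standard.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct foo { long x[11]; long y[11]; int z[11]; } foo_t;
typedef struct bar { foo_t a; foo_t b; int c; } bar_t;

/* sizeof(foo_t) includes tail padding so that arrays of foo_t stay aligned. */
_Static_assert(sizeof(foo_t) % _Alignof(foo_t) == 0, "size is a multiple of alignment");
/* On common ABIs, b directly follows a, and c directly follows b. */
_Static_assert(offsetof(bar_t, b) == sizeof(foo_t), "b directly follows a");
_Static_assert(offsetof(bar_t, c) == 2 * sizeof(foo_t), "c directly follows b");

/* Print the layout facts that the compiler actually uses. */
void print_bar_layout(void) {
    printf("sizeof(foo_t)      = %zu\n", sizeof(foo_t));
    printf("offsetof(bar_t, b) = %zu\n", offsetof(bar_t, b));
    printf("offsetof(bar_t, c) = %zu\n", offsetof(bar_t, c));
    printf("sizeof(bar_t)      = %zu\n", sizeof(bar_t));
}
```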

If it matters, I'm running gcc 4.8.4 on x86-64.

RyanGlScott commented 9 years ago

I think you're running into a couple of issues here. The first issue is that the Haskell FFI doesn't support pass-by-value use of structs, so trying to directly peek the contents of a bar_t value won't work the way you'd want.

With this in mind, you should ask yourself: what do you want to do with a bar_t? If you really need to be able to peek and poke the foo_t values contained within, you'll need to do it another way involving pointers. In the past, I've often made wrapper types in C for this purpose:

typedef struct {
    foo_t *a;
    foo_t *b;
} wr_bar_t;

wr_bar_t *to_wr_bar(wr_bar_t *dst, const bar_t *const src) {
    ...
}

bar_t *from_wr_bar(bar_t *dst, const wr_bar_t *const src) {
    ...
}

You'd then deal with wr_bar_t when marshalling to/from Haskell, and deal with bar_t when you need to use a C API that requires it.

The second issue is that by default, {#get#} doesn't know what type to marshal a C struct as, so unless you tell it otherwise, it defaults to Ptr (). You can change this behavior using {#pointer#}:

data Bar = Bar FooPtr FooPtr

{#pointer *foo_t as FooPtr -> Foo#}

With this in place, the following code:

 peek p =
   Bar <$> {#get wr_bar_t.a#} p
       <*> {#get wr_bar_t.b#} p

should be expanded by c2hs into something like this:

type FooPtr = Ptr Foo

 peek p =
   Bar <$> (\ptr -> do {peekByteOff ptr 0 ::IO (FooPtr)}) p
       <*> (\ptr -> do {peekByteOff ptr 8 ::IO (FooPtr)}) p

ejenk commented 9 years ago

Perhaps I wasn't clear that this was meant to be a feature request rather than a bug report (although it's morphed into one; see the EDIT). The code I wrote in the second codeblock works perfectly well for accessing the foo_t members by reference; there's no need to write any additional C code. The problem is that c2hs does not appear to be able to produce such code automatically; I have to run it first with the incorrect {#get bar_t.a#} and {#get bar_t.b#} to get the offsets and then manually change the code to access the offset pointers.

ian-ross commented 9 years ago

@ejenk I don't think the "morphed into a bug report" part is fixable without a more or less complete rewrite of C2HS's struct handling code -- see issue #129. Nested structs just aren't something that the Haskell FFI deals with at all. I do know how to do this, but it would require a fundamental change in the way that C2HS works, and a major reworking of quite a lot of code. The basic problem is that C2HS tries quite hard to get information about the memory layout of C data values using, for the most part, only the capabilities exposed by the Haskell FFI. Things that the Haskell FFI doesn't know about, you can deal with by generating little C programs and using them to query the C compiler about its layout choices, but that's not something that's generally been done in C2HS up to now.
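The "little C programs" idea could look roughly like the sketch below: a C file that asks the compiler for the layout and writes the answers out as Haskell constant definitions. This is similar in spirit to how hsc2hs works, but it is not something C2HS does; the function names and the generated constant names (`barOffsetA` and so on) are invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct foo { long x[11]; long y[11]; int z[11]; } foo_t;
typedef struct bar { foo_t a; foo_t b; int c; } bar_t;

/* Format one Haskell constant definition, e.g. "barOffsetC = <value>". */
static int emit_const(char *buf, size_t len, const char *name, size_t value) {
    return snprintf(buf, len, "%s :: Int\n%s = %zu\n", name, name, value);
}

/* Write the layout of bar_t, as chosen by this C compiler, to `out`. */
void emit_bar_layout(FILE *out) {
    char buf[128];
    emit_const(buf, sizeof buf, "barOffsetA", offsetof(bar_t, a)); fputs(buf, out);
    emit_const(buf, sizeof buf, "barOffsetB", offsetof(bar_t, b)); fputs(buf, out);
    emit_const(buf, sizeof buf, "barOffsetC", offsetof(bar_t, c)); fputs(buf, out);
    emit_const(buf, sizeof buf, "barSize", sizeof(bar_t));         fputs(buf, out);
}
```

Calling `emit_bar_layout(stdout)` at build time and compiling the output into the binding would give offsets that match the C compiler by construction, at the cost of needing a C toolchain in the build.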

The other part of what you're asking for, I'm not sure about and need to think about it a little. Of course, because of the other problem, getting it to work the way you ask might not end up being very useful!

Anyway, nested structs are a grubby and neglected corner of the Haskell FFI. Sorry that it's causing you trouble!

ian-ross commented 8 years ago

@ejenk I've been thinking about this, and I think I'm going to have to say that I won't fix it. (This is the first time I've ever done this for a C2HS issue, and I'm not that happy about it.) What you're asking for just isn't within the gamut of the capabilities provided by the Haskell FFI, as far as I can tell. Storage layout for nested C structs is something that is very compiler- and platform-dependent, and it's simply not something that you can disentangle just using calls to the Haskell FFI.

C2HS determines structure offsets for get and set hooks by calling functions in the Haskell FFI libraries to determine the sizes and alignments of the types of structure members, but it has no way to determine reliably when the C compiler inserts additional padding beyond the alignment requirements of individual structure members (which some compilers do when you have nested structs). That means that there's simply no purpose to implementing the proposed method for accessing fields in nested structs, because C2HS will almost certainly generate the wrong offsets in many cases, rendering the functionality useless.
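To make the padding problem concrete, here is a small layout calculator run over the `long[11]`/`long[11]`/`int[11]` example from the EDIT above. It is only a guess at how the reported numbers arise and is not taken from the C2HS source: every type and function name is hypothetical, and it assumes each primitive's alignment equals its size (8-byte `long`, 4-byte `int`, as on x86-64 Linux). Without tail padding on the inner struct it yields 444 and 448; with tail padding it yields 448 and 456.

```c
#include <stddef.h>

typedef struct { size_t size; size_t align; } layout_t;

/* Round off up to the next multiple of align (align > 0). */
static size_t align_up(size_t off, size_t align) {
    return (off + align - 1) / align * align;
}

/* Append member m to struct s; returns the offset assigned to m. */
static size_t add_member(layout_t *s, layout_t m) {
    size_t off = align_up(s->size, m.align);
    s->size = off + m.size;
    if (m.align > s->align) s->align = m.align;
    return off;
}

/* Finish a struct, optionally rounding its size up to its alignment. */
static layout_t finish(layout_t s, int tail_padding) {
    if (tail_padding) s.size = align_up(s.size, s.align);
    return s;
}

/* Offset of bar_t.c; the total size of bar_t is returned through *total. */
static size_t bar_c_offset(size_t long_size, size_t int_size,
                           int tail_padding, size_t *total) {
    layout_t long11 = { 11 * long_size, long_size };
    layout_t int11  = { 11 * int_size, int_size };
    layout_t one_int = { int_size, int_size };
    layout_t foo = { 0, 1 }, bar = { 0, 1 };

    add_member(&foo, long11);            /* x */
    add_member(&foo, long11);            /* y */
    add_member(&foo, int11);             /* z */
    foo = finish(foo, tail_padding);

    add_member(&bar, foo);               /* a */
    add_member(&bar, foo);               /* b */
    size_t c = add_member(&bar, one_int);
    *total = finish(bar, tail_padding).size;
    return c;
}
```

With `tail_padding` set to 0, `foo` is treated as 220 bytes, so `b` lands at 224 and `c` at 444. With it set to 1, `foo` grows to 224 bytes and `c` moves to 448, which is what gcc reports.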

As I said in the comment above, I do know how to do this reliably, but it would require a complete change in the way that C2HS works. However, if you can come up with a way of doing this within the constraints of the Haskell FFI, I'd be very interested in hearing about it and maybe we could work something out. From what I know after having looked into this a bit though, I think you're out of luck.