haskell / c2hs

c2hs is a pre-processor for Haskell FFI bindings to C libraries
http://hackage.haskell.org/package/c2hs
Other
198 stars 50 forks source link

C2HS, broken on Win64 with Stack? #157

Closed deech closed 8 years ago

deech commented 8 years ago

Hi, I'm trying to run C2HS on Windows 7 64 bit using the C/C++ compilers that are now bundled with the latest GHC (7.10.3) and running into the following error:

c2hs.exe: C header contains errors:

C:/Users/owner/AppData/Local/Programs/stack/x86_64-windows/ghc-7.10.3/mingw/lib/gcc/x86_64-x64-mingw32/5.2.0/include /bmi2intrin.h:86: <column 21> [ERROR] >>> Syntax error !
  The symbol `__res` does not fit here.

At this point I'm not even sure this is the fault of c2hs or language-c but I thought I'd ask.

Thanks!

RyanGlScott commented 8 years ago

Here's a minimal example which reproduces the error (on C2HS 0.27.1):

-- C2HSExample.chs
module C2HSExample () where

#include <windows.h>

I suspect the error is specific to GHC 7.10.3, since it was that version in which the MinGW-w64 toolchain that is bundled with Windows GHC was upgraded. In particular, it now includes gcc-5.2.0.

RyanGlScott commented 8 years ago

Here is the code from bmi2intrin.h that C2HS fails to preprocess:

extern __inline unsigned long long
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mulx_u64 (unsigned long long __X, unsigned long long __Y,
       unsigned long long *__P)
{
  unsigned __int128 __res = (unsigned __int128) __X * __Y;
  *__P = (unsigned long long) (__res >> 64);
  return (unsigned long long) __res;
}
RyanGlScott commented 8 years ago

And __int128 is defined in ghc-7.10.3/mingw/x86_64-w64-mingw32/include/_mingw.h:

#ifndef _INT128_DEFINED
#define _INT128_DEFINED
#ifdef __GNUC__
#define __int8 char
#define __int16 short
#define __int32 int
#define __int64 long long
#ifdef _WIN64
#if (__clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >= 1)) && \
    !defined(__SIZEOF_INT128__) /* clang >= 3.1 has __int128 but no size macro */
#define __SIZEOF_INT128__ 16
#endif
#ifndef __SIZEOF_INT128__
typedef int __int128 __attribute__ ((__mode__ (TI)));
#endif
#endif
#endif /* __GNUC__ */
#endif /* _INT128_DEFINED */
deech commented 8 years ago

Wow! Thanks for tracking this down so quickly. I'll help test patches if necessary. Guess not too many people are using C2HS on Win64. :)

RyanGlScott commented 8 years ago

To be clear: I don't know why C2HS is choking on the above lines. I'll need @ian-ross's expertise to figure that out. All I can do is be a guinea pig for Windows testing :)

ian-ross commented 8 years ago

I'll try to look at this this weekend. I'd guess that it's a language-c issue, but I'm not sure. We experience periodic breakage when C compilers change...

ian-ross commented 8 years ago

I think that the problem here is that __int128 is defined as a typedef name, then it's used as unsigned __int128. Neither gcc 5.3.0 nor clang 3.7.1 accept that on Linux.

According to a quick skim through the C standard, this isn't even legal (look at Example 3 on page 138 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf).

I don't know what to suggest here. It's clearly not a C2HS problem, since it's just down to language-c refusing to parse the header files because of this. But dealing with this coherently in language-c is not a completely trivial fix: the language-c AST doesn't have any structure for representing this "typedef names with extra type specifiers", and it's not even clear to me what the semantics of this are supposed to be: are you supposed to pretend that referring to a typedef name in a declarator is equivalent to a textual substitution of the typedef, so you can tack whatever extra type specifiers you want in front? That's not how typedef names are supposed to work in C though...

In the past, I've done fixes for language-c myself to help with this sort of problem, but I don't even really know where to start with this one.

What are the compilers shipped with GHC 7.10.3 on Windows anyway?

ian-ross commented 8 years ago

OK, I see that @RyanGlScott says the gcc version is 5.2.0. What does your C compiler do with this?

typedef int tst_int;
unsigned tst_int a;

On Linux, gcc 5.3.0 chokes on it, saying:

tst.c:3:18: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘a’
 unsigned tst_int a;

clang 3.7.1 says this:

tst3.c:2:10: error: redefinition of 'tst_int' as different kind of symbol
unsigned tst_int a;
         ^
tst3.c:1:13: note: previous definition is here
typedef int tst_int;
            ^
tst3.c:2:17: error: expected ';' after top level declarator
unsigned tst_int a;
                ^
                ;
2 errors generated.

That first error message ("redefinition as different kind of symbol") seems to square with that example from the standard I quoted above.

In any case, it seems that things are unlikely to work whenever __int128 gets defined via a typedef. It's a builtin integral type in gcc, so without the typedef, saying unsigned __int128 isn't a problem. Any time you get into the #ifdef branch in _mingw.h that defines it as a typedef though, you're going to get into trouble when processing bmi2intrin.h.

RyanGlScott commented 8 years ago

Huh, compiling that code with gcc 5.2.0 on Windows gives the exact same gcc error.

This gcc does appear to have __int128 and __SIZEOF_INT128__ defined by default, even without any #includes. What's interesting is if I try to compile something without __SIZEOF_INT128__ defined:

#undef __SIZEOF_INT128__
#include <windows.h>

int main() {
  return 0;
}
$ gcc wat.c
In file included from C:/Users/RyanGlScott/Software/ghc-7.10.3/mingw/x86_64-w64-mingw32/include/windows.h:9:0,
                 from wat.c:2:
C:/Users/RyanGlScott/Software/ghc-7.10.3/mingw/x86_64-w64-mingw32/include/_mingw.h:241:13: error: two or more data types in declaration specifiers
 typedef int __int128 __attribute__ ((__mode__ (TI)));
             ^
RyanGlScott commented 8 years ago

Also, GHC for Windows uses variants of the MinGW-w64 gcc toolchain that have been compiled by the MSYS2 project. Here are the versions of things that GHC Windows uses currently:

binutils-2.25.1-1
crt-git-5.0.0.4531.49c7046-1
gcc-5.2.0-3
gcc-libs-5.2.0-3
gmp-6.0.0-3
headers-git-5.0.0.4531.49c7046-1
isl-0.14.1-2
libiconv-1.14-5
libwinpthread-git-5.0.0.4538.78dca70-1
mpc-1.0.3-2
mpfr-3.1.3.p0-2
winpthreads-git-5.0.0.4538.78dca70-1
zlib-1.2.8-8
ian-ross commented 8 years ago

@RyanGlScott Looks like a bit of a mess. Do you agree that it's not a C2HS issue though? Are you happy for me to mark it as "wontfix"? (I'll leave it open so you can add any more information you find, since there's not really anywhere else for it.)

deech commented 8 years ago

Hi, Thanks for looking into this! I opened the issue and I'm ok with closing as "wontfix".

Do you have any suggestions as to who might be able to help with this?

ian-ross commented 8 years ago

@deech Benedikt Huber has been looking after language-c, and he accepts patches, so if you were to figure out what to do, you could probably get language-c updated. But I don't know anyone who is a big enough Windows user of C2HS or language-c to care enough about this to disentangle everything that's going on. I don't have a Windows setup I can test on, and I suspect that this might need quite a lot of experimentation to figure out. Does it make C2HS totally unusable on Windows?

deech commented 8 years ago

Yes, C2HS with GHC 7.10.3 on Windows should always fail with this error.

GHC 7.10.2 seems to work fine. Hopefully GHC 8 works well too. Sure am glad 'stack' exists, because it means I'm going to have to pin Windows users to 7.10.2. :)

It might be worth issuing a statement on the mailing list or something but that's your call.

RyanGlScott commented 8 years ago

I doubt GHC 8 will work, since it uses the same MinGW-w64 gcc toolchain as GHC 7.10.3.

I think you're right in stating this isn't a c2hs issue, and something probably needs to be patched on the MinGW-w64 side. One thing doesn't make sense to me, though. This code:

extern __inline unsigned long long
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mulx_u64 (unsigned long long __X, unsigned long long __Y,
       unsigned long long *__P)
{
  unsigned __int128 __res = (unsigned __int128) __X * __Y;
  *__P = (unsigned long long) (__res >> 64);
  return (unsigned long long) __res;
}

compiles without issue without needing any #includes. If that's the case, it seems unlikely that __int128 is actually a typedef (at least, by default). Then why does language-c reject it? (Is there a way to check whether __int128 is in fact a typedef or not?)

deech commented 8 years ago

I just looked in the GNU GCC docs and they advocate using __int128 this way: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html

ian-ross commented 8 years ago

@RyanGlScott __int128 is a builtin type in GCC, but you can make a typedef called __int128 as well. If you don't make the typedef, then saying unsigned __int128 is fine, since you're just saying you want the unsigned version of a builtin type. If you do make the typedef, then saying unsigned __int128 isn't allowed. The code from _mingw.h you show above defines __int128 as a typedef under the conditions that: _INT128_DEFINED is not defined, __GNUC__ is defined, _WIN64 is defined and the clang version condition is not true. I think it's the last part of this that's a problem because the __clang_major__ and __clang_minor__ constants aren't defined at all in GCC.

However, that's all kind of worthless, since I know nothing about how the compilers and MinGW setup on Windows are supposed to work!

RyanGlScott commented 8 years ago

I really don't think this has anything to do with typedefs, but rather how language-c is parsing __int128 in general. As evidence, compiling the definition of _mulx_u64 works just fine on both Windows and Linux, but on both OSes, language-c fails to parse it:

$ ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
λ> import Language.C
λ> parseCFilePre "wat.c"
Left wat.c:6: (column 21) [ERROR]  >>> Syntax Error !
  Syntax error !
  The symbol `__res' does not fit here.

@visq, how does language-c deal with __int128 (as a GCC extension) at the moment?

ian-ross commented 8 years ago

OK, this is kind of confusing (for me at least!). You're right that language-c doesn't deal with any of these extra builtin types directly. Just trying to load this from a file with Language.C.parseCFilePre gives an error, for example:

__int128 x;

So it looks like this isn't going to work whichever way it goes: typedef means no unsigned __int128; no typedef means no builtin GCC special __int128 understood by language-c. The __int128 type only turns up in a very few places in system headers on Linux, so this problem doesn't crop up there. Having it right there in _mingw.h is obviously a bit of a problem on Windows though.

The problem is really that if you want all of this kind of thing to work, language-c needs to track GCC releases and the language extensions that get added from time to time, and that means that someone has to do that work for language-c.

The weird thing is that it looks as though __int128 was added to GCC around version 4.6.0, which was some time ago, so I don't know why this is only cropping up now.

RyanGlScott commented 8 years ago

To be specific, the problem is that windows.h transitively depends on bmi2intrin.h, where __int128 is used (in _mingw.h, the typedef definition of __int128 is always CPP-ed out unless you're doing something strange, so I think trying to look at _mingw.h is a red herring). You can also trigger the issue on Linux by creating a file with #include <x86intrin.h> and attempting to parse it with language-c:

$ ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
λ> import Language.C
λ> import Language.C.System.GCC
λ> parseCFile (newGCC "gcc") Nothing [] "wat.c"
Left /usr/lib/gcc/x86_64-linux-gnu/5/include/bmi2intrin.h:86: (column 21) [ERROR]  >>> Syntax Error !
  Syntax error !
  The symbol `__res' does not fit here.

I suppose Linux users just don't have much of a need for this? :)

ian-ross commented 8 years ago

Right, so whatever other workarounds you might come up with, the root problem is that language-c doesn't know about __int128 (or any of the other __intXX GCC types). It's just never been a problem on Linux so far (as far as I know).

RyanGlScott commented 8 years ago

Well, language-c does appear to support some GCC-specific types such as _Bool (by inspecting the source code). FWIW, I just compiled a modified version of language-c where I hacked in __int128 and unsigned __int128 into the ASTs, and now language-c can parse #include <x86intrin.h> without issue.

Now, I have no idea if this is how @visq would want to handle this in GCC—I'd still need his input on that.

visq commented 8 years ago

From the discussion I understand this is a new basic type (like short or long). It is easy to add new builtins or typedefs, but there is no simple way to support new primitive types in language-c, you will need to modify the AST.

Because language-c aims to support all gcc extensions, and because c2hs needs it, I would agree that __int128 should be supported in language-c, even if this requires a modification of the AST. @RyanGlScott, if you have a patch ready, I'm happy to review it.

ian-ross commented 8 years ago

@visq Sounds good! @RyanGlScott Are there other types that need to be covered as well as __int128?

RyanGlScott commented 8 years ago

Chances are, there are probably more GCC-specific types in existence that we haven't found yet. But to avoid casting too wide of a net, I'll stick with __int128 and unsigned __int128, since adding those two types to language-c fixes the c2hs issue for me.

I e-mailed @visq a language-c patch which implements the needed changes to support __int128.

RyanGlScott commented 8 years ago

@visq has uploaded language-c-0.5.0.0 to Hackage, which contains the __int128 fix. At least on my end, I can successfully compile .chs files containing windows.h or x86intrin.h, so I believe that should be the end of that issue.

@deech, are you able to build fltkhs successfully on Windows with GHC 7.10.3 and c2hs built against language-c-0.5.0.0?

ian-ross commented 8 years ago

@deech @RyanGlScott Are either (or both!) of you interested in taking over maintenance of C2HS? (https://mail.haskell.org/pipermail/c2hs/2016-March/001134.html)

deech commented 8 years ago

@ian-ross I am tentatively on board with helping out with C2HS. Ideally I wouldn't want to be the only one maintaining because, as we have just seen, there will be problems I don't really understand how to solve.

@RyanGlScott Thanks for all your work. I don't have access to a Windows box, but I will tonight, so I'll build there and let you know.

ian-ross commented 8 years ago

@deech Excellent. I think that C2HS needs someone on maintenance who's a bit more invested in the thing. I only use C2HS for one package of my own, never work on anything other than Linux, and no longer have the enthusiasm required to do the C parser switchover stuff that's probably going to be needed. I went through a big C2HS hacking phase about a year ago and added quite a bit of new functionality, but I don't think I'd do that again soon, and so I don't think I'm the right person to be the maintainer. I'm sure we can twist @RyanGlScott's arm into helping as well!

jputcu commented 8 years ago

I've been able to build fltkhs, and run the hello-world example, on windows 7 using c2hs with language-c 0.5.0.

ian-ross commented 8 years ago

Should be fixed in 0.28.1.