haskell / text-icu

This package provides the Haskell Data.Text.ICU library, for performing complex manipulation of Unicode text.
BSD 2-Clause "Simplified" License
47 stars 41 forks source link

Segfault on Windows 64 bits #12

Closed ygale closed 2 years ago

ygale commented 9 years ago

Even a simple use of text-icu - this code basically just calls collate - causes a segfault on Windows with a 64-bit build of GHC.

I tested the procedure below on Windows 7 64 bits with GHC 7.6.2 32 bits, 7.8.3 32 bits, and 7.8.3 64 bits. It works with both 32 bit versions, and segfaults with the 64 bit version. In all cases I used cabal-install and Cabal compiled from current HEAD on the 1.20 branch, rev. caf257cd96e766b293943bbac07d766ec2f552dd.

Steps to reproduce:

Clone this repo: http://github.com/ygale/test-text-icu

Download ICU4C version 54.1 binary for Windows, 32 bits or 64 bits depending on which GHC version you are using. Extract the zip into a folder near the repo.

Inside the repo:

cabal sandbox delete
cabal sandbox init
cabal install --extra-lib-dirs=C:\absolute\path\to\icu\lib64 --extra-include-dirs=C:\absolute\path\to\icu\include text-icu
cabal install

Now, place the resulting executable .cabal-sandbox\bin\test-text-icu.exe together with the 3 DLLs icudt54.dll, icuin54.dll, and icuuc54.dll from the bin or bin64 subfolder of icu all together in the same folder. Run test-text-icu.

For 32-bit builds, this runs successfully and prints the text:

Q: Why did the multi-threaded chicken cross the road?
A: get other side. the to To

For 64-bit builds, this prints the first line, then segfaults.

This issue was posted to the ghc-devs list, in a thread about GHC linker changes initiated by @bos and cross-posted to the cabal-dev list.

ygale commented 9 years ago

For completeness, I ran the test suite. Needless to say, it didn't get very far with a library that immediately segfaults whatever you try.

Running 1 test suites...
Test suite tests: RUNNING...
Test suite tests: FAIL
Test suite logged to: dist\test\text-icu-0.7.0.0-tests.log
0 of 1 test suites (0 of 1 test cases) passed.

text-icu-test-segfault

ygale commented 9 years ago

I investigated further. Here is the status so far:

However, I don't know what tool is used to create the libpq DLLs, so it still could be a problem with MSVC-generated DLLs.

This compiles but fails to link, with the linker complaining that the symbol ucnv_countAvailable_54 is undefined. (Note that the ICU4C header files have CPP macros that rename all functions to append the ICU version number.)

bos commented 9 years ago

The linker error is probably due to a function that got removed when ICU 54 was released. The ICU team are charming about breaking backwards compatibility in this way :-p

RyanGlScott commented 8 years ago

Hm. I'm not sure if this makes a difference, but I successfully ran test-text-icu on Windows 8.1 x86_64 using GHC 7.10.3-rc1 (that might be critical, since it comes with a much more recent version of MinGW tools). I also used MSYS2. Here's what I did:

  1. Open MSYS2's MinGW-w64 shell.
  2. Install ICU-related libraries with pacman -S mingw-w64-x86_64-icu. (Apparently, this uses ICU 55.1.)
  3. Install text-icu with

    cabal install text-icu --extra-lib-dirs="C:/msys64/mingw64/lib" --extra-include-dirs="C:/msys64/mingw64/include"

    The extra library and include directories might live elsewhere on your system, but if you install pkg-config with pacman -S mingw64/mingw-w64-x86_64-pkg-config, you should find out where they live with pkg-config --libs icu-i18n and pkg-config --cflags icu-i18n.

  4. Build and run test-text-icu, get

    $ ./dist/build/test-text-icu/test-text-icu.exe
    Q: Why did the multi-threaded chicken cross the road?
    A: get other side. the to To

I also ran the text-icu test suite, and that succeeded. Can you try those steps on some of the GHC versions you're using? I'm curious if the upgraded MinGW tools are important (e.g., I couldn't use pcre-light on Windows until GHC 7.10.3-rc1).

ygale commented 8 years ago

Yes, installing MSYS2 and then installing ICU via pacman works for earlier versions of GHC, too. That is how we worked around the problem (as I reported at the time).

But a full MSYS2 environment is not usually a requirement for building Haskell libraries. Supposedly, GHC bundles enough of MingW that you can build Haskell libraries against vanilla DLLs such as the ones provided in the standard ICU4C download. Apparently that was (is?) not working for 64 bits due to a known bug in the version of MingW64 that 64-bit GHC used. So with a more recent GHC, the problem may have been resolved. I haven't tried that yet.

RyanGlScott commented 8 years ago

Ah, I wasn't aware you'd tried MSYS2 before; sorry if this is old news.

For completeness's sake, I also tried the standard MVSC-compiled ICU4C libraries (versions 55.1 and 56), and they seemed to work as well. All I had to do was rename icuuc{55,56}.dll, icudt{55,56}.dll, and icuin{55,56}.dll to icuuc.dll, icudt.dll, and icuin.dll, respectively.

ygale commented 8 years ago

Yes, sorry about that. I did write about it in detail at the time on the Café and reddit, but I should have updated this issue.

Why did you need to rename the DLL's - isn't it just a question of the names of the symbols defined inside? What happens if you don't rename? (Other than the obvious fact that you need to refer to the correct file name via the cabal file or command line options.)

RyanGlScott commented 8 years ago

Yeah, it is a bit curious that I have to rename the DLLs, since I checked the included .lib files and they definitely refer to *{55,56}.dll in their symbol tables. In fact, if you compile this program:

-- TextICU.hs
import Data.Text.ICU.Break

main = print Kana

with ghc TextICU.hs and then run cygcheck ./TextICU.hs without having the ICU4C bin64 folder in your PATH, it appears to search for the right DLLs:

$ cygcheck ./TextICU.hs
...
cygcheck: track_down: could not find icuin56.dll

cygcheck: track_down: could not find icuuc56.dll

But actually running the code reveals a different story:

$ runghc TextICU.hs
TextICU.hs: icuuc: The specified module could not be found.
TextICU.hs: <command line>: can't load .so/.DLL for: icuuc.dll (addDLL: could not load DLL)

So DLL lookup could be dependent on what the linked libraries' names are. To test this, I recompiled text-icu with the following changes to text-icu.cabal:

extra-libraries: icuuc56
if os(mingw32)
  extra-libraries: icuin56 icudt56

I also made symlinks named icuuc56.lib, icuin56.lib, and icudt56.lib that refer to icuuc.lib, icuin.lib, and licudt.lib, respectively, in the ICU4C lib64 directory. Now cygcheck and ghc agree on which DLLs to use:

$ ghc TextICU.hs
$ cygcheck ./TextICU.exe
...
cygcheck: track_down: could not find icuin56.dll

cygcheck: track_down: could not find icuuc56.dll
$ runghc TextICU.hs
TextICU.hs: icuuc56: The specified module could not be found.
TextICU.hs: <command line>: can't load .so/.DLL for: icuuc56.dll (addDLL: could not load DLL)

So linking the libraries under different names worked, but the question remains: why doesn't GHC use the symbol table in the .lib table to determine the DLL names? I don't know the answer, but here is the code that the GHC runtime (Linker.c) uses for DLL lookup on Windows:

const char *
addDLL( pathchar *dll_name )
{
   pathchar*      buf;
   OpenedDLL* o_dll;
   HINSTANCE  instance;

   for (o_dll = opened_dlls; o_dll != NULL; o_dll = o_dll->next) {
      if (0 == pathcmp(o_dll->name, dll_name))
         return NULL;
   }

   size_t bufsize = pathlen(dll_name) + 10;
   buf = stgMallocBytes(bufsize * sizeof(wchar_t), "addDLL");
   snwprintf(buf, bufsize, L"%s.DLL", dll_name);
   instance = LoadLibraryW(buf);
   if (instance == NULL) {
       if (GetLastError() != ERROR_MOD_NOT_FOUND) goto error;
       // KAA: allow loading of drivers (like winspool.drv)
       snwprintf(buf, bufsize, L"%s.DRV", dll_name);
       instance = LoadLibraryW(buf);
       if (instance == NULL) {
           if (GetLastError() != ERROR_MOD_NOT_FOUND) goto error;
           // #1883: allow loading of unix-style libfoo.dll DLLs
           snwprintf(buf, bufsize, L"lib%s.DLL", dll_name);
           instance = LoadLibraryW(buf);
           if (instance == NULL) {
               goto error;
           }
       }
   }
   stgFree(buf);

   addDLLHandle(dll_name, instance);

   return NULL;
}

I don't have the patience to figure out what arguments can be passed to addDLL, so perhaps someone familiar with this part of the code (@simonmar or @Mistuke, maybe?) could explain what is going on here?

Mistuke commented 8 years ago

Hi,

I'm not 100% sure, but I think the answer is just simply: it's not implemented.

The normal compilation via GHC uses the system linker (ld) and that supports the import libraries so it ends up looking for the right DLL. The runtime linker just tries to load the value directly as received from -l and never tries looking for an import library as you can see by the addDLL code.

I'm afraid I don't really have a good workaround for this aside from changing text-icu.cabal as you've done above.

Could you report this as a feature request to https://ghc.haskell.org/trac/ghc/ and link here? I'll try to get it in for 8.0 but I'm not sure I'll make it before the code freeze.

RyanGlScott commented 8 years ago

Opened Trac #11072.

ygale commented 8 years ago

Work on the GHC feature is proceeding in phab.

Mistuke commented 8 years ago

Sorry about the delay, It's implemented but I need to figure out an illegal instruction which is not introduced by this change but seems to trigger it. https://phabricator.haskell.org/D1696 I think https://phabricator.haskell.org/D1805 might solve it and will finish that before I proceed with this one.

Mistuke commented 8 years ago

This is now fixed in HEAD under https://github.com/ghc/ghc/commit/97f2b16483aae28dc8fd60b6d2e1e283618f2390