haskell / zlib

Compression and decompression in the gzip and zlib formats
http://hackage.haskell.org/package/zlib

Runtime failure on Windows on version 0.7.0.0 #65

Closed. L0neGamer closed this issue 6 months ago.

L0neGamer commented 7 months ago

To observe the issue, do the following on Windows (having installed GHC 9.2.8+).

Have an example.cabal with the following contents:

cabal-version: 3.8
name: example
version: 0.1
executable example
  build-depends: base, zlib == 0.7.0.0
  main-is: Main.hs

Have a file Main.hs with the following contents:

module Main where
import Codec.Compression.Zlib.Raw
main = do
    putStrLn "Test"
-- Never called; this binding only makes the program refer to zlib's compress.
c = compress

Run cabal build, then cabal exec example. For some reason, Test is not printed to stdout. Running echo $lastexitcode shows that the exit code is -1073741701, which, from a quick Google search, is typically related to incorrect linking. Note that this is a runtime failure, not a build failure.
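To look that exit code up, it helps to read it as an unsigned 32-bit Windows status value rather than a signed integer. A minimal sketch of the conversion in Haskell (the helper name `toNtStatus` is hypothetical, not a library function):

```haskell
import Data.Int (Int32)
import Data.Word (Word32)
import Numeric (showHex)

-- Reinterpret a signed 32-bit exit code as the unsigned value Windows
-- uses for NTSTATUS codes (plain two's-complement wrap-around).
toNtStatus :: Int32 -> Word32
toNtStatus = fromIntegral

main :: IO ()
main = putStrLn ("0x" ++ showHex (toNtStatus (-1073741701)) "")
```

This prints 0xc000007b. 0xC000007B is the status Windows defines as STATUS_INVALID_IMAGE_FORMAT.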

Changing the zlib version to 0.6.3.0 (which is the previous version) means that this program works.

If I had to guess, this is probably related to the change "Do not force bundled-c-zlib on Windows, but force it for WASM." in the previous release.

This error arose when similar code was written using a library far downstream of zlib (discord-haskell, with code as below). This is even more surprising, since I'm pretty sure that restCall shouldn't directly reference compress or similar values.

module Main where

import Discord

main :: IO ()
main = do
    putStrLn "Test"

a :: (Request (r a), FromJSON a) => r a -> DiscordHandler (Either RestCallErrorCode a)
a = restCall

One other note: my Windows Haskell setup is entirely fresh and was made specifically to test this, so it's unlikely to be an issue with my machine (especially considering that someone else brought this issue to me).

Bodigrim commented 7 months ago

@L0neGamer is it possible to reproduce the issue with other versions of GHC, newer than 9.2.8?

We have a CI job for Windows + GHC 9.2.8 which seems to succeed, so I'm at a loss as to what's going on.

L0neGamer commented 7 months ago

Sorry I wasn't clear: when I said 9.2.8+, I meant that version and onwards. I also tested on 9.4.8 and 9.6.4.

Bodigrim commented 7 months ago

That's very weird. Can you contribute a reproducer expressed as a CI job?

Bodigrim commented 7 months ago

Also, what's the Cabal version you are using?

L0neGamer commented 7 months ago

cabal --version -> 3.10.2.1

I'm not sure how I'd set up the CI job, but I can try to look into it. It would probably be best for someone else to do it, though.

L0neGamer commented 7 months ago

Looking at the CI jobs, the only two relating to Windows that I can immediately see are one that builds and one that runs with bundled-c-zlib enabled, which is likely the relevant difference here.

Can confirm that running cabal run -c 'zlib +bundled-c-zlib' results in the correct behaviour (that is, Test prints).

Bodigrim commented 7 months ago

Well, but the job without bundled-c-zlib also succeeds in CI environment, right? If it runs tests, it means that it linked successfully.

L0neGamer commented 7 months ago

True. I don't know enough about how this stuff works or what the Windows environment looks like; if you have reading material or a suggestion of where to read up, I can have a go at some stage.

Bodigrim commented 7 months ago

The thing is that zlib links fine on a Windows machine I have access to. So I cannot investigate any further without a portable reproducer.

It might be worth raising the issue at https://gitlab.haskell.org/ghc/ghc/-/issues: it's GHC's responsibility to link correctly (or abort compilation if it's impossible to do so).

L0neGamer commented 7 months ago

I'll look into raising it over there soon; at the very least maybe I'll be able to get a reproducer for here from them.

fendor commented 7 months ago

I was also bitten by this on my Windows 10 machine. I was able to reproduce the issue while building cabal HEAD, with GHC 9.4.8 and cabal 3.10.2.1.

Bodigrim commented 7 months ago

@fendor please give me a reproducer in a form of CI job.

fendor commented 7 months ago

No Windows runner supported by GitHub (just windows-2019 and windows-2022) seems to be able to reproduce the issue right now.

Bodigrim commented 7 months ago

@fendor you can also try flipping the pkg-config flag: I suspect GHA runners have pkg-config pre-installed, but your local environment probably does not.

Otherwise file a GHC issue please.

fendor commented 7 months ago

With the pkg-config flag:

$ cabal repl exes --constraint="zlib +pkg-config"
Resolving dependencies...
Error: cabal-3.10.2.1.exe: Could not resolve dependencies:
[__0] trying: zlib-ghc-windows-0.1 (user goal)
[__1] trying: zlib-0.7.0.0 (dependency of zlib-ghc-windows)
[__2] trying: zlib:-bundled-c-zlib
[__3] rejecting: zlib:+pkg-config (conflict: pkg-config package zlib-any, not
found in the pkg-config database)
[__3] rejecting: zlib:-pkg-config (constraint from command line flag requires
opposite flag selection)
[__3] fail (backjumping, conflict set: zlib, zlib:bundled-c-zlib,
zlib:pkg-config)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: zlib, zlib-ghc-windows,
zlib:bundled-c-zlib, zlib:pkg-config
Try running with --minimize-conflict-set to improve the error message.    

I will file a ghc issue either way.

Bodigrim commented 7 months ago

@fendor I think ultimately it's either Cabal's or GHC's responsibility: if extra-libraries: z is not available or is no good to link with, they should say so loudly instead of producing segfaulting artefacts.

fendor commented 7 months ago

I agree. I am looking into it a little bit.

fendor commented 7 months ago

Tracking this issue in ghc: https://gitlab.haskell.org/ghc/ghc/-/issues/24531

andreasabel commented 6 months ago

Related:

mpilgrem commented 6 months ago

From recollection, MSYS2 does not come with pkg-config.exe by default and you have to manually install https://packages.msys2.org/package/mingw-w64-x86_64-pkgconf. EDIT: Recollection confirmed with a fresh Stack-supplied MSYS2:

❯ stack exec -- where.exe pkg-config
INFO: Could not find files for the given pattern(s).

Bodigrim commented 6 months ago

@andreasabel I think that one is an orthogonal, Stack-specific issue, not quite related to the error here (which is that the zlib C library is advertised as available, whether by pkg-config or otherwise, but linking eventually fails).

mpilgrem commented 6 months ago

@Bodigrim, this may be off-topic for this particular issue (EDIT: perhaps on topic for https://github.com/haskell/zlib/issues/64), but why has zlib-0.7 chosen to make the default for its Cabal flag pkg-config true on Windows? If I set the flag to false, zlib-0.7 works fine 'out of the box' on Windows.

The problem I have is: if I have a dependency on zlib (as I do in stack.cabal), and I am using Windows, how do I specify that its pkg-config Cabal flag needs to be set to false? I don't think you can do that with Cabal, and Stack's flags configuration option is not conditional on operating system. Is the only solution to set the pkg-config flag to false for all operating systems (EDIT: that is, using Stack's flags configuration option)?

Bodigrim commented 6 months ago

> The problem I have is: if I have a dependency on zlib (as I do in stack.cabal), and I am using Windows, how do I specify that its pkg-config Cabal flag needs to be set to false? I don't think you can do that with Cabal, and Stack's flags configuration option is not conditional on operating system.

pkg-config is an automatic flag and Cabal is happy to solve it depending on environment, so normally there is nothing to specify. Even if it was not automatic, cabal.project supports conditions based on OS.

As I said in https://github.com/commercialhaskell/stack/issues/6557, yes, Stackage snapshots should set pkg-config to false uniformly.
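The automatic-flag behaviour described above (and the solver failure in fendor's --constraint="zlib +pkg-config" run earlier in the thread) can be illustrated with a toy model. This is not Cabal's solver code; `solveFlag` and `pkgConfigFlag` are hypothetical names, and the model assumes pkg-config defaults to true, as mpilgrem describes for zlib-0.7:

```haskell
import Data.List (find)

-- Toy model: an automatic flag is tried with its default value first and
-- flipped only if that value cannot be satisfied. A command-line
-- constraint narrows the values the solver is allowed to try.
solveFlag :: Bool -> Maybe Bool -> (Bool -> Bool) -> Maybe Bool
solveFlag def constraint satisfiable = find satisfiable candidates
  where
    candidates = maybe [def, not def] pure constraint

-- Hypothetical environment: +pkg-config is only satisfiable when
-- pkg-config can find zlib in its database.
pkgConfigFlag :: Bool -> Maybe Bool -> Maybe Bool
pkgConfigFlag zlibInDb constraint =
  solveFlag True constraint (\on -> not on || zlibInDb)

main :: IO ()
main = do
  print (pkgConfigFlag True Nothing)       -- Just True
  print (pkgConfigFlag False Nothing)      -- Just False: flag flipped
  print (pkgConfigFlag False (Just True))  -- Nothing: no value satisfies the constraint
```

Without a constraint the flag silently flips when pkg-config cannot find zlib; forcing +pkg-config leaves the solver nothing to fall back on.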

RyanGlScott commented 6 months ago

As noted in https://gitlab.haskell.org/ghc/ghc/-/issues/24531#note_559785, the situation on Windows is a little complicated. GHC always links against <ghc-install-dir>/mingw/lib, as this contains libraries that are needed for GHC's RTS (among other things). However, this library also contains libz.dll.a, an import library that tells GHC to dynamically load the zlib1.dll shared library at runtime. As far as the linker is concerned, the presence of libz.dll.a at link time means that everything is working as expected.

Where things go wrong is when you actually run the executable. Due to how dynamic linking works on Windows, the loader can't know ahead of time where zlib1.dll is (there are no rpaths on Windows), so the loader instead searches your PATH for zlib1.dll. There is a zlib1.dll file located in <ghc-install-dir>/mingw/bin, but most users won't have that on their PATH (and it's unclear if that would be advisable in general). Therefore, the executable will fail at runtime when it can't find zlib1.dll.

Many GHC users also have MinGW-w64 installed (via MSYS2), and when you run something in an MSYS2 shell, it will add a directory to your PATH that contains another copy of zlib1.dll. As such, this issue may not occur for you locally if you are running in MSYS2. If that is the case, try running the same commands in PowerShell (and make sure that you didn't add any MSYS2 directories to your PATH).
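The PATH lookup described above can be sketched in Haskell, roughly mirroring what `where.exe zlib1.dll` reports. This is a simplified model (the helper `pathCandidates` is hypothetical): only the PATH step is modelled, whereas the real Windows loader checks other locations, such as the executable's own directory and the system directories, before PATH.

```haskell
import Control.Monad (filterM)
import System.Directory (doesFileExist)
import System.FilePath (getSearchPath, (</>))

-- Every location on PATH where a DLL with the given name could be,
-- in PATH order; the first one that exists is the copy that gets loaded.
pathCandidates :: String -> [FilePath] -> [FilePath]
pathCandidates dll dirs = [dir </> dll | dir <- dirs]

main :: IO ()
main = do
  dirs <- getSearchPath
  found <- filterM doesFileExist (pathCandidates "zlib1.dll" dirs)
  case found of
    [] -> putStrLn "zlib1.dll is not on PATH"
    (first : _) -> putStrLn ("first zlib1.dll on PATH: " ++ first)
```

Run from PowerShell and from an MSYS2 shell, a check like this would show why the same executable can fail in one and work in the other.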

RyanGlScott commented 6 months ago

Having said all of that, it's unclear to me what can be done about this on the GHC side. I am not a Windows GHC expert, so I presume that there is a good reason for including libz.dll.a in <ghc-install-dir>/mingw/lib, but it does have the unfortunate side effect of messing with .cabal files that depend on extra-libraries: z.

A workaround would be to compile zlib using the bundled-c-zlib or pkg-config flags. I wonder if bundled-c-zlib should be the default on Windows until we figure out how to resolve https://gitlab.haskell.org/ghc/ghc/-/issues/24531.

Bodigrim commented 6 months ago

> However, this library also contains libz.dll.a, an import library that tells GHC to dynamically load the zlib1.dll shared library at runtime.

@RyanGlScott is there any way to force static linking? Or is libz.dll.a only dynamically-linkable?

> it does have the unfortunate side effect of messing with .cabal files that depend on extra-libraries: z.

Is my understanding correct that we can never trust extra-libraries: z, because we do not know whether it is a static or dynamic library?

RyanGlScott commented 6 months ago

> is there any way to force static linking?

In principle, yes, although I haven't managed to figure out its quirks. GHC accepts the -l:libXYZ.a syntax, which instructs the linker to link against a specific file. With this, you can tell GHC to link against libz.a (a static archive) instead of defaulting to the libz.dll.a import library (which is what would happen if you passed -lz).
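The difference between -lz and -l:libz.a can be illustrated with a simplified model of how a MinGW-style linker maps an -l argument to file names in a library directory. Only the two cases discussed here are modelled (real linkers try more names), and `candidateFiles`, `resolveLib`, and the directory listing are hypothetical:

```haskell
import Data.List (find)

-- File names a linker looks for, in order, for a given -l argument.
candidateFiles :: String -> [FilePath]
candidateFiles (':' : exactName) = [exactName]                           -- -l:libz.a
candidateFiles name = ["lib" ++ name ++ ".dll.a", "lib" ++ name ++ ".a"] -- -lz

-- Pick the first candidate that exists in the directory listing.
resolveLib :: [FilePath] -> String -> Maybe FilePath
resolveLib present arg = find (`elem` present) (candidateFiles arg)

main :: IO ()
main = do
  let mingwLib = ["libz.a", "libz.dll.a"] -- assumed contents of mingw/lib
  print (resolveLib mingwLib "z")         -- Just "libz.dll.a"
  print (resolveLib mingwLib ":libz.a")   -- Just "libz.a"
```

With both files present, plain -lz picks the import library, which is why the exact-name form is needed to get the static archive.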

That being said, this appears to be somewhat buggy in practice. I tried modifying zlib.cabal like so:

diff --git a/zlib.cabal b/zlib.cabal
index 24e2595..22aff8b 100644
--- a/zlib.cabal
+++ b/zlib.cabal
@@ -118,7 +118,7 @@ library
       pkgconfig-depends: zlib
     else
       -- On Windows zlib is shipped with GHC starting from 7.10
-      extra-libraries: z
+      extra-libraries: :libz.a

 test-suite tests
   type: exitcode-stdio-1.0

But that fails with a different linker error when building the executable:

Building executable 'example' for zlib-ghc-windows-0.1..
[2 of 2] Linking C:\\Users\\winferno\\Documents\\Hacking\\Haskell\\zlib-ghc-windows-65\\dist-newstyle\\build\\x86_64-windows\\ghc-9.4.8\\zlib-ghc-windows-0.1\\x\\example\\build\\example\\example.exe
ld.lld: warning: ignoring unknown argument: -exclude-symbols:zcalloc
ld.lld: warning: ignoring unknown argument: -exclude-symbols:zcfree
ld.lld: error: -exclude-symbols:zcalloc is not allowed in .drectve
ld.lld: error: -exclude-symbols:zcfree is not allowed in .drectve
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_init
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_stored_block
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_flush_bits
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_align
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_flush_block
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_tally
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_dist_code
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_length_code
ld.lld: error: -exclude-symbols:_tr_init is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_stored_block is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_flush_bits is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_align is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_flush_block is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_tally is not allowed in .drectve
ld.lld: error: -exclude-symbols:_dist_code is not allowed in .drectve
ld.lld: error: -exclude-symbols:_length_code is not allowed in .drectve
ld.lld: warning: ignoring unknown argument: -exclude-symbols:inflate_table
ld.lld: error: -exclude-symbols:inflate_table is not allowed in .drectve
ld.lld: warning: ignoring unknown argument: -exclude-symbols:inflate_fast
ld.lld: error: -exclude-symbols:inflate_fast is not allowed in .drectve
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ghc-9.4.8.exe: `clang.exe' failed in phase `Linker'. (Exit code: 1)

> Is my understanding correct that we can never trust extra-libraries: z, because we do not know whether it is a static or dynamic library?

The issue isn't really static vs. dynamic libraries, but rather dynamic libraries that are on your runtime search path (e.g., MinGW-w64 libraries) versus ones that aren't (e.g., libraries that are bundled with GHC). Using a dynamically linked libz is perfectly fine provided that the dynamic loader knows where it is at runtime, and this is precisely why the pkg-config option works most of the time.

mpilgrem commented 6 months ago

On Windows, in the Stack environment, I think the GHC-supplied zlib1.dll is always on the PATH (and first on the PATH). For example, on my system:

❯ stack --snapshot ghc-9.6.5 exec -- where.exe zlib*
C:\Users\mike\AppData\Local\Programs\stack\x86_64-windows\ghc-9.6.5\mingw\bin\zlib1.dll
C:\Program Files\gnuplot\bin\zlib1.dll
C:\Program Files (x86)\gnupg\bin\zlib1.dll
C:\Program Files\Inkscape\bin\zlib1.dll

As indicated above, a number of applications that I use put a copy of zlib1.dll on the PATH. In the past, outside of the Stack environment, I have had problems with Haskell code picking up an out-of-date version of zlib1.dll on the PATH (fixed by replacing it with an up-to-date version).

Bodigrim commented 6 months ago

Thanks for the investigation @RyanGlScott!