haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal

Corrupt file header when dealing with Unicode normalization issues #4920

Open snoyberg opened 6 years ago

snoyberg commented 6 years ago

First, the repro, then the background. This likely only repros on OS X (explained below). Place the following files in a directory:

Setup.hs:

import Distribution.Simple
main = defaultMain

package.cabal:

name:                ば日本-4本
version:             0.1.0.0
build-type:          Simple
cabal-version:       >=1.10

library
  exposed-modules:     Lib
  build-depends:       base >= 4.7 && < 5
  default-language:    Haskell2010

Then run the following series of commands:

bash-4.4$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.2.2
bash-4.4$ ghc-pkg list Cabal
/Users/michael/.stack/programs/x86_64-osx/ghc-8.2.2/lib/ghc-8.2.2/package.conf.d
    Cabal-2.0.1.0
bash-4.4$ runghc Setup.hs configure --user
Configuring ば日本-4本-0.1.0.0...
bash-4.4$ runghc Setup.hs build
Saved package config file header is corrupt. Re-run the 'configure' command.

Expected: Builds the package correctly

Actual: Complains repeatedly about a corrupt file header. Re-running 'configure' does not help.

Background

This popped up when debugging a failing integration test in Stack. This specific package name has a long history on the Stack side: on OS X, it appears that some Unicode normalization is applied to filenames, so the sequence of code points stored in the cabal file does not match the sequence the OS returns for the generated file name. For a lot more information, see these issues:

I'm guessing that a similar file name codepoint modification is occurring inside the dist directory.
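
To make the suspected mismatch concrete, here is a minimal sketch of how the first character of the package name, ば, can be represented by two different code point sequences. It only illustrates the kind of mismatch guessed at above and is not a confirmed diagnosis of this bug. The names `nfcBa`, `nfdBa` and `codePoints` are made up for this example and are not part of Cabal.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- "ば" as a single precomposed code point (NFC form): U+3070.
nfcBa :: String
nfcBa = "\x3070"

-- The same character in decomposed (NFD) form: U+306F "は" followed by
-- U+3099, the combining voiced sound mark.
nfdBa :: String
nfdBa = "\x306F\x3099"

-- Render a string as a list of code points in "U+XXXX" notation.
-- (Hypothetical helper, for illustration only.)
codePoints :: String -> [String]
codePoints = map (\c -> "U+" ++ map toUpper (showHex (ord c) ""))

main :: IO ()
main = do
  print (codePoints nfcBa)
  print (codePoints nfdBa)
  -- Haskell's String equality compares code points, so the two forms
  -- are not equal even though they render identically.
  print (nfcBa == nfdBa)
```

If a file system hands back the decomposed form for a name that was written in the precomposed form, any code point-wise comparison against the name in the cabal file would fail in the same way.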

hvr commented 6 years ago

Sounds like a duplicate of #2557 to me

Blaisorblade commented 6 years ago

It looks like a dup indeed, apparently not limited to OS X. Please do test on OS X though, since NFD filename normalization might cause problems similar to those we’ve seen in Stack, and https://github.com/commercialhaskell/stack/issues/1337 shows that at least some people would like Unicode package names.

hvr commented 6 years ago

@Blaisorblade Well, I'm very keen on getting Cabal Unicode-proper, to the extent that the underlying operating systems allow this... but once I fix #2557 I'll clearly have to check whether less transparent OS filesystem APIs such as Win32 or OS X need some OS-specific quirks... :-)

Nolrai commented 5 years ago

@hvr Did you look at this after fixing #2557?