haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal

Haddock failure with basement on 3.10.1.0, not with 3.8.1.0 #9266

Open liamzee opened 1 year ago

liamzee commented 1 year ago

Describe the bug

basement-0.0.16 fails to build, and it seems to cascade down (or be a separate bug)

As far as I can tell, this is related to withMetaData, but it might be related to other factors.

see pastebin for logs:

https://pastebin.com/JrApskCS

To Reproduce

Steps to reproduce the behavior:

Run cabal haddock on a library that uses crypton-0.33 under cabal 3.10.1.0:

$ cabal haddock -v

Expected behavior

That it doesn't fail to build haddock documentation.

System information

Additional context

None. I can get it working on cabal 3.8.1.0.

Bodigrim commented 1 year ago

The actual failure seems to be

<no location info>: error:
    <stdout>: commitBuffer: invalid argument (cannot encode character '\66376')

ghci> putStrLn "\66376"
𐍈

So the offending sequence comes from Basement.UTF8.Types:

-- For example:
-- 'A' => U+0041  => 41          => 0x00000041
-- '€' => U+20AC  => E2 82 AC    => 0x00AC82E2
-- '𐍈' => U+10348 => F0 90 8D 88 => 0x888D90F0
--
newtype CharUTF8 = CharUTF8 Word32

Hackage renders 𐍈 correctly: https://hackage.haskell.org/package/basement-0.0.16/docs/Basement-Types-CharUTF8.html#t:CharUTF8

I cannot reproduce the issue on my machine (which is macOS), but the common reason for "cannot encode character" is misconfigured system locale. @liamzee what does locale say on your machine?

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

liamzee commented 1 year ago

@Bodigrim

The problem is that this seems to work on 3.8.1.0, whereas it doesn't on 3.10.1.0.

$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

liamzee commented 1 year ago

Followed instructions here:

https://wiki.archlinux.org/title/locale

This seems to work now, but given the regression, is this still an issue with Cabal, or just Cabal fixing things?

Bodigrim commented 1 year ago

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory

This does not look right. I'm not sure how to fix it on Arch, maybe https://bbs.archlinux.org/viewtopic.php?id=257376?

Can you start ghci and execute putStrLn "\66376"? Does it print a symbol correctly?
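A complementary check to the putStrLn test is to ask GHC which text encoding it derived from the locale. The following is a minimal sketch, assuming only the base library; encodingReport is a hypothetical helper name introduced for illustration, not something from Cabal or Haddock. A non-UTF-8 name in either line would be consistent with the "cannot encode character" error.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hGetEncoding, stdout)

-- Hypothetical helper: describe the encoding GHC picked up from the
-- locale and the encoding currently attached to stdout.
encodingReport :: IO [String]
encodingReport = do
  localeEnc <- getLocaleEncoding
  stdoutEnc <- hGetEncoding stdout
  pure
    [ "locale encoding: " ++ show localeEnc
    , "stdout encoding: " ++ maybe "none (binary mode)" show stdoutEnc
    ]

main :: IO ()
main = encodingReport >>= mapM_ putStrLn
```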

liamzee commented 1 year ago

So, the issue has been fixed on my side, tested with cabal 3.10.1.0, and it works.

Still, it's a regression. Is it Cabal's problem, basement's problem, or my problem?

Bodigrim commented 1 year ago

Neither :) Most likely it's a Haddock regression: maybe it stopped escaping Unicode symbols and now assumes that the user's locale is Unicode-aware. In that case Haddock should set the encoding of its file handles to UTF-8 explicitly.
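To illustrate what setting a handle to UTF-8 explicitly looks like in a Haskell program, here is a minimal sketch using hSetEncoding from base. It is not Haddock's actual code; putUtf8Line is a hypothetical helper name introduced for this example.

```haskell
import System.IO (Handle, hGetEncoding, hPutStrLn, hSetEncoding, stdout, utf8)

-- Hypothetical helper: force UTF-8 on the handle, whatever the locale
-- says, and then write one line to it.
putUtf8Line :: Handle -> String -> IO ()
putUtf8Line h s = do
  hSetEncoding h utf8
  hPutStrLn h s

main :: IO ()
main = putUtf8Line stdout "\66376"
```

Note that the encoding stays attached to the handle after the call, so later writes to the same handle are also encoded as UTF-8.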

liamzee commented 1 year ago

I'm not familiar with what's going on. Does Cabal maintain Haddock as a dependency, and did the upgrade from 3.8.1.0 to 3.10.1.0 involve an updated version of Haddock?

If so, should I pass the bug on to the Haddock team?

Bodigrim commented 1 year ago

OTOH Haddock cannot set locale to UTF-8 if it's not installed on your system... A proper solution for Haddock would be to use Data.Text.IO.Utf8 or go via bytestring, which is locale-independent.
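The bytestring route can be sketched as follows: encode the text to UTF-8 bytes in the program and write raw bytes, so the handle's text encoding is never consulted. This is a minimal illustration, assuming the text and bytestring packages that ship with GHC; encodeLine is a hypothetical helper name, not part of Haddock.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: turn a line of text into UTF-8 bytes ourselves,
-- appending the trailing newline before encoding.
encodeLine :: String -> BS.ByteString
encodeLine s = TE.encodeUtf8 (T.pack (s ++ "\n"))

main :: IO ()
main = BS.putStr (encodeLine "\66376")
```

For '\66376' (U+10348) the encoded bytes are F0 90 8D 88 followed by the newline byte, matching the sequence quoted from Basement.UTF8.Types earlier in the thread.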

Normally Haddock comes from GHC distribution, so switching Cabal version should not affect it. Dunno, maybe Cabal has something to do with it after all. @coot could this be related to your work on haddock-project?

I'm not a maintainer here, just passing by.

coot commented 1 year ago

could this be related to your work on haddock-project?

Unlikely. haddock-project calls haddock:

  1. to create haddocks (through cabal haddock),
  2. to create indexes.

liamzee commented 1 year ago

Do you need me to attempt to nuke my locale? Sort of busy right now and can't afford a restart, but nuking the locale is probably the best way to reproduce.

Mikolaj commented 10 months ago

Is this still reproducible with the newest haddock/ghc/cabal 3.10.2? Any progress diagnosing?