haskell-suite / haskell-src-exts

Manipulating Haskell source: abstract syntax, lexer, parser, and pretty-printer

Unicode is not prettyprinted correctly #443

Open DanBurton opened 4 years ago

DanBurton commented 4 years ago

See, for example: https://github.com/haskell-suite/haskell-src-exts/commit/94a3bcde453910f8133ad1a01e9e25c77ec64372

UnicodeIdents.hs

猫 = ()

UnicodeIdents.hs.prettyprinter.golden

+ = ()

This golden file is known to be incorrect.

obfusk commented 4 years ago

Looks like the Unicode character gets truncated to its low 8 bits: ord '猫' is 0x732b, and ord '+' is 0x2b.

Easy to do:

import qualified Data.ByteString.Char8 as BS
BS.pack "猫" -- => "+"

But I'm not sure why or where it's happening here.
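
For illustration, the truncation arithmetic can be reproduced with only `base`, without going through `Data.ByteString.Char8`. This is a minimal sketch of the suspected effect, not the code path in haskell-src-exts; `truncateTo8Bits` is a hypothetical helper named here for the example.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical helper (not part of haskell-src-exts): keep only the low
-- 8 bits of a code point, which is what a Char -> Word8 conversion does.
truncateTo8Bits :: Char -> Char
truncateTo8Bits c = chr (fromIntegral (fromIntegral (ord c) :: Word8))

main :: IO ()
main = do
  let cat = '\x732b' -- the character '猫'
  putStrLn ("0x" ++ showHex (ord cat) "")                   -- 0x732b
  putStrLn ("0x" ++ showHex (ord (truncateTo8Bits cat)) "") -- 0x2b
  print (truncateTo8Bits cat)                               -- '+'
```

If the pretty-printer output passes through any 8-bit `Char` conversion like this one, every identifier character above U+00FF would be mangled in the same way, which matches the golden file above.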