haskell / happy

The Happy parser generator for Haskell

hGetContents reports "invalid byte sequence" on Windows with a non-English locale #157

Open ice1000 opened 4 years ago

ice1000 commented 4 years ago

Before doing this readFile:

https://github.com/simonmar/happy/blob/27596ff0ce0171d485bf96d38943ffc760923c90/src/Main.lhs#L72-L74

we could first call hSetEncoding h IO.utf8 on the file handle (which means opening the file explicitly instead of going through readFile).
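A minimal sketch of that idea, assuming nothing about how happy's Main.lhs is actually structured. The helper name readFileUtf8 is hypothetical; openFile, hSetEncoding, utf8 and hGetContents are the standard System.IO functions.

```haskell
import System.IO

-- | Hypothetical helper (not part of happy): read a file as UTF-8,
-- ignoring whatever encoding the current locale implies.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = do
  h <- openFile path ReadMode
  -- Override the locale-derived default encoding on this handle only.
  hSetEncoding h utf8
  -- Lazy read; the handle is closed once the contents are fully consumed.
  hGetContents h
```

This only affects the one handle, so output to stdout or stderr keeps using the user's locale.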

See https://github.com/agda/agda/issues/4161#issuecomment-548085906

andreasabel commented 4 years ago

You could submit a PR; it is probably easier for you to test the change, since you have the right context (Windows, non-English locale).

lehins commented 4 years ago

This applies to non-Windows systems as well. If your locale is not set, or is set to something like LANG=ascii, reading files that contain Unicode characters will result in this error. In fact, we got this error on a Linux machine as well, with this happy file: https://github.com/erikd/language-javascript/blob/eef1887d430c18b108ff723479c3f1ef50c0e9b2/src/Language/JavaScript/Parser/Grammar7.y

I fixed an issue exactly like this one with hpc: https://gitlab.haskell.org/ghc/ghc/issues/17073

The same fix could be applied here. Haskell source files are always assumed to be encoded in UTF-8, and the same principle could be applied to happy's .y files.
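For comparison, here is a sketch of a different, process-wide way to get the same effect: changing the default encoding that newly opened handles receive. This is not necessarily what the hpc fix does. The helper name readGrammarWithUtf8Default is hypothetical; setLocaleEncoding and utf8 come from GHC.IO.Encoding in base.

```haskell
import GHC.IO.Encoding (setLocaleEncoding, utf8)

-- | Hypothetical helper (not part of happy): make UTF-8 the default
-- encoding for handles opened from now on, then read the file normally.
readGrammarWithUtf8Default :: FilePath -> IO String
readGrammarWithUtf8Default path = do
  -- Affects every handle opened after this call, not just this file.
  setLocaleEncoding utf8
  readFile path
```

The trade-off is scope: this changes the default for all files the program opens afterwards, whereas setting the encoding on a single handle leaves everything else tied to the locale.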

hdgarrood commented 4 years ago

This has caused problems for people trying to build the PureScript compiler from source too, e.g. https://github.com/purescript/purescript/issues/3813 and https://github.com/erikd/language-javascript/issues/86. I think having happy always assume that .y files are UTF-8 encoded would indeed be a good option.

thatwist commented 1 year ago

I struggled with this. This combo worked for me:

echo "LC_CTYPE=\"en_US.UTF-8\"" | sudo tee -a /etc/default/locale
echo "LC_ALL=\"en_US.UTF-8\"" | sudo tee -a /etc/default/locale
echo "LANG=\"en_US.UTF-8\"" | sudo tee -a /etc/default/locale
echo "LC_ALL=en_US.UTF-8" | sudo tee -a /etc/environment
echo "en_US.UTF-8 UTF-8" | sudo tee -a /etc/locale.gen
echo "LANG=en_US.UTF-8" | sudo tee -a /etc/locale.conf
sudo locale-gen en_US.UTF-8