Closed by mrkkrp 5 years ago
I can't reproduce it anymore.
If anyone encounters this, please re-open.
This still seems to be a problem; I just tested with the newest master. To reproduce:
-- main.hs
main :: IO ()
main = putStrLn "ä"
$ ormolu main.hs
ormolu: /home/.../main.hs: hGetContents: invalid argument (invalid byte sequence)
This is always caused by locale, not something in Ormolu.
Could we choose to always read files as UTF-8, regardless of the locale?
Please +1 this GHC ticket: https://gitlab.haskell.org/ghc/ghc/issues/17755 :)
After some digging around I found this nixpkgs issue.
I can confirm that on my system (Arch Linux, locale set to en_US.UTF-8) using either of the workarounds mentioned here makes ormolu work without problems on files containing Unicode characters.
So it would seem that ormolu works without workarounds in the following two cases:
If that's true, this issue would affect any non-NixOS user developing internationalized software.
I don't know the current state of affairs in GHC regarding which encoding is used when interpreting source files. If they have already switched to UTF-8, would it make sense for ormolu to follow the same path?
I can confirm that this is still an issue; it manifests on our CI server. It should be noted that System.IO.readFile is a thin wrapper around hGetContents, so even though there is no explicit call to hGetContents in the codebase, the issue still easily manifests.
The simple solution is a wrapper readUtf8Contents that explicitly sets the handle encoding to utf8 before reading. I can make a PR if the maintainers confirm they would accept this solution.
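For illustration, a minimal sketch of what such a wrapper could look like. The name readUtf8Contents comes from the suggestion above, but the body is only an assumption about how it might be written, not code from Ormolu. It uses plain System.IO and reads the file strictly so the handle can be closed right away:

```haskell
import System.IO

-- Hypothetical helper (not taken from the Ormolu codebase):
-- read a file as UTF-8 regardless of the locale encoding.
readUtf8Contents :: FilePath -> IO String
readUtf8Contents path =
  withFile path ReadMode $ \h -> do
    -- Override the locale-derived default encoding for this handle only.
    hSetEncoding h utf8
    contents <- hGetContents h
    -- hGetContents is lazy; force the whole string before
    -- withFile closes the handle.
    length contents `seq` return contents
```

With something like this, a call such as readUtf8Contents "test.hs" should no longer depend on LANG or LC_ALL, although invalid UTF-8 in the file would still raise a decoding error.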
@mrkkrp WDYT?
This is never a problem for me. I think as long as locale is selected correctly (e.g. with LANG env variable) it should work fine. I'm not against a PR that would force UTF-8 though.
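To check which encoding the GHC runtime actually derived from the locale on a given machine (for example a CI server), one can ask base directly. A small sketch; localeEncodingName is a hypothetical helper name, and it only reports the encoding without changing anything:

```haskell
import GHC.IO.Encoding (getLocaleEncoding)

-- Hypothetical diagnostic helper: the name of the encoding that
-- newly opened text handles use by default.
localeEncodingName :: IO String
localeEncodingName = show <$> getLocaleEncoding
```

Printing the result with localeEncodingName >>= putStrLn under a UTF-8 locale should report a UTF-8 encoding. If it reports something ASCII-like instead, the locale is probably not set the way one expects.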
Can this be re-opened? This is definitely still an issue.
In order to reopen this a way to reproduce the problem should be provided.
-- test.hs
a = "ℤ"
$ LOCALE_ARCHIVE= LC_ALL= ormolu test.hs
ormolu: test.hs: hGetContents: invalid argument (invalid byte sequence)
Edit: Ah, I see your note about forcing UTF-8 now. I completely overlooked it.
The fact that the locale influences how file contents are read affects every program written in Haskell. So if this is so annoying, we should try to change the default behavior upstream rather than patch it in every individual application again and again.
See https://gitlab.haskell.org/ghc/ghc/-/issues/17755, I'd love to see that issue moving forward.
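For context, the per-application patch typically looks something like the sketch below: overriding the process-wide default encoding once at startup. This assumes setLocaleEncoding from GHC.IO.Encoding in base; forceUtf8Default is a hypothetical name, and nothing here is taken from Ormolu itself.

```haskell
import GHC.IO.Encoding (getLocaleEncoding, setLocaleEncoding, utf8)

-- Hypothetical helper: make UTF-8 the default text encoding for
-- handles opened after this call, ignoring the locale.
forceUtf8Default :: IO ()
forceUtf8Default = setLocaleEncoding utf8

-- Report the default encoding after the override (for illustration).
defaultEncodingAfterOverride :: IO String
defaultEncodingAfterOverride = do
  forceUtf8Default
  show <$> getLocaleEncoding
```

Calling forceUtf8Default as the first action in main is the usual placement. Handles created before the call may keep whatever encoding they already had, which is one reason a fix in GHC's defaults would be cleaner.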
When I try to feed source files with Unicode symbols I get the infamous "invalid byte sequence" errors. Investigate what causes this and fix it.
Low-priority for now.