ndmitchell / shake

Shake build system
http://shakebuild.com

Shake cannot handle stdout with unicode characters #773

Open ajrouvoet opened 4 years ago

ajrouvoet commented 4 years ago

I get the following error

* Raised the exception:
fd:14: hGetContents: invalid argument (invalid byte sequence)

while trying to capture the stdout of a command whose output contains Unicode characters: (Exit code, Stdout out) <- cmd someCommand.

When I do not capture Stdout (by dropping it from the LHS), it works as intended.

typetetris commented 4 years ago

What is your locale setting? This function should interpret the input using the encoding defined by your locale.

ndmitchell commented 4 years ago

Locales and whatnot are all very messed up. Relying on default locales is awful. Not relying on them causes issues too. My suggestion if you are working with Unicode is to pass the Binary option to the process, collect the output as a ByteString, and apply whatever conversion you know is active. I'd usually go for UTF-8 and pass a flag to the command such as --utf8 (it's very command-dependent). That gives the most reliable behaviour.
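A minimal sketch of that advice, assuming the text and bytestring packages: take stdout as raw bytes and decode it as UTF-8 explicitly, choosing whether invalid input should fail or be replaced. The Shake call appears only in a comment (BinaryPipes is the CmdOption meant by "the Binary option"; "someCommand" and "--utf8" are placeholders), and decodeStrict and decodeLenient are hypothetical helper names.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException, lenientDecode)

-- In a Shake rule, the raw bytes would come from something like
--   (Exit code, Stdout out) <- cmd BinaryPipes "someCommand --utf8"
-- with out :: BS.ByteString, so no locale-based decoding is applied.
-- ("someCommand" and "--utf8" are placeholders.)

-- Strict decoding: invalid UTF-8 becomes an explicit Left value.
decodeStrict :: BS.ByteString -> Either UnicodeException T.Text
decodeStrict = TE.decodeUtf8'

-- Lenient decoding: invalid bytes are replaced with U+FFFD.
decodeLenient :: BS.ByteString -> T.Text
decodeLenient = TE.decodeUtf8With lenientDecode
```

The strict variant suits build steps where garbled output should stop the build; the lenient one suits output that is only logged or displayed.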

We should document this in the Haddock docs (it's a very reasonable thing to want to do)

ajrouvoet commented 4 years ago

@typetetris My locale is _.UTF-8

@ndmitchell Thanks for the advice. That sounds reasonable. It would also have been much nicer if it had been clearer what caused the error. I had no idea where this was coming from. I was also calling a Haskell binary, so initially I thought it was going wrong there.

To end this on a positive note: loving Shake! My ticket out of Makefile hell. Is there a place where I can ask a non-bug question about dependencies?

ndmitchell commented 4 years ago

Unfortunately hGetContents is lazy, so it's very hard for Shake to capture the error and put a better message on top of it. Writing it in the docs is the best I've got. With luck, if you'd used ByteString you would have got a better error message, but you also wouldn't have got a UTF-8 error in the first place.
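Because a lazy hGetContents error surfaces wherever the string happens to be forced, one way to get a clearer message in your own rules (a sketch, not something Shake does for you) is to decode the captured ByteString eagerly and attach the command name to any failure. decodeOutput is a hypothetical helper; the exact text of the underlying UnicodeException depends on the text package version.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Decode a command's captured stdout eagerly. The ByteString is already
-- fully in memory, so an encoding error is reported here, tagged with the
-- command that produced it, instead of surfacing at some later lazy read.
decodeOutput :: String -> BS.ByteString -> Either String T.Text
decodeOutput command bytes =
  case TE.decodeUtf8' bytes of
    Right txt -> Right txt
    Left err ->
      Left ("stdout of " ++ show command ++ " is not valid UTF-8: " ++ show err)
```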

Glad you are enjoying Shake - for non-bug questions StackOverflow is usually the best place as then they are more useful for people who come after: https://stackoverflow.com/questions/tagged/shake-build-system

But there's a mailing list and GitHub if that's not an option. I'm not too worried about non-bug things on GitHub.

ajrouvoet commented 4 years ago

Ah I see, yes that is annoying.

Another thing that would have helped me a lot is a brief description of how Shake determines which targets are out of date (I hope I didn't miss it). My best approximation at the moment is that it checks whether the immediate dependencies recorded on the last run have changed. This immediately explains why changing the rules, adding more dependencies, or changing dynamically computed dependencies may not cause a rebuild. I saw some questions about this both here and on StackOverflow.
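That description can be illustrated with a deliberately simplified model (not Shake's actual implementation, which handles much more): each target keeps a trace of the dependencies it asked for on its last build, together with the values they had, and it is out of date only if one of those recorded values has changed. The model assumes the dependencies themselves have already been brought up to date. The rule's own code is not part of the trace, so editing it changes nothing here. All names are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

type Key = String
type Value = String

-- What the last run recorded for one target: each dependency it actually
-- asked for, paired with the value that dependency had at the time.
newtype Trace = Trace [(Key, Value)]

-- A target is out of date if it has no trace yet, or if any recorded
-- dependency now has a different (or missing) value.
outOfDate :: Map.Map Key Value -> Map.Map Key Trace -> Key -> Bool
outOfDate current traces target =
  case Map.lookup target traces of
    Nothing -> True
    Just (Trace deps) -> any changed deps
  where
    changed (dep, old) = Map.lookup dep current /= Just old
```

Adding a new dependency to a rule does not make outOfDate return True for an existing target: the new dependency is absent from the recorded trace until the target is rebuilt for some other reason, which is why rule edits usually need something like a shakeVersion bump.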

ndmitchell commented 4 years ago

Sorry, I've been unavailable the last couple of weeks.

That's a pretty good description. I'm sure it's in the docs somewhere that changing a rule without changing something like shakeVersion doesn't work well, but I have no idea where. Where might you expect it? These kinds of fundamental truths about how the build system works are pretty hard to capture, since they don't belong on any one function, and it's hard to find something general that everyone reads.

ajrouvoet commented 4 years ago

No problem!

I went by the 'tutorial'/'manual' on the Shake website https://shakebuild.com/manual. That might be a fine place to explain such concepts. Similarly, the benefits and limitations of Shake being a backward build system (is that what you call non-forward build systems?) might be explained there.

(Side track: I wrote a very positive review of Shake here)

soiamsoNG commented 3 years ago

I encountered this problem, and debugged and solved it as follows.

Running locale charmap shows the encoding that GHCi will use by default for its whole session. At first it returned the following complaints:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968

Running locale -a shows all the locales supported by your environment, and you can set LANG to one of the entries it lists. If you get "Cannot set ... to default", the locale setting is incorrect, and in that case the iconv layer behind GHC falls back to ASCII encoding. In my case locale -a printed:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_COLLATE to default locale: No such file or directory
C
C.UTF-8
POSIX

After setting LANG=C.UTF-8, locale charmap returns UTF-8 without complaint.
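From inside a Haskell program, a similar check is available through GHC.IO.Encoding: getLocaleEncoding reports the default encoding GHC is using (under a broken locale, the ASCII fallback described above), and setLocaleEncoding overrides it for handles created afterwards. A minimal sketch; forcing UTF-8 like this is a workaround that assumes the data really is UTF-8, and reportLocaleEncoding and useUtf8Locale are hypothetical names.

```haskell
import GHC.IO.Encoding (getLocaleEncoding, setLocaleEncoding)
import System.IO (hSetEncoding, stderr, stdout, utf8)

-- Report the name of the encoding GHC currently uses by default.
-- Under a misconfigured locale this is typically an ASCII encoding.
reportLocaleEncoding :: IO String
reportLocaleEncoding = show <$> getLocaleEncoding

-- Workaround: make UTF-8 the default for handles created from now on,
-- and switch the already-open standard output/error handles as well,
-- since they were set up with the original locale encoding.
useUtf8Locale :: IO ()
useUtf8Locale = do
  setLocaleEncoding utf8
  hSetEncoding stdout utf8
  hSetEncoding stderr utf8
```

Calling useUtf8Locale at the start of main avoids depending on LANG, at the cost of ignoring a user's deliberately non-UTF-8 locale.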

ndmitchell commented 3 years ago

Thanks for your review of Shake @ajrouvoet - very nice. I agree there's something good to be added to the user manual, so I'll leave this ticket open to track that addition, as well as more notes on the unicode thing as per @soiamsoNG (thanks for those).

hsyl20 commented 3 years ago

GHC has also been bitten by this (see https://gitlab.haskell.org/ghc/ghc/-/issues/19469).

@ndmitchell Perhaps Shake should use Binary i/o by default?

soiamsoNG commented 3 years ago


I think it is an incorrect locale setup issue; for example, to fix this, the Docker haskell image now sets C.UTF-8 as the default. The GHC issue you mention only occurs on Windows, so I think the MSYS2/MinGW settings need to be checked first, to avoid GHC's h* functions being set up incorrectly (any function whose name starts with h will fall back to ASCII if the locale is set up incorrectly).
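Instead of depending on the process-wide locale, a handle's encoding can also be fixed explicitly with hSetEncoding, so the ASCII fallback never applies to that handle. A minimal sketch for files, assuming the data is UTF-8; writeUtf8File and readUtf8File are hypothetical names.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Write a String as UTF-8, independent of the current locale.
writeUtf8File :: FilePath -> String -> IO ()
writeUtf8File path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

-- Read a file as UTF-8. The encoding is set on the handle itself, so a
-- misconfigured LANG cannot make this fall back to ASCII.
readUtf8File :: FilePath -> IO String
readUtf8File path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  contents <- hGetContents h
  _ <- evaluate (length contents)  -- force the lazy read before the handle closes
  return contents
```

Forcing the contents inside withFile matters because hGetContents is lazy: returning the unforced string would leave it reading from a handle that has already been closed.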