different UTF-8 output when compiled

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?

% cat bar.pure
using system;
puts $ str $ chars "abcde";
puts $ str $ chars "あいうえお";
% pure --version
Pure 0.44 (i686-pc-linux-gnu) Copyright (c) 2008-2010 by Albert Graef
Compiled for LLVM 2.7 (http://llvm.org)
% pure -c bar.pure
["a","b","c","d","e"]
["あ","い","う","え","お"]
% ./a.out
["a","b","c","d","e"]
["\0x3042","\0x3044","\0x3046","\0x3048","\0x304a"]
%

What is the expected output? What do you see instead?

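The output should be identical whether the script is run by the interpreter or as a compiled executable.
Instead, the compiled executable prints the Japanese characters as escape sequences (\0x3042 etc.).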

What version of the product are you using? On what operating system?

Pure r3458 on Fedora 12.

Please provide any additional information below.

"あいうえお" are Japanese phonograms (hiragana), which are code points 0x3042 etc. in Unicode.

Original issue reported on code.google.com by echochamber on 19 Jun 2010 at 4:40

GoogleCodeExporter commented 8 years ago

That's right, a batch-compiled script (unlike the interpreter when it runs in 
interactive or batch mode) doesn't initialize the locale for you. That's why 
the string gets printed differently, since the Japanese characters are not 
considered printable by the expression printer in the C locale. (If you add the 
call setlocale LC_ALL "" at the beginning of your script, it does the right 
thing in either case.) Actually, I don't remember why it is that way, so I'll 
have to dig through the docs to see whether there is a reason for it. But I 
tend to agree that it's a misbehaviour.
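
For reference, a minimal sketch of that workaround applied to the bar.pure script from the report. It assumes the environment selects a UTF-8 locale (for example via LANG); with the plain C locale, the escapes would presumably still appear.

```pure
using system;

// Initialize the locale from the environment (LANG, LC_ALL, ...).
// The interpreter does this itself; a batch-compiled script built
// before the fix does not, so it is done explicitly here.
setlocale LC_ALL "";

puts $ str $ chars "abcde";
puts $ str $ chars "あいうえお";
```

With this call in place, both `pure -c bar.pure` and the resulting ./a.out should print the Japanese characters literally rather than as \0x... escapes.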

Original comment by aggraef@gmail.com on 21 Jun 2010 at 8:42

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Fixed in r3466. Thanks for reporting!

Original comment by aggraef@gmail.com on 22 Jun 2010 at 4:29

Changed state: Fixed

hughperman / pure-lang

different UTF-8 output when compiled #38