Open Blaisorblade opened 10 years ago
Are you running on a non-UTF-8 system? From Java's point of view, almost all operating systems, except correctly configured Linux machines, use a default encoding other than UTF-8. This is relevant because the worksheet code is executed in a forked process, and it is likely that the encoding is not forced to UTF-8 there.
Thanks for the prompt answer! I assumed this would be a problem when decoding from the stream, but you might still be right.
Do you agree that using the host configuration would be a bug?
I investigated a bit, and before answering your question, I'll give my analysis: Eclipse is correctly configured to use UTF-8 (according to this: http://stackoverflow.com/a/9181068/53974), and that should be enough. Instead, I also need to set -Dfile.encoding=UTF8 in eclipse.ini, and the worksheet works correctly if and only if that option is active. (When relaunching Eclipse, I also need to modify & save the worksheet to update the output.)
Analysis: Since the documented setting is inside Eclipse itself, what I'm doing seems to be a hack, needed because some code uses the default encoding instead of passing the Eclipse-configured one. Now, I don't envy the poor soul who's supposed to debug this (forget to thread the encoding through once and you have a bug), even though I suppose that threading is needed for people who configure multiple encodings. So I'll be OK with any resolution other than "not-a-bug"; for instance, I'd be happy with WontFix or a late milestone/low priority, as long as the workaround is documented.
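To make "threading the encoding" concrete, here is a minimal Java sketch. It is not the worksheet's code: `ExplicitCharsetDemo` and `readAll` are hypothetical names, and the byte array only stands in for the output stream of a forked process. It shows the same bytes decoded once with the charset passed explicitly and once with a different one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Hypothetical helper: decodes the whole stream with the charset the
    // caller passes in, instead of relying on the JVM default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ℤ → ℤ", written with escapes so the source file encoding does not matter.
        byte[] programOutput = "\u2124 \u2192 \u2124".getBytes(StandardCharsets.UTF_8);

        // Charset threaded through explicitly: the text round-trips.
        String good = readAll(new ByteArrayInputStream(programOutput), StandardCharsets.UTF_8);
        // Wrong charset (ISO-8859-1 is only an example): one char per byte, 11 in total.
        String bad = readAll(new ByteArrayInputStream(programOutput), StandardCharsets.ISO_8859_1);

        System.out.println(good.length() + " chars vs " + bad.length() + " chars");
    }
}
```

Any call site that uses `new InputStreamReader(in)` without a charset argument falls back to the JVM default, which is the kind of spot where the Eclipse-configured encoding could get lost.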
Side note/additional issue: line breaking seems very much not Unicode-aware. In practice:
val test1T: Term = test1 //> test1T : ilc.feature.let.ANormalFormTest.v.Term = App(Abs(Var(id,((ℤ →
//| ℤ) → ℤ → ℤ) → (ℤ → ℤ) → ℤ → ℤ),App(Abs(Var(id_i,�
//| � → ℤ),App(Abs(Var(apply,(ℤ → ℤ) → ℤ → ℤ),App(App(App(Var(
This maybe happens because the implementation works in terms of bytes (it adds newlines after a certain byte count), but I didn't run anything under a debugger.
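The suspected mechanism can be reproduced in isolation. The following Java sketch is an assumption about what might be going on, not the worksheet's actual implementation; `LineBreakDemo`, `breakByBytes` and `breakByCodePoints` are hypothetical names. It cuts the same text once every N bytes and once every N code points.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineBreakDemo {
    // Naive variant: cut the UTF-8 bytes every `width` bytes (width > 0) and
    // decode each chunk on its own. A cut inside a multi-byte character leaves
    // malformed input, which the decoder replaces with U+FFFD.
    static List<String> breakByBytes(String text, int width) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < bytes.length; i += width) {
            int end = Math.min(i + width, bytes.length);
            lines.add(new String(Arrays.copyOfRange(bytes, i, end), StandardCharsets.UTF_8));
        }
        return lines;
    }

    // Unicode-aware variant: cut every `width` code points (width > 0),
    // so a character is never split.
    static List<String> breakByCodePoints(String text, int width) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int remaining = text.codePointCount(start, text.length());
            int end = text.offsetByCodePoints(start, Math.min(width, remaining));
            lines.add(text.substring(start, end));
            start = end;
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "\u2124 \u2192 \u2124"; // "ℤ → ℤ"
        System.out.println(breakByBytes(text, 5));      // first chunk ends inside "→"
        System.out.println(breakByCodePoints(text, 2)); // no character is split
    }
}
```

With a width of 5 bytes the first cut falls after the first byte of "→", which matches the kind of damage visible at the line boundary in the worksheet output above.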
Are you running on a non-UTF-8 system?
As far as I can tell, no. I'd be happy to try a test of your choice.
I'm using OS X 10.9, but almost everything else on my system is handling Unicode correctly. I say "almost" because IIRC some programs (TextEdit) still dare offer me "Mac OS Roman" as default encoding.
Regarding -Dfile.encoding=UTF8, most of my JVMs have that option (according to jvisualvm). Eclipse didn't, but still, both in the Scala REPL and in the worksheet, the property seems correctly set. However, setting -Dfile.encoding made a difference, not sure why.
Scala REPL, both inside and outside Eclipse:
scala> sys.props("file.encoding")
res4: String = UTF-8
And the worksheet:
sys.props("file.encoding") //> res0: String = UTF-8
Also, from the prompt:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
Finally, I ran this program:
package charset;

public class TestCharset {
    public static void main(String[] args) {
        // Print the default encoding this JVM was started with.
        System.out.println(System.getProperty("file.encoding"));
    }
}
and got this output:
$ java charset.TestCharset
UTF-8
So the default encoding seems to be the right one. But I must be missing something, since -Dfile.encoding=UTF8 made a difference for Eclipse.
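One caveat about checks like the ones above: each JVM works out its default charset at startup, so a result only describes the process it ran in (a shell-launched `java`, the REPL, the worksheet's forked process), not necessarily the Eclipse JVM that decodes the output. A slightly broader diagnostic, as a sketch: `DefaultCharsetReport` is a hypothetical name, and `sun.jnu.encoding` is an implementation-specific property that may be absent on some JVMs.

```java
import java.nio.charset.Charset;

public class DefaultCharsetReport {
    // Collects the settings that influence default decoding in this JVM.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
            // The charset the JVM actually uses when none is given explicitly.
            + ", defaultCharset=" + Charset.defaultCharset().name()
            // Implementation-specific; prints "null" if the JVM does not define it.
            + ", sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running this from code that executes inside the Eclipse process itself, rather than in a separately launched JVM, would show whether Eclipse's own default differs from the others.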
I am also getting this issue on a UTF-8 system. All files are correctly configured to use UTF-8. The line splitting in the worksheet messes up the output.
My Scala code (a lambda-calculus implementation) produces UTF-8 output. The worksheet is exactly what I'd want, except that it doesn't cope with UTF-8 program output. The whole project is using UTF-8 as far as I can tell, as is the workspace.
For instance, compare an output fragment, as seen by running the Scala REPL inside Eclipse: ((ℤ → ℤ) → ℤ → ℤ) → (ℤ → ℤ) → ℤ → ℤ) with what I get in the Worksheet:
((��� ��� ���) ��� ��� ��� ���) ��� (��� ��� ���) ��� ��� ��� ���)
Each Unicode character turns into three replacement characters because all these characters take 3 bytes in UTF-8 (they lie in the range U+0800 to U+FFFF; ℤ is U+2124 and → is U+2192), and each byte is apparently decoded on its own.
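The byte counts are easy to verify. In the Java sketch below, `Utf8WidthDemo`, `utf8Length` and `decodeAsAscii` are hypothetical names, and US-ASCII is used only as an example of a non-UTF-8 decoder; the thread does not establish which charset the worksheet actually applied.

```java
import java.nio.charset.StandardCharsets;

public class Utf8WidthDemo {
    // Number of bytes UTF-8 needs for a single code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes as UTF-8, then decodes the bytes as US-ASCII. Every byte
    // above 0x7F is unmappable there and becomes U+FFFD.
    static String decodeAsAscii(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x2124));  // ℤ: 3
        System.out.println(utf8Length(0x2192));  // →: 3
        System.out.println(utf8Length(0x1D56B)); // a character outside the BMP: 4
        System.out.println(decodeAsAscii("\u2124").length()); // 3
    }
}
```

So one three-byte character decoded byte by byte yields exactly three replacement characters, which is the pattern in the worksheet fragment.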
This is with version 3.0.4 of Scala IDE. More precisely:
Scala Worksheet, 0.2.3.v-2_11-201405200954-4f7988d, org.scalaide.worksheet.feature.feature.group, Scala IDE
Scala IDE for Eclipse, 3.0.4.v-2_11-201405200946-c46f499, org.scala-ide.sdt.feature.feature.group, scala-ide.org
(Plus the Scala Search & ScalaTest plugins; I could provide those version numbers if needed.)
I've looked at the current source code (which maybe was a bad idea), and it seems that the conversion should be done purely by Eclipse libraries here, and I can't see anything wrong with that:
https://github.com/scala-ide/scala-worksheet/blob/0281642ce05e2420e72fbf1e7551f945c16b811d/org.scalaide.worksheet/src/org/scalaide/worksheet/runtime/ProgramExecutor.scala#L141