scala-ide / scala-worksheet

A Scala IDE plugin for a multi-line REPL (called worksheet)

IO error while decoding <path to worksheet> with UTF-8 Please try specifying another one using the -encoding option #120

Closed: megri closed this issue 11 years ago

megri commented 11 years ago

If I use Swedish characters such as åäö in a worksheet, it won't compile.

I have tried adding both '-encoding UTF-8' and '-Dfile.encoding=UTF-8' to the "Additional command line parameters" box under Preferences > Scala > Compiler, to no avail.

My worksheet is encoded using UTF-8 without BOM.

Scala plugin version: 3.0.0.rc1-2_10-201302280900-dd11367

Scala compiler version: 2.10.1.v20130225-100037-RC2-4d1a1f7ee5

Scala library version: 2.10.1.v20130225-100037-RC2-4d1a1f7ee5

Eclipse version: 4.2.1.v201209141800

[Screenshot: encoding-bug-worksheet]

megri commented 11 years ago

Addendum: my original source is located at ../one-shots/src/testing.sc and is indeed UTF-8. However, the file at ../one-shots/.worksheets/src/testing.scala is plain ANSI. The encoding seems to be lost during pre-compilation.
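Why this shows up as a hard error rather than as garbled characters: a charset decoder configured to REPORT malformed input throws when it meets bytes that are not valid in the expected encoding. The sketch below assumes the compiler's source reader behaves this way and simulates the diagnosis above, with åäö written in Windows-1252 ("ANSI") and then read as UTF-8. `DecodeCheck` and `decodesCleanly` are illustrative names, not worksheet code.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, Charset, CodingErrorAction}

object DecodeCheck {
  // True if `bytes` is valid text in `charset`. With CodingErrorAction.REPORT the
  // decoder throws on malformed input instead of substituting U+FFFD.
  def decodesCleanly(bytes: Array[Byte], charset: Charset): Boolean = {
    val decoder = charset.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try {
      decoder.decode(ByteBuffer.wrap(bytes))
      true
    } catch {
      case _: CharacterCodingException => false
    }
  }

  def main(args: Array[String]): Unit = {
    val text = "\u00e5\u00e4\u00f6" // the Swedish letters a-ring, a-umlaut, o-umlaut
    val cp1252 = Charset.forName("windows-1252")
    val utf8 = Charset.forName("UTF-8")
    // Instrumented file written with the Windows default, then read as UTF-8:
    println(decodesCleanly(text.getBytes(cp1252), utf8)) // false
    // Written and read with the same charset:
    println(decodesCleanly(text.getBytes(utf8), utf8))   // true
  }
}
```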

dragos commented 11 years ago

Thank you for the precise diagnosis, that is indeed what happened. We'll fix it.

veeandy commented 11 years ago

any update on this?

dotta commented 11 years ago

> any update on this?

@veeandy We are busy working on the Play2 Eclipse plug-in but, as usual, PRs are welcome ;-)

dragos commented 11 years ago

Sorry for the delay. As @dotta mentioned, we are now focusing on improving the Play2 plugin. However, this is a fairly easy fix, and it would be cool to get some help from the community. The code is here and it even has a FIXME comment. To specify an encoding, it should be enough to change the FileWriter to a Writer that takes an encoding parameter (which should be retrieved from the Eclipse settings; see ScalaProject.getEncoding in the Scala IDE).
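A minimal sketch of that change, assuming the encoding name has already been looked up (for example via ScalaProject.getEncoding, as suggested above) and is passed in as a string. `InstrumentedSourceWriter` and `write` are hypothetical names, not the worksheet's actual code.

```scala
import java.io.{BufferedWriter, File, FileOutputStream, OutputStreamWriter, Writer}
import java.nio.charset.Charset

object InstrumentedSourceWriter {
  // Writes `source` to `file` in an explicit encoding (e.g. the project encoding)
  // instead of the JVM default charset, which is what the FileWriter constructors
  // available before Java 11 always use.
  def write(file: File, source: String, encoding: String): Unit = {
    val writer: Writer = new BufferedWriter(
      new OutputStreamWriter(new FileOutputStream(file), Charset.forName(encoding)))
    try writer.write(source)
    finally writer.close() // flushes the buffer and releases the file handle
  }
}
```

The call site would then pass the project encoding, e.g. `InstrumentedSourceWriter.write(target, instrumentedCode, projectEncoding)`, so the instrumented file is written in the same encoding the compiler uses to read it.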

megri commented 11 years ago

I think I may have a fix for this, but I can't figure out how to build the project for testing.

dotta commented 11 years ago

> I think I may have a fix for this

Great!

> but I can't figure out how to build the project for testing.

I'll have a look and update the documentation if needed. I'll be back with more info soon.

dotta commented 11 years ago

@megri So, it turns out we need to do some clean-up in the POM and the documentation (expect to have some news about this early next week).

For the moment, run the following command for building the worksheet:

mvn -P 2.9.x -P nightly-scala-ide-scala-2.9 -P indigo clean install

If that compiles fine, feel free to issue a PR.

dragos commented 11 years ago

See PR #123; it should simplify the build.

megri commented 11 years ago

Cool, I'll have a fix ready soon. It will be a bit dirty due to the state of ResidentCompiler.scala, but it should take the charset into account on a per-file basis.

megri commented 11 years ago

I've come across a problem that needs discussion before continuing on.

My initial intention was to allow the worksheet to be compiled using whatever encoding the worksheet FILE was using, as opposed to the project's encoding. This makes sense if you're working in a cross-platform or multi-developer environment where file encodings may vary.

The problem is that the scala.tools.nsc.Global compiler locks down the encoding it's going to use at initialization. As the settings object is mutable, this had me confused at first, until I looked at line 315 in the source.
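The pattern described here is a value computed once from a mutable settings object during construction. A heavily simplified sketch with hypothetical `ToySettings` and `ToyCompiler` classes (these are stand-ins, not the real scala.tools.nsc types) shows why mutating the settings afterwards has no effect:

```scala
// Hypothetical, heavily simplified stand-ins: not the real scala.tools.nsc classes.
class ToySettings {
  var encoding: String = "UTF-8"
}

class ToyCompiler(settings: ToySettings) {
  // Read exactly once, while the compiler is being constructed; assigning
  // settings.encoding afterwards does not change this value.
  val readerEncoding: String = settings.encoding
}

object EagerValDemo {
  def main(args: Array[String]): Unit = {
    val settings = new ToySettings
    val compiler = new ToyCompiler(settings)
    settings.encoding = "windows-1252" // too late: the compiler already captured "UTF-8"
    println(compiler.readerEncoding)   // prints UTF-8
  }
}
```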

I see a couple of ways to proceed:

  1. resolve project encoding at startup, ignore file level encodings;
  2. replace the compiler with a new instance when the encoding of the source and the compiler mismatch;
  3. mutate the compiler instance's internal SourceReader by reflecting the crap out of it; or
  4. lazily launch and cache a new compiler instance for each encoding encountered

What do you think?
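Option 4 could look roughly like the sketch below: at most one compiler per encoding, created on first use and reused afterwards. The type parameter `C` stands for the compiler type and `newCompiler` for a factory that builds a compiler for a given encoding; both are hypothetical placeholders, not Scala IDE APIs.

```scala
import java.nio.charset.Charset
import scala.collection.mutable

// Hypothetical sketch of option 4: a lazily populated cache of compilers keyed by encoding.
class CompilerCache[C](newCompiler: String => C) {
  private val byEncoding = mutable.Map.empty[String, C]

  def forEncoding(encoding: String): C = synchronized {
    // Normalise aliases such as "utf8" and "UTF-8" to one canonical charset name.
    val canonical = Charset.forName(encoding).name()
    // getOrElseUpdate only evaluates newCompiler(...) when the key is missing.
    byEncoding.getOrElseUpdate(canonical, newCompiler(canonical))
  }

  def size: Int = synchronized(byEncoding.size)
}
```

Each cached entry keeps a whole compiler instance alive for as long as the cache exists, so memory use grows with the number of distinct encodings seen.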

dotta commented 11 years ago

> I've come across a problem that needs discussion before continuing on.

> My initial intention was to allow the worksheet to be compiled using whatever encoding the worksheet FILE was using, as opposed to the project's encoding. This makes sense if you're working in a cross-platform or multi-developer environment where file encodings may vary.

> The problem is that the scala.tools.nsc.Global compiler locks down the encoding it's going to use at initialization. As the settings object is mutable, this had me confused at first, until I looked at line 315 in the source.

Good catch!

> I see a couple of ways to proceed:

> 1. resolve project encoding at startup, ignore file level encodings;

At the moment, this is the option I would go with; it's pragmatic and hopefully the simplest one to implement. It would be convenient if an error were reported to the user when the opened worksheet file doesn't use the project's encoding.

> 2. replace the compiler with a new instance when the encoding of the source and the compiler mismatch;

This would work. However, starting up a new compiler takes time. If the user has many worksheet sources that use different encodings, I'm afraid they will get frustrated by the waiting time and end up blaming the tool.

> 3. mutate the compiler instance's internal SourceReader by reflecting the crap out of it; or

That sounds scary :) I'm wondering if it would actually work. But we should probably try to stay away from black-magic wizardry.

> 4. lazily launch and cache a new compiler instance for each encoding encountered

This would also work, but each compiler instance uses a fair amount of memory (the actual size depends on your classpath). Eclipse and Scala IDE already use quite a lot of memory by themselves, so I'd rather not create and cache a new compiler instance per worksheet/encoding :)

What do you think?

dragos commented 11 years ago

@megri, the problem is much simpler, I think. Here's what I see:

It looks like the instrumented source (under .worksheets/src) is saved using the default encoding. So, it seems to me, the only missing encoding is the one used when writing the instrumented code to disk. That should all be fixable in runtime.Configuration.scala:74.

Hope this helps.

megri commented 11 years ago

What if the platform encoding isn't UTF-8? Windows uses Cp1252 by default; creating a new project will make the files use Cp1252, so enforcing UTF-8 won't work. The reason it works right now is that both nsc.Global and Eclipse default to the platform encoding. Running a "Linux" worksheet under Windows won't work, as the file and platform charsets won't match.

Please correct me if I'm wrong :)

dragos commented 11 years ago

@megri, you are half-right. :)

The reason it works right now is that the presentation compiler respects the platform encoding. There's no "default", or at least, the default is wrong in 99% of the cases. For instance, on Mac OS the default is MacRoman. The compiler picks it up from the Eclipse settings (ScalaProject.scala:491).

Right now, there is a mismatch between the encoding used to read the file and the one used to write it. That's one bug. The other (which could be considered an enhancement) is to allow per-file encodings. That won't happen very soon because of the Scala compiler, and I think it's probably not very common to have different encodings in the same project. My suggestion is to keep this ticket about the first issue, which seems way more common and annoying.
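The read/write mismatch can be reproduced outside the IDE with plain JDK calls. In this sketch `RoundTrip` is an illustrative name, and windows-1252 (Cp1252) is used as the second charset only because it came up earlier in the thread. Note that `new String(bytes, charset)` never throws on bad input; it decodes to the wrong characters or substitutes replacement characters.

```scala
import java.nio.charset.Charset

object RoundTrip {
  // Encode `text` with one charset and decode the bytes with another, mimicking a
  // file that is written in one encoding and read back in a different one.
  def roundTrip(text: String, writeAs: String, readAs: String): String =
    new String(text.getBytes(Charset.forName(writeAs)), Charset.forName(readAs))

  def main(args: Array[String]): Unit = {
    val text = "\u00e5\u00e4\u00f6" // the Swedish letters a-ring, a-umlaut, o-umlaut
    println(roundTrip(text, "UTF-8", "UTF-8") == text)        // true: same charset both ways
    println(roundTrip(text, "UTF-8", "windows-1252") == text) // false: mismatched charsets
  }
}
```

With the mismatched pair, each two-byte UTF-8 sequence is read as two separate Windows-1252 characters, which is the kind of garbling a read/write encoding mismatch produces.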

megri commented 11 years ago

@dragos there we go then!

dotta commented 11 years ago

Unfortunately, I need to re-open this ticket. See https://github.com/scala-ide/scala-worksheet/pull/127 for details.