playframework / playframework

The Community Maintained High Velocity Web Framework For Java and Scala.
http://www.playframework.com
Apache License 2.0

java.nio.charset.MalformedInputException: Input length = 1 #9022

Closed zoyaforever closed 5 years ago

zoyaforever commented 5 years ago

Play Version (2.7.0)

API (Java)

Operating System (Windows 10)

JDK (Oracle 1.8.0_202)

java version "1.8.0_202" Java(TM) SE Runtime Environment (build 1.8.0_202-b08) Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)

Library Dependencies

Expected Behavior

  1. My code compiles successfully with 2.6.21. After switching to 2.7.0 it should still compile, but it fails.

Actual Behavior

I ran the 'compile' command and this is the output (I had already run 'clean' before 'compile'):

It seems to complain about the routes file, but I didn't change it.

image1 image2 image3

Routes file:

routes.zip

octonato commented 5 years ago

Hi @zoyaforever, could you please post your routes file here, or a reproducer that we can use to investigate further?

I suspect that there is a regression on the routes compiler that only comes to the surface in some specific case which may explain why nobody detected it before and this is hitting you now. The clue is in your routes file. :-)

zoyaforever commented 5 years ago

@renatocaval Thanks for your help.

This is routes file: routes

octonato commented 5 years ago

Thanks @zoyaforever.

The file is quite large and I don't see anything wrong with it at first sight.

May I ask you to run the following:

  1. remove all entries from the file
  2. compile it
  3. add a small section of the original file
  4. repeat steps 2 and 3 until you find an offending entry.

zoyaforever commented 5 years ago

@renatocaval

Ok thanks, I will do it and tell you the result.

mkurz commented 5 years ago

This is definitely a bug and needs to be fixed in 2.7.1.

I am pretty sure this is caused by the removal of commons-io in #8443.

Have a look at what changed in RoutesCompiler.scala and in RoutesFileParser.scala below here: https://github.com/playframework/playframework/pull/8443/files#diff-7e7d903be0be4ca55af014f9a33b7295 Line 43 (Files.readAllLines(...)) is exactly the line that fails according to this bug report and it's the line which changed when removing commons-io... This can't be a coincidence :wink:

I am pretty sure FileUtils.readFileToString and FileUtils.writeStringToFile from commons-io, which we used in 2.6.x, did something to prevent this error. Java's Files.readAllLines(...) uses a BufferedReader in the background, which leads to the error. commons-io (I had a quick look at the source) doesn't do that: it copies the bytes from the filesystem via an InputStream to a commons-io specific Writer and creates a string from that writer.

After a bit of research I guess the problem is that the file isn't read with the encoding we expect when using Files.readAllLines (which is equivalent to using a BufferedReader). For example, see https://stackoverflow.com/questions/26268132/all-inclusive-charset-to-avoid-java-nio-charset-malformedinputexception-input I like the second answer to that Stack Overflow question and I think this is what we should do as well. Instead of using Files.readAllLines we should use:

new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"),"utf-8"));

and create the string ourselves. Another solution would be to iterate over various encodings when the exception is thrown, until we find an encoding that can decode the file.
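As a rough sketch of the two options above (an illustration only, not the actual Play fix): the class name `CharsetSafeRead`, its helper names, and the sample file contents are all hypothetical. The first helper relies on the fact that an `InputStreamReader` built from a charset replaces undecodable bytes with U+FFFD, whereas `Files.readAllLines` reports them as `MalformedInputException`. The second helper keeps strict decoding and tries a list of candidate charsets in order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CharsetSafeRead {

    // Option 1: explicit charset via InputStreamReader. Bytes that are not
    // valid in that charset are replaced with U+FFFD instead of raising
    // MalformedInputException.
    static List<String> readLenient(Path file, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Option 2: strict decoding, trying each candidate charset in order and
    // moving on when the bytes are not valid in the current one.
    static List<String> readWithFallback(Path file, List<Charset> candidates) {
        for (Charset charset : candidates) {
            try {
                return Files.readAllLines(file, charset);
            } catch (CharacterCodingException e) {
                // Not decodable with this charset; try the next candidate.
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalArgumentException("No candidate charset could decode " + file);
    }

    // Hypothetical sample: "# ä" encoded as ISO-8859-1, which is not valid UTF-8.
    static Path sampleFile() {
        try {
            Path file = Files.createTempFile("routes", ".txt");
            Files.write(file, new byte[] {'#', ' ', (byte) 0xE4, '\n'});
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = sampleFile();
        // Lenient read as UTF-8: no exception, the bad byte becomes U+FFFD.
        System.out.println(readLenient(file, StandardCharsets.UTF_8));
        // Strict read: UTF-8 fails, ISO-8859-1 succeeds.
        System.out.println(readWithFallback(file,
                Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
    }
}
```

The trade-off is that the lenient variant never fails but can silently corrupt characters, while the fallback variant may pick a charset that decodes without error yet is not the one the file was written in.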

mkurz commented 5 years ago

One more thing: On my Ubuntu machine I tried to find out the encoding of the routes file @zoyaforever provided:

$ file --mime routes
routes: text/plain; charset=us-ascii

It's us-ascii because all the chars in the file just fit into this encoding. But I guess we are using UTF-8 in Java to read the file...

However, as soon as I add a non-ASCII character to the file, e.g. the German umlaut ä, it's recognized as a UTF-8 file:

$ file --mime routes
routes: text/plain; charset=utf-8
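A comparable check can be sketched in Java with a strict `CharsetDecoder`. This is only an analogy: `file` uses its own heuristics, and the class name `CharsetCheck` and the sample strings are hypothetical. It shows that pure ASCII bytes are valid in both US-ASCII and UTF-8, while the UTF-8 bytes of an umlaut are not valid US-ASCII.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Returns true if the whole byte array decodes without errors in the
    // given charset (strict mode: malformed input is reported, not replaced).
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample contents, not taken from the attached routes file.
        byte[] asciiOnly = "# plain ascii".getBytes(StandardCharsets.US_ASCII);
        byte[] withUmlaut = "# \u00e4".getBytes(StandardCharsets.UTF_8);

        System.out.println(decodesCleanly(asciiOnly, StandardCharsets.US_ASCII));
        System.out.println(decodesCleanly(asciiOnly, StandardCharsets.UTF_8));
        System.out.println(decodesCleanly(withUmlaut, StandardCharsets.US_ASCII));
        System.out.println(decodesCleanly(withUmlaut, StandardCharsets.UTF_8));
    }
}
```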

@zoyaforever Can you please try adding the following line at the top of your routes file and check whether it works now (it's just a commented umlaut)?

# ä

zoyaforever commented 5 years ago

@renatocaval @mkurz

mkurz commented 5 years ago

I still think the problem is what I described above. Probably you have some default encoding on your Windows 10 machine or the Java installation that causes the problem.

zoyaforever commented 5 years ago

Hi @mkurz

I wrote a small test in another Java project:

Files.readAllLines(new File("routes").toPath(), Charset.forName("UTF-16"))

it produces the same error as in Play 2.7.0:

java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at java.nio.file.Files.readAllLines(Files.java:3205)
    at sample.Main.main(Main.java:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)

So I also think the problem is caused by the encoding.

zoyaforever commented 5 years ago

By the way, I have already added the following section in build.sbt

javacOptions ++= Seq( "-encoding", "UTF-8" )

octonato commented 5 years ago

Ok, let's fix it after the unnesting PR #9023 to avoid one more PR to migrate.

@zoyaforever have you tried to force your file to be UTF-8 in Windows? That might be an easy workaround for now.

And thanks @mkurz for the archaeology work! ;-)

mkurz commented 5 years ago

@zoyaforever One more idea for a workaround: let's try setting the default (file) encoding for Java to UTF-8 via an environment variable. You are using Windows 10, aren't you? Open the command prompt:

C:\>set JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8"
# Now compile your project:
C:\>sbt compile

Let me know if that worked for you.

See https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars002.html

zoyaforever commented 5 years ago

Hi @mkurz, thanks for your help.

If I run it as you mentioned in Windows CMD, it compiles successfully. (I added an extra '=' between JAVA_TOOL_OPTIONS and "-Dfile.encoding=UTF-8".)

set JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8"
sbt clean
sbt compile


If I run it in Windows PowerShell, it fails.

But if I set the environment variable in the system settings (desktop -> right-click the Computer icon -> Properties -> Advanced system settings), it works in both CMD and PowerShell.

mkurz commented 5 years ago

@zoyaforever Great! Let's see if we can come up with a fix for 2.7.1 so you can get rid of the env variable.

mkurz commented 5 years ago

@zoyaforever Can you please tell me what java.nio.charset.Charset.defaultCharset() returns on your machine? Just use System.out.println(java.nio.charset.Charset.defaultCharset()); somewhere in your code to print the value. Thanks!
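A minimal standalone class for this check might look like the sketch below. The class name `PrintDefaultCharset` and the `describe` helper are hypothetical. It also prints the `file.encoding` system property and the `JAVA_TOOL_OPTIONS` environment variable, which makes it easier to confirm that the workaround is no longer in effect.

```java
import java.nio.charset.Charset;

public class PrintDefaultCharset {

    // Collects the values that determine how the JVM decodes files when no
    // charset is given explicitly.
    static String describe() {
        return "defaultCharset    = " + Charset.defaultCharset() + System.lineSeparator()
             + "file.encoding     = " + System.getProperty("file.encoding") + System.lineSeparator()
             + "JAVA_TOOL_OPTIONS = " + System.getenv("JAVA_TOOL_OPTIONS");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If `JAVA_TOOL_OPTIONS` prints as `null`, the variable is not set for the process running the check.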

mkurz commented 5 years ago

But please remove the environment variable before you do this! Thanks!

zoyaforever commented 5 years ago

Hi @mkurz, after I removed the environment variable, it prints: GBK. "GBK" is the default charset in China.

mkurz commented 5 years ago

Here is an idea for how to fix this: #9052

mkurz commented 5 years ago

@zoyaforever Play 2.7.1 is available, containing a fix for this issue: https://blog.playframework.com/play-2-7-1-released/ Can you let us know if the problem is solved for you now? Please don't forget to remove the environment variables first. Thanks!

zoyaforever commented 5 years ago

@mkurz The problem is resolved. I have already removed the environment variables. Thanks for your work.

giannoug commented 4 years ago

This is still relevant in 2.7.1. Running set JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8" fixes the issue.

mkurz commented 4 years ago

@giannoug Can you please remove your target folder and then try again (without JAVA_TOOL_OPTIONS set)? I am pretty sure this was fixed with Play 2.7.1.

giannoug commented 4 years ago

@mkurz Sorry for not providing more information. I can't really reproduce this. It seems to happen randomly. I tried reproducing it yesterday and couldn't, but it happened twice later in the day. It's weird because I'm using the same cmd window. I'll try manually deleting my target folder and see if it happens again.

For the record I'm using a combination of the Scala plugin for IntelliJ and the sbt console (switching between them). I only saw this exception in the sbt console when running sbt compile.