sormuras / junit-platform-maven-plugin

Maven Plugin launching the JUnit Platform
Apache License 2.0

MalformedInputException on Locale C.UTF-8 on JDK 11 #115

Closed: dbwiddis closed this issue 10 months ago

dbwiddis commented 11 months ago

Coming from https://github.com/oshi/oshi/pull/2525#issuecomment-1838992663

  1. My library prints output to stdout containing the degree sign (U+00B0), which UTF-8 encodes as the two bytes 0xC2 0xB0.
  2. This output is printed as part of our CI testing. One such CI job runs on a Solaris 11.4 x86 VM. Importantly for this discussion, its locale is set to LANG=C.UTF-8.
  3. Although the CI tests pass, the output shows the same error we saw on Windows in #95.
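To make the failure concrete, here is a small Java sketch (class and method names are illustrative, not from OSHI or the plugin). It shows that the degree sign becomes two bytes in UTF-8, and that a strict decoder for a 7-bit charset rejects the first of those bytes with the same "Input length = 1" message seen in the log. US-ASCII is only an assumed stand-in for whatever default charset the Solaris JDK picked; the log does not say which one it was.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class DegreeSignDemo {

    /** Hex dump of the UTF-8 encoding of s, for example "C2 B0" for the degree sign. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Strict decode: a fresh decoder from newDecoder() reports malformed input instead of replacing it. */
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        String degree = "\u00B0"; // DEGREE SIGN, U+00B0
        byte[] utf8 = degree.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8Hex(degree)); // C2 B0
        try {
            decodeStrict(utf8, StandardCharsets.US_ASCII);
        } catch (MalformedInputException e) {
            // 0xC2 is outside the 7-bit ASCII range, so the very first byte is rejected
            System.out.println(e.getMessage()); // Input length = 1
        } catch (CharacterCodingException e) {
            System.out.println("other coding error: " + e);
        }
    }
}
```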

Full log: https://github.com/oshi/oshi/actions/runs/7089320523/job/19293798675

Excerpt:

  Caused by: java.lang.AssertionError: Unexpected exception caught!
    at de.sormuras.junit.platform.maven.plugin.JUnitPlatformMojo.execute(JUnitPlatformMojo.java:433)
<snip>
  Caused by: java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1
    at java.base/java.nio.file.FileChannelLinesSpliterator.readLine(FileChannelLinesSpliterator.java:176)
    at java.base/java.nio.file.FileChannelLinesSpliterator.forEachRemaining(FileChannelLinesSpliterator.java:116)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
    at de.sormuras.junit.platform.maven.plugin.JavaExecutor.evaluate(JavaExecutor.java:147)
<snip>
  Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
    at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
    at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
    at java.base/java.nio.file.FileChannelLinesSpliterator.readLine(FileChannelLinesSpliterator.java:174)
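The same exception chain can likely be reproduced outside Maven. The sketch below (hypothetical class name; the sample line is made up) writes UTF-8 bytes to a temp file and reads them back with Files.lines(), the call that appears in the trace. Files.lines() reads lazily, so the error only surfaces in the terminal forEach, wrapped in an UncheckedIOException whose cause is the MalformedInputException, matching the "Caused by" chain above. As before, US-ASCII is an assumed stand-in for the actual default charset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class FilesLinesRepro {

    /**
     * Writes text as UTF-8 to a temp file, then reads it back with Files.lines using
     * readCharset. Returns "ok" on success, or the simple name of the decoding failure's cause.
     */
    static String readBack(String text, Charset readCharset) throws IOException {
        Path file = Files.createTempFile("captured-stdout", ".txt");
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            try (Stream<String> lines = Files.lines(file, readCharset)) {
                lines.forEach(line -> { }); // decoding happens here, during the terminal operation
                return "ok";
            } catch (UncheckedIOException e) {
                // Files.lines wraps the decoder's IOException in an UncheckedIOException
                return e.getCause().getClass().getSimpleName();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        String line = "CPU temperature: 42.0\u00B0C\n"; // hypothetical sample output line
        System.out.println(readBack(line, StandardCharsets.UTF_8));    // ok
        System.out.println(readBack(line, StandardCharsets.US_ASCII)); // MalformedInputException
    }
}
```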

The code involved (the exception is thrown by the stream returned from Files.lines()):

if (captureIO) {
  String encoding = System.getProperty("native.encoding"); // Populated on Java 18 and later
  Charset charset = encoding != null ? Charset.forName(encoding) : Charset.defaultCharset();
  try (Stream<String> stdoutput = Files.lines(outputPath, charset);
      Stream<String> erroutput = Files.lines(errorPath, charset)) {
    // ... (rest of the block not shown in this excerpt)
  }
}

Since this is pre-JDK 18, "native.encoding" is not set, so the fix for #95 doesn't apply and the code falls back to Charset.defaultCharset(). The symptom is the same, however: the charset being used can't decode one of the characters in the output.

This Stack Overflow answer suggests trying multiple charsets. Another answer claims:

ISO-8859-1 is an all-inclusive charset, in the sense that it's guaranteed not to throw MalformedInputException. So it's good for debugging
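Both suggestions can be sketched together (the names below are assumptions, not an existing API): try a list of charsets in order with a strict decoder, and put ISO-8859-1 last, since it maps every byte value from 0x00 to 0xFF to a character and therefore never reports malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class CharsetFallback {

    /**
     * Tries each charset in order with a strict decoder (newDecoder() reports malformed
     * input by default) and returns the first successful decoding.
     */
    static String decodeWithFallback(byte[] bytes, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                return cs.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
            } catch (CharacterCodingException e) {
                // this charset cannot decode the bytes; try the next candidate
            }
        }
        throw new IllegalArgumentException("No candidate charset could decode the input");
    }

    /** True if all 256 possible byte values decode without error under cs. */
    static boolean decodesEveryByte(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        try {
            cs.newDecoder().decode(ByteBuffer.wrap(all));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Charset> chain = List.of(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        byte[] utf8Degree = {(byte) 0xC2, (byte) 0xB0};
        byte[] loneContinuation = {(byte) 0xB0}; // not valid UTF-8 on its own
        System.out.println(decodeWithFallback(utf8Degree, chain));       // decoded as UTF-8
        System.out.println(decodeWithFallback(loneContinuation, chain)); // falls back to ISO-8859-1
        System.out.println(decodesEveryByte(StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodesEveryByte(StandardCharsets.US_ASCII));   // false
    }
}
```

The trade-off is that ISO-8859-1 never fails but can silently garble text: the UTF-8 bytes 0xC2 0xB0 decode under it as two characters, "Â°", rather than "°". That is fine for debugging output, less so for anything parsed afterwards.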

I'm not sure whether this is something you want to handle here, but at a minimum I'd ask that you catch the exception and provide a better error message that names the charset in use and says it can't decode a line of output.
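A minimal sketch of the "better error message" request, assuming a hypothetical helper named readCaptured. The plugin's real code lives in JavaExecutor.evaluate and would presumably report failures through its own exception types; this version just catches the UncheckedIOException from Files.lines() and rethrows with the file, the charset in use, and the JVM default charset in the message.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CapturedOutputReader {

    /**
     * Hypothetical helper: reads captured output with the given charset and, if a line
     * cannot be decoded, fails with a message naming the file and the charsets involved.
     */
    static List<String> readCaptured(Path path, Charset charset) throws IOException {
        try (Stream<String> lines = Files.lines(path, charset)) {
            return lines.collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            if (e.getCause() instanceof CharacterCodingException) {
                throw new IOException("Could not decode captured output " + path
                        + " as " + charset.name()
                        + " (JVM default charset: " + Charset.defaultCharset().name()
                        + "); it contains bytes that are not valid in that charset", e.getCause());
            }
            throw e; // some other I/O failure: propagate unchanged
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stdout", ".txt");
        Files.write(tmp, new byte[] {'4', '2', (byte) 0xC2, (byte) 0xB0, 'C', '\n'});
        try {
            readCaptured(tmp, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```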

dbwiddis commented 11 months ago

FYI, I removed the offending character from my own project in the interest of portability; however, I would still like this edge case handled here.

Happy to submit a PR. WDYT about adding a <configuration> option to specify an output charset? It would default to the JDK's default charset but would permit users to override that in the above code block, for example to ISO-8859-1.
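The proposed option could resolve the charset roughly as follows. This is a sketch only: configuredName stands in for a hypothetical new <configuration> parameter, and the fallback order simply extends the existing excerpt ("native.encoding", then Charset.defaultCharset()).

```java
import java.nio.charset.Charset;

public class OutputCharsetResolver {

    /**
     * Proposed precedence: an explicitly configured charset name wins; otherwise
     * "native.encoding" if the JVM sets it; otherwise Charset.defaultCharset().
     * configuredName represents the hypothetical plugin option.
     */
    static Charset resolve(String configuredName) {
        if (configuredName != null && !configuredName.isBlank()) {
            // Charset.forName throws UnsupportedCharsetException for unknown names,
            // which surfaces a typo in the configuration early
            return Charset.forName(configuredName.trim());
        }
        String nativeEncoding = System.getProperty("native.encoding");
        return nativeEncoding != null ? Charset.forName(nativeEncoding) : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-1")); // ISO-8859-1
        System.out.println(resolve(null));         // platform-dependent
    }
}
```

Keeping the JDK default as the fallback means existing builds behave exactly as before; only users who hit this issue would set the option, for example to ISO-8859-1.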

sormuras commented 11 months ago

> Happy to submit a PR. WDYT about adding a <configuration> option to specify an output charset? It would default to the JDK's default charset but would permit users to override that in the above code block, for example to ISO-8859-1.

Sure, sounds like a good fallback solution for such cases.