Different formatting of block line comments with openjdk 23+37-2369

dweiss commented 3 months ago

Applies to: 1.23.0

I came across this oddity - this file from Apache Lucene: SandboxFacetsExample.txt

doesn't need any changes with jdk17-jdk22:

> java -version
openjdk version "22.0.1" 2024-04-16
OpenJDK Runtime Environment Temurin-22.0.1+8 (build 22.0.1+8)
OpenJDK 64-Bit Server VM Temurin-22.0.1+8 (build 22.0.1+8, mixed mode, sharing)
> java -jar google-java-format-1.23.0-all-deps.jar -n SandboxFacetsExample.java

but will result in reformatting under jdk 23 (ea, 23+37-2369):

> java -version
openjdk version "23" 2024-09-17
OpenJDK Runtime Environment (build 23+37-2369)
OpenJDK 64-Bit Server VM (build 23+37-2369, mixed mode, sharing)

>java -jar google-java-format-1.23.0-all-deps.jar -n SandboxFacetsExample.java
SandboxFacetsExample.java

Fully reproducible on Windows and Linux. The diff is:

--- a/SandboxFacetsExample.java
+++ b/SandboxFacetsExample.java_
@@ -149,7 +149,8 @@ public class SandboxFacetsExample {
         new FacetFieldCollectorManager<>(defaultTaxoCutter, defaultRecorder);

     //// (2.1) if we need to collect data using multiple different collectors, e.g. taxonomy and
-    ////       ranges, or even two taxonomy facets that use different Category List Field, we can
+    ////       ranges, or even two taxonomy facets that use different Category List Field, we
+    // can
     ////       use MultiCollectorManager, e.g.:
     // TODO: add a demo for it.
     // TaxonomyFacetsCutter publishDateCutter = new

dweiss commented 3 months ago

I toyed a bit with debugging this and there's a difference in how these line comments appear to JavaCommentsHelper.wrapLineComments. With JDK21, it receives that comment line-by-line. With JDK23, it receives a concatenation of all lines, with lines 2 and on having a whitespace prefix:

java21:

////       ranges, or even two taxonomy facets that use different Category List Field, we can

java23:

//// (2.1) if we need to collect data using multiple different collectors, e.g. taxonomy and
    ////       ranges, or even two taxonomy facets that use different Category List Field, we can
    ////       use MultiCollectorManager, e.g.:

this triggers the difference because the split condition now sees different line length for those subsequent lines.

while (line.length() + column0 > Formatter.MAX_LINE_LENGTH) {
...

Hope this helps somehow.

dweiss commented 3 months ago

For what it's worth, core tests pass when I do this:

diff --git a/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java b/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java
index d34ecc4..d54b231 100644
--- a/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java
+++ b/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java
@@ -49,7 +49,11 @@ public final class JavaCommentsHelper implements CommentsHelper {
     List<String> lines = new ArrayList<>();
     Iterator<String> it = Newlines.lineIterator(text);
     while (it.hasNext()) {
-      lines.add(CharMatcher.whitespace().trimTrailingFrom(it.next()));
+      if (tok.isSlashSlashComment()) {
+        lines.add(CharMatcher.whitespace().trimFrom(it.next()));
+      } else {
+        lines.add(CharMatcher.whitespace().trimTrailingFrom(it.next()));
+      }
     }
     if (tok.isSlashSlashComment()) {
       return indentLineComments(lines, column0);

but I'm not sure whether this is the right fix. Perhaps it'd be good to find out why the token text is different with jdk 23 (as it's the primary cause of the problem).

cushon commented 2 months ago

Thanks for the bug and the investigation! Presumably the difference in JDK 23 is due to the new support for markdown doc comments.

dweiss commented 2 months ago

I think you're right - that's a spot-on observation. Interestingly, this comment appears inline with the code, not as a javadoc of anything in particular [1]. Could be a regression in the comment parser worth reporting to openjdk.

[1] https://github.com/apache/lucene/blob/304d4e7855deb39b4650d954d027ce8697873056/lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java#L151-L153

cushon commented 2 months ago

I think it's probably a deliberate change on the javac side, to be able to process the entire /// comment as a single token instead of multiple line comments. Your fix of removing the leading and trailing whitespace seems OK to me, the formatter will add any necessary leading whitespace back as part of indentLineComments, it seems reasonable to remove the leading whitespace to fix the line length computation.

Are you interested in sending a PR?

dweiss commented 2 months ago

Thank you for adding the regression test. I've created a PR with the basic workaround I suggested, hope it helps.

google / google-java-format

Different formatting of block line comments with openjdk 23+37-2369 #1153