vsch / flexmark-java

CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
BSD 2-Clause "Simplified" License

LinkRef can't be recognized when the reference name contains an underscore and there is a "*" in front #587

Open limingchina opened 11 months ago

limingchina commented 11 months ago

To Reproduce: Create a Java project with the flexmark dependency and run the following code.

package flexmark_linkref_bug;

import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.test.util.AstCollectingVisitor;
import com.vladsch.flexmark.util.data.DataSet;

public class App {
    public static void main(String[] args) {
        Parser parser = Parser.builder(new DataSet()).build();
        Node document 
            = parser.parse("Note:a [INVALID_PARAMETER] error is generated.");
        System.out.println("====Good case if there is no '*'====");
        System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

        document 
            = parser.parse("Note*: [INVALID_PARAMETER] error is generated.");
        System.out.println("====Bad case: LinkRef is not recognized if there is a '*' in front====");
        System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

        document 
            = parser.parse("*Note*: [INVALID_PARAMETER] error is generated.");
        System.out.println("====Bad case: LinkRef is not recognized when using bold text which has a pair of '*'s====");
        System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

        document 
            = parser.parse("Note:* [INVALID] error is generated.");
        System.out.println("====Good case: LinkRef is recognized without underscore in the enum value====");
        System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

        document 
            = parser.parse("This is a test\n" + "\n" +"Note:* [Error.INVALID_PARAMETER] error is generated.");
        System.out.println("====Good case: LinkRef is recognized when there is sentence followed with an empty line====");
        System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

        document 
            = parser.parse("This is a tes\n" + "\n" +"Note:* [Error.INVALID_PARAMETER] error is generated.");
        System.out.println("====Bad case: LinkRef is not recognized when the first sentence is too short====");
        System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));

        document 
            = parser.parse("This is a tes\n" + "\n" +"Note:* [INVALID_PARAMETER] error is generated.");
        System.out.println("====Good case: LinkRef is recognized after removing the 'Error.'====");
        System.out.println(new AstCollectingVisitor().collectAndGetAstText(document));
    }
}

Resulting output:

====Good case if there is no '*'====
Document[0, 46]
  Paragraph[0, 46]
    Text[0, 7] chars:[0, 7, "Note:a "]
    LinkRef[7, 26] referenceOpen:[7, 8, "["] reference:[8, 25, "INVALID_PARAMETER"] referenceClose:[25, 26, "]"]
      Text[8, 25] chars:[8, 25, "INVAL … METER"]
    Text[26, 46] chars:[26, 46, " erro … ated."]

====Bad case: LinkRef is not recognized if there is a '*' in front====
Document[0, 46]
  Paragraph[0, 46]
    Text[0, 46] chars:[0, 46, "Note* … ated."]

====Bad case: LinkRef is not recognized when using bold text which has a pair of '*'s====
Document[0, 47]
  Paragraph[0, 47]
    Emphasis[0, 6] textOpen:[0, 1, "*"] text:[1, 5, "Note"] textClose:[5, 6, "*"]
      Text[1, 5] chars:[1, 5, "Note"]
    Text[6, 47] chars:[6, 47, ": [IN … ated."]

====Good case: LinkRef is recognized without underscore in the enum value====
Document[0, 36]
  Paragraph[0, 36]
    Text[0, 7] chars:[0, 7, "Note:* "]
    LinkRef[7, 16] referenceOpen:[7, 8, "["] reference:[8, 15, "INVALID"] referenceClose:[15, 16, "]"]
      Text[8, 15] chars:[8, 15, "INVALID"]
    Text[16, 36] chars:[16, 36, " erro … ated."]

====Good case: LinkRef is recognized when there is sentence followed with an empty line====
Document[0, 68]
  Paragraph[0, 15] isTrailingBlankLine
    Text[0, 14] chars:[0, 14, "This  …  test"]
  Paragraph[16, 68]
    Text[16, 23] chars:[16, 23, "Note:* "]
    LinkRef[23, 48] referenceOpen:[23, 24, "["] reference:[24, 47, "Error.INVALID_PARAMETER"] referenceClose:[47, 48, "]"]
      Text[24, 47] chars:[24, 47, "Error … METER"]
    Text[48, 68] chars:[48, 68, " erro … ated."]

====Bad case: LinkRef is not recognized when the first sentence is too short====
Document[0, 67]
  Paragraph[0, 14] isTrailingBlankLine
    Text[0, 13] chars:[0, 13, "This  … a tes"]
  Paragraph[15, 67]
    Text[15, 67] chars:[15, 67, "Note: … ated."]

====Good case: LinkRef is recognized after removing the 'Error.'====
Document[0, 61]
  Paragraph[0, 14] isTrailingBlankLine
    Text[0, 13] chars:[0, 13, "This  … a tes"]
  Paragraph[15, 61]
    Text[15, 22] chars:[15, 22, "Note:* "]
    LinkRef[22, 41] referenceOpen:[22, 23, "["] reference:[23, 40, "INVALID_PARAMETER"] referenceClose:[40, 41, "]"]
      Text[23, 40] chars:[23, 40, "INVAL … METER"]
    Text[41, 61] chars:[41, 61, " erro … ated."]

Additional context: The bug was initially reported in the heremaps/gluecodium project: https://github.com/heremaps/gluecodium/issues/1542. Debugging showed that the issue is actually in the flexmark library.

limingchina commented 11 months ago

This diff seems to work, but I suspect it is not the correct fix. The function isStraddling looks for a delimiter inside the brackets, but it is unclear to me why LinkRef detection is skipped later when that delimiter does not match. It is quite possible for the link section to contain underscores.

diff --git a/flexmark/src/main/java/com/vladsch/flexmark/parser/core/delimiter/Bracket.java b/flexmark/src/main/java/com/vladsch/flexmark/parser/core/delimiter/Bracket.java
index 89e5050c4..e172d8cb6 100644
--- a/flexmark/src/main/java/com/vladsch/flexmark/parser/core/delimiter/Bracket.java
+++ b/flexmark/src/main/java/com/vladsch/flexmark/parser/core/delimiter/Bracket.java
@@ -92,7 +92,15 @@ public class Bracket {
         // first see if we have any closers in our span
         int startOffset = nodeChars.getStartOffset();
         int endOffset = nodeChars.getEndOffset();
-        Delimiter inner = previousDelimiter == null ? null : previousDelimiter.getNext();
+
+        Delimiter inner = null;
+        if (previousDelimiter != null) {
+            inner = previousDelimiter.getNext();
+            if (inner != null && inner.getDelimiterChar() != previousDelimiter.getDelimiterChar()) {
+                // If the delimiter chars are not matching then we are not straddling
+                inner = null;
+            }
+        }
         while (inner != null) {
             int innerOffset = inner.getEndIndex();
             if (innerOffset >= endOffset) break;
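
To make the check easier to reason about outside the library, here is a minimal stand-alone sketch of the loop in the diff. It is a toy model, not flexmark code: `Delim`, `starThenUnderscore` and the `requireSameChar` flag are hypothetical names, and the part of the loop that the diff does not show is assumed to report straddling when it meets an unmatched delimiter inside the span. The last two calls also assume that the delimiter end index is relative to the paragraph while the span offsets are relative to the document. That is only a guess, but it would line up with the "first sentence is too short" case in the output above.

```java
// Toy model only: Delim and requireSameChar are hypothetical names,
// not flexmark's Delimiter or Bracket API.
class StraddlingSketch {
    static final class Delim {
        final char ch;         // delimiter character, e.g. '*' or '_'
        final int endIndex;    // index just past the delimiter run
        final boolean matched; // whether the delimiter found a partner
        Delim next;            // next delimiter in the list

        Delim(char ch, int endIndex, boolean matched) {
            this.ch = ch;
            this.endIndex = endIndex;
            this.matched = matched;
        }
    }

    static boolean isStraddling(Delim previous, int startOffset, int endOffset,
                                boolean requireSameChar) {
        Delim inner = previous == null ? null : previous.next;
        if (requireSameChar && inner != null && inner.ch != previous.ch) {
            inner = null; // behavior added by the proposed patch
        }
        while (inner != null) {
            int innerOffset = inner.endIndex;
            if (innerOffset >= endOffset) break;
            // Assumed: the part of the loop that the diff does not show.
            if (innerOffset >= startOffset && !inner.matched) return true;
            inner = inner.next;
        }
        return false;
    }

    // Builds an unmatched '*' followed by an unmatched '_'.
    static Delim starThenUnderscore(int starEnd, int underscoreEnd) {
        Delim star = new Delim('*', starEnd, false);
        star.next = new Delim('_', underscoreEnd, false);
        return star;
    }

    public static void main(String[] args) {
        // "Note*: [INVALID_PARAMETER] ...": '*' ends at 5, '_' ends at 16,
        // and the bracketed span is [7, 26).
        Delim d = starThenUnderscore(5, 16);
        System.out.println(isStraddling(d, 7, 26, false));    // true: LinkRef skipped
        System.out.println(isStraddling(d, 7, 26, true));     // false: patched check
        System.out.println(isStraddling(null, 7, 26, false)); // false: no '*' in front

        // "Note:* [Error.INVALID_PARAMETER] ..." as the second paragraph:
        // paragraph-relative '*' ends at 6 and '_' ends at 22.
        Delim e = starThenUnderscore(6, 22);
        System.out.println(isStraddling(e, 23, 48, false)); // false: paragraph starts at 16
        System.out.println(isStraddling(e, 22, 47, false)); // true: paragraph starts at 15
    }
}
```

If the model is close to the real behavior, the unmatched `_` inside the brackets is what trips the check, and the patch avoids it only because `_` differs from the preceding `*`. A reference such as `[INVALID*PARAMETER]` would presumably still fail, which is one reason to doubt that the diff is the right fix.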