stsewd / tree-sitter-comment

Tree-sitter grammar for comment tags like TODO, FIXME(user).
https://stsewd.dev/tree-sitter-comment/
MIT License

Incorrect parsing for URLs with parentheses in them #35

Open alexaandru opened 8 months ago

alexaandru commented 8 months ago

Given a file that includes:

// https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters

I would expect https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters to be parsed as a single URL. However, when I inspect the parse tree, I see:

(comment ; [28, 0] - [28, 91]
  (source ; [28, 0] - [28, 91]
    (uri))) ; [28, 3] - [28, 79]

The parser treats the URL as ending just before the closing parenthesis, so the uri node covers only https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition and the trailing )_parameters is left out.
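
The reported ranges can be checked against the comment text itself. Here is a small Python sketch, assuming the comment starts at column 0 as the tree indicates; the variable names are only illustrative:

```python
# The comment line from the report, starting at column 0.
line = "// https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters"

uri_start = line.index("https://")   # first column of the URL
first_close = line.index(")")        # column of the first closing parenthesis

print(uri_start)                     # 3
print(first_close)                   # 79
print(len(line))                     # 91

# The (uri) node spans [3, 79): everything up to, but excluding, the ")".
print(line[uri_start:first_close])
```

The uri node's end column (79) is exactly the column of the first ), while the comment itself runs to column 91.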

stsewd commented 8 months ago

Hi, this is mostly because ) is treated as a stop character, to handle cases like (URL)words... or (URL). words... But I'll see if the rules can be relaxed.
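
One possible way to relax the rule is to stop at a ) only when it has no matching ( earlier in the URL. The Python sketch below illustrates that idea; it is not the grammar's actual rule, url_end is a hypothetical helper, and example.com is a placeholder. A tree-sitter grammar would have to express the same logic in its own rules or in an external scanner.

```python
def url_end(text: str, start: int) -> int:
    """Return the index just past a URL that begins at text[start].

    Assumption for this sketch: a ")" belongs to the URL only if it
    closes a "(" that appeared earlier inside the same URL.
    """
    depth = 0
    i = start
    while i < len(text) and not text[i].isspace():
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                break          # unmatched ")": part of the surrounding text
            depth -= 1
        i += 1
    # Drop trailing sentence punctuation such as the "." in "see URL."
    while i > start and text[i - 1] in ".,;:!?":
        i -= 1
    return i


wiki = "// https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters"
wrapped = "(https://example.com)words..."

print(wiki[3:url_end(wiki, 3)])        # full URL, including ")_parameters"
print(wrapped[1:url_end(wrapped, 1)])  # https://example.com
```

With this approach the (URL)words... and (URL). words... cases still stop at the closing parenthesis, because that ) has no opening partner inside the URL.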

alexaandru commented 8 months ago

As per the URL RFC (RFC 1738):

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

that URL is perfectly legal. Thank you for looking into this :-)