tree-sitter / tree-sitter-html

HTML grammar for Tree-sitter
MIT License
136 stars, 72 forks

"</script>" string literal in javascript breaks parser? #86

Closed michaelfortunato closed 8 months ago

michaelfortunato commented 8 months ago

I was looking at scan_raw_text and it seems like it does not account for string literals.

<html>
<script>
const a = "</script>"
const b = ""
</script>
</html>
Leads to this syntax tree: 
fragment [0, 0] - [6, 0]
  element [0, 0] - [5, 7]
    start_tag [0, 0] - [0, 6]
      tag_name [0, 1] - [0, 5]
    script_element [1, 0] - [2, 20]
      start_tag [1, 0] - [1, 8]
        tag_name [1, 1] - [1, 7]
      raw_text [1, 8] - [2, 11]
      end_tag [2, 11] - [2, 20]
        tag_name [2, 13] - [2, 19]
    text [2, 20] - [3, 12]
    erroneous_end_tag [4, 0] - [4, 9]
      erroneous_end_tag_name [4, 2] - [4, 8]
    end_tag [5, 0] - [5, 7]
      tag_name [5, 2] - [5, 6]
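
To make the tree above easier to follow, here is a small Python model of why raw_text stops at [2, 11]. It is only a sketch: it assumes the scan ends at the first case-insensitive `</script` after the start tag, without looking at JavaScript syntax. The names `naive_raw_text_end` and `to_point` are made up for illustration and are not taken from the scanner.

```python
import re

SOURCE = '''<html>
<script>
const a = "</script>"
const b = ""
</script>
</html>
'''


def naive_raw_text_end(source: str, start: int, end_tag: str = "</script") -> int:
    """Offset of the first ASCII case-insensitive match of end_tag at or after start.

    Hypothetical model of a scan that ignores string literals entirely.
    """
    pattern = re.compile(re.escape(end_tag), re.IGNORECASE | re.ASCII)
    match = pattern.search(source, start)
    return len(source) if match is None else match.start()


def to_point(source: str, offset: int) -> tuple[int, int]:
    """Convert a character offset into a zero-based (row, column) pair."""
    row = source.count("\n", 0, offset)
    column = offset - (source.rfind("\n", 0, offset) + 1)
    return row, column


# raw_text begins right after the <script> start tag.
start = SOURCE.index("<script>") + len("<script>")
end = naive_raw_text_end(SOURCE, start)
print(to_point(SOURCE, start), to_point(SOURCE, end))  # (1, 8) (2, 11)
```

The first match is the `</script>` inside the string literal, so everything after it is parsed as HTML again.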

In fact we can see the highlighter break here.

<html>
<script>
const a = "</script>"
const b = ""
</script>
</html>

Would a PR in which " and ` are checked and balanced in scan_raw_text be a good fix for this issue? This would only apply when scan_raw_text is expecting a "</script>" tag.
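
A rough sketch of the balancing idea, written as a Python model rather than as scanner code. `quote_aware_raw_text_end` is a hypothetical name, and the handling of backslash escapes is an assumption that goes beyond the proposal as stated. It tracks only " and ` as proposed, so single-quoted strings, comments, regex literals and `${}` nesting in template literals are not handled.

```python
def quote_aware_raw_text_end(source: str, start: int, end_tag: str = "</script") -> int:
    """Offset of the first end_tag at or after start that is outside a " or ` literal.

    Hypothetical model of the proposed fix. Returns len(source) if no such
    end tag exists, including when a quote is never closed.
    """
    quote = None  # delimiter of the literal we are currently inside, if any
    width = len(end_tag)
    i = start
    while i < len(source):
        ch = source[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # assumption: a backslash escapes the next character
                continue
            if ch == quote:
                quote = None  # literal closed
        elif ch in ('"', "`"):
            quote = ch  # literal opened
        elif source[i:i + width].lower() == end_tag:
            return i  # end tag found outside any tracked literal
        i += 1
    return len(source)


source = '<script>\nconst a = "</script>"\nconst b = ""\n</script>\n'
end = quote_aware_raw_text_end(source, len("<script>"))
print(repr(source[end:]))  # '</script>\n'
```

Two caveats. An unbalanced quote elsewhere in the script, for example inside a JavaScript comment, would make this scan swallow the real end tag. Also, as far as I understand the HTML standard, browsers end the script element at the first `</script>` regardless of string literals, which is why the usual workaround is to write `"<\/script>"` in the JavaScript source. A quote-aware scan would therefore differ from browser behavior.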

michaelfortunato commented 8 months ago

I think the following rules might be able to handle the raw_text rule in the case of Githubissues.