tree-sitter / tree-sitter-html

HTML grammar for Tree-sitter
MIT License

Closing tag name affects tree #74

Status: Closed (seblj closed this issue 9 months ago)

seblj commented 10 months ago

I am trying to parse a document in an "invalid" state, where an end tag has a different name than its start tag, and I noticed something that I believe could be a bug.

This code:

<div>
  <foo></bar>
</div>

Produces this tree:

(fragment
  (element
    (start_tag
      (tag_name))
    (element
      (start_tag
        (tag_name)))
    (erroneous_end_tag
      (erroneous_end_tag_name))
    (end_tag
      (tag_name))))

However, I would expect it to produce this:

(fragment
  (element
    (start_tag
      (tag_name))
    (element
      (start_tag
        (tag_name))
      (erroneous_end_tag
        (erroneous_end_tag_name)))
    (end_tag
      (tag_name))))

Notice that in the tree that is actually produced, erroneous_end_tag and erroneous_end_tag_name are not children of the inner element. However, if the tags have the same name, the corresponding nodes (end_tag and tag_name) are children of the element. It is perfectly fine to mark them as erroneous_end_tag nodes, but I believe they should still be part of the element node?
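Since the difference between the two trees comes down to where one closing parenthesis falls, here is a small stdlib-only Python sketch that makes the parent of erroneous_end_tag explicit. The helpers parse_sexp, parent_of and child_names are hypothetical names written for this comment; they are not part of tree-sitter, and the two strings are just the trees from above with whitespace condensed.

```python
def parse_sexp(text):
    """Parse a tree-sitter style S-expression into (name, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        name = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1  # consume the ")" that closes this node
        return (name, children)

    return node()


def parent_of(tree, target, parent=None):
    """Return the node that directly contains the first node named `target`."""
    name, children = tree
    if name == target:
        return parent
    for child in children:
        found = parent_of(child, target, tree)
        if found is not None:
            return found
    return None


def child_names(node):
    """List the names of a node's direct children."""
    return [name for name, _ in node[1]]


# The tree the parser currently produces (copied from this issue).
ACTUAL = """
(fragment (element (start_tag (tag_name))
  (element (start_tag (tag_name)))
  (erroneous_end_tag (erroneous_end_tag_name))
  (end_tag (tag_name))))
"""

# The tree I would expect (copied from this issue).
EXPECTED = """
(fragment (element (start_tag (tag_name))
  (element (start_tag (tag_name))
    (erroneous_end_tag (erroneous_end_tag_name)))
  (end_tag (tag_name))))
"""

for label, sexp in (("actual", ACTUAL), ("expected", EXPECTED)):
    parent = parent_of(parse_sexp(sexp), "erroneous_end_tag")
    print(label, parent[0], child_names(parent))
```

In the actual tree the parent is the outer element, whose children are start_tag, element, erroneous_end_tag and end_tag. In the expected tree the parent is the inner element, whose children are start_tag and erroneous_end_tag.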

I tried to look at this myself, but I am not too familiar with tree-sitter parsers, so I couldn't find out how to solve this yet. Would appreciate any help if someone does not want to fix this themselves :)

amaanq commented 9 months ago

We can't tell whether the end tag is intended to close a tag that exists above it, or is a mistake/typo for the start tag immediately before it. This doesn't have a practical solution: it is difficult even when you're being context-aware (which tree-sitter isn't).
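To illustrate the ambiguity, here is a rough Python sketch that lists the readings a parser cannot rule out when it sees an end tag. It assumes the only context available is a stack of currently open tag names. The function plausible_readings is a hypothetical helper for this comment, not the logic of the tree-sitter-html scanner.

```python
def plausible_readings(open_tags, end_name):
    """List the interpretations of an end tag that the open-tag stack allows.

    `open_tags` is the stack of currently open tag names, innermost last.
    Hypothetical illustration only; not how tree-sitter-html is implemented.
    """
    # Unambiguous case: the end tag matches the innermost open element.
    if open_tags and open_tags[-1] == end_name:
        return [f"closes <{end_name}>"]

    readings = []
    # The end tag may close an element further up the stack.
    if end_name in open_tags:
        readings.append(f"closes ancestor <{end_name}>")
    # It may be a misspelling of the innermost element's end tag.
    if open_tags:
        readings.append(f"typo for </{open_tags[-1]}>")
    # Or it may be a stray end tag that belongs to no element at all.
    readings.append("stray end tag")
    return readings


# The example from this issue: <div><foo></bar></div>
print(plausible_readings(["div", "foo"], "bar"))
# Matching tags leave a single reading.
print(plausible_readings(["div", "foo"], "foo"))
# If an ancestor happens to share the name, there is one more reading.
print(plausible_readings(["bar", "foo"], "bar"))
```

For the example in this issue, the sketch returns both "typo for </foo>" and "stray end tag". The first reading would put the node inside the inner element and the second would put it in the parent, and nothing in the input says which one the author meant.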