tree-sitter / tree-sitter-css

CSS grammar for Tree-sitter
MIT License

bug: Some selectors in `:has` are treated as plain values #45

Open savetheclocktower opened 5 months ago

savetheclocktower commented 5 months ago

Did you check existing issues?

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

No response

Describe the bug

Only certain kinds of selectors fail to parse inside `:has`: `class_selector` and `id_selector` when they include a tag name (e.g. `li.foo` or `li#foo`).

Steps To Reproduce/Bad Parse Tree

This parses correctly:

    div.myclass:has(li) {}

    (stylesheet [0, 0] - [1, 0]
      (rule_set [0, 0] - [0, 22]
        (selectors [0, 0] - [0, 19]
          (pseudo_class_selector [0, 0] - [0, 19]
            (class_selector [0, 0] - [0, 11]
              (tag_name [0, 0] - [0, 3])
              (class_name [0, 4] - [0, 11]))
            (class_name [0, 12] - [0, 15])
            (arguments [0, 15] - [0, 19]
              (tag_name [0, 16] - [0, 18]))))
        (block [0, 20] - [0, 22])))

This does not:

    div.myclass:has(li.foo) {}

    (stylesheet [0, 0] - [0, 30]
      (rule_set [0, 0] - [0, 30]
        (selectors [0, 0] - [0, 27]
          (pseudo_class_selector [0, 0] - [0, 27]
            (class_selector [0, 0] - [0, 11]
              (tag_name [0, 0] - [0, 3])
              (class_name [0, 4] - [0, 11]))
            (class_name [0, 12] - [0, 15])
            (arguments [0, 15] - [0, 27]
              (plain_value [0, 16] - [0, 26]))))
        (block [0, 28] - [0, 30])))

Here are some other examples that parse exactly as expected:

    div.myclass:has(#foo) {}
    div.myclass:has(.bar) {}
    div.myclass:has(foo[bar]) {}
    div.myclass:has(li ~ p) {}
    div.myclass:has(li p) {}
    div.myclass:has(p li.foo) {} /* (weirdly enough) */

And here are some that are parsed as `plain_value` instead:

    div.myclass:has(li#foo) {}
    div.myclass:has(li.foo) {}
    div.myclass:has(li.foo p) {}
    div.myclass:has(p.bar li.foo) {}

Expected Behavior/Parse Tree

In each of these cases, the `plain_value` node should instead be a `selectors` node. `:has` can accept selectors of arbitrary complexity, much like `:not`.

Repro

No response

savetheclocktower commented 5 months ago

So I think I understand the problem: this is a lexical precedence issue. I can think of a few solutions.

But the simplest one, using `prec` to make the parser favor `_selector` over `plain_value`, is the one I just can't get working.

I could demote `plain_value` to a lower precedence, and this solves my problem…

    plain_value: _ => token(prec(-1, seq(
      repeat(choice(
        /[-_]/,
        /\/[^\*\s,;!{}()\[\]]/, // Slash not followed by a '*' (which would be a comment)
      )),
      /[a-zA-Z]/,
      repeat(choice(
        /[^/\s,;!{}()\[\]]/, // Not a slash, not a delimiter character
        /\/[^\*\s,;!{}()\[\]]/, // Slash not followed by a '*' (which would be a comment)
      )),
    ))),

…but it breaks three other tests. I'd much rather boost the precedence of `_selector`, but I can't seem to get that to have any effect.
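To make the precedence behavior concrete, here is a small JavaScript sketch of the token-conflict rules described in the Tree-sitter documentation ("Conflicting Tokens"): among tokens that are valid at a given position, lexical precedence is compared before match length. `pickToken` and the `TAG_NAME` regex are hypothetical simplifications, not part of tree-sitter-css; `PLAIN_VALUE` flattens the `plain_value` rule above into a single regex. Context-aware lexing and the "match specificity" tiebreaker are not modeled. It may also hint at why raising `prec` on `_selector` has no visible effect: that precedence applies to a parse rule, whereas the lexer only consults precedence given inside `token()`.

```javascript
// Simplified model (an illustration, not Tree-sitter's real lexer) of how
// conflicts between tokens valid at the same position are resolved:
// higher lexical precedence first, then the longer match, then rule order.
// Filtering to "valid tokens" is assumed to have happened already.
function pickToken(input, pos, candidates) {
  const matches = [];
  candidates.forEach((candidate, order) => {
    candidate.regex.lastIndex = pos; // sticky (`y`) regexes only match at `pos`
    const m = candidate.regex.exec(input);
    if (m && m[0].length > 0) {
      matches.push({ name: candidate.name, text: m[0], prec: candidate.prec, order });
    }
  });
  matches.sort(
    (a, b) =>
      b.prec - a.prec || // 1. higher lexical precedence wins
      b.text.length - a.text.length || // 2. then the longer match
      a.order - b.order // 3. then the earlier rule
  );
  return matches.length > 0 ? { name: matches[0].name, text: matches[0].text } : null;
}

// Hypothetical stand-in for the identifier that becomes `tag_name`
// (the real grammar's identifier pattern may differ).
const TAG_NAME = /[a-zA-Z_-][a-zA-Z0-9_-]*/y;

// The `plain_value` pattern from the grammar above, flattened into one regex.
const PLAIN_VALUE =
  /(?:[-_]|\/[^\*\s,;!{}()\[\]])*[a-zA-Z](?:[^/\s,;!{}()\[\]]|\/[^\*\s,;!{}()\[\]])*/y;

const css = "div.myclass:has(li.foo) {}";
const pos = css.indexOf("(") + 1; // first character inside `:has(...)`, i.e. 16

// Both tokens at precedence 0: the longer `plain_value` match wins.
console.log(pickToken(css, pos, [
  { name: "tag_name", regex: TAG_NAME, prec: 0 },
  { name: "plain_value", regex: PLAIN_VALUE, prec: 0 },
]));
// → { name: 'plain_value', text: 'li.foo' }

// `plain_value` demoted to -1, as with token(prec(-1, ...)): `tag_name` wins
// even though its match is shorter.
console.log(pickToken(css, pos, [
  { name: "tag_name", regex: TAG_NAME, prec: 0 },
  { name: "plain_value", regex: PLAIN_VALUE, prec: -1 },
]));
// → { name: 'tag_name', text: 'li' }
```

Because precedence is checked before length in this model, demoting `plain_value` makes it lose every conflict with `tag_name`, not just the one inside `:has`.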

I think I'm pretty close on this one and just need a nudge to find the right answer.