lexbor / lexbor

Lexbor is development of an open source HTML Renderer library. https://lexbor.com
Apache License 2.0
1.61k stars 106 forks source link

CSS reverse match broken #182

Closed phoerious closed 1 year ago

phoerious commented 1 year ago

Reverse CSS matching doesn't work beyond the start element itself. There is one unit test that uses lxb_selectors_find(), but none for lxb_selectors_find_reverse(), so perhaps this is a regression.

Here are a few reproduction examples as diffs of examples/lexbor/selectors/easy_way.c:

Example 1:

diff --git a/examples/lexbor/selectors/easy_way.c b/examples/lexbor/selectors/easy_way.c
index 01e6041..94164d4 100644
--- a/examples/lexbor/selectors/easy_way.c
+++ b/examples/lexbor/selectors/easy_way.c
@@ -49,7 +49,7 @@ main(int argc, const char *argv[])

     /* CSS Data. */

-    static const lxb_char_t slctrs[] = ".x, div:has(p[id=Y i])";
+    static const lxb_char_t slctrs[] = "body";

     /* Create HTML Document. */

@@ -95,7 +95,7 @@ main(int argc, const char *argv[])

     printf("Found:\n");

-    status = lxb_selectors_find(selectors, body, list, find_callback, &count);
+    status = lxb_selectors_find_reverse(selectors, body, list, find_callback, &count);
     if (status != LXB_STATUS_OK) {
         return EXIT_FAILURE;
     }

Output:

Selectors: body
Found:
1) <body>

WORKS! (self-match)

Example 2:

diff --git a/examples/lexbor/selectors/easy_way.c b/examples/lexbor/selectors/easy_way.c
index 01e6041..263298b 100644
--- a/examples/lexbor/selectors/easy_way.c
+++ b/examples/lexbor/selectors/easy_way.c
@@ -49,7 +49,7 @@ main(int argc, const char *argv[])

     /* CSS Data. */

-    static const lxb_char_t slctrs[] = ".x, div:has(p[id=Y i])";
+    static const lxb_char_t slctrs[] = "html";

     /* Create HTML Document. */

@@ -95,7 +95,7 @@ main(int argc, const char *argv[])

     printf("Found:\n");

-    status = lxb_selectors_find(selectors, body, list, find_callback, &count);
+    status = lxb_selectors_find_reverse(selectors, body, list, find_callback, &count);
     if (status != LXB_STATUS_OK) {
         return EXIT_FAILURE;
     }

Expected output:

Selectors: html
Found:
1) <html>

Actual output:

Selectors: html
Found:

Example 3:

diff --git a/examples/lexbor/selectors/easy_way.c b/examples/lexbor/selectors/easy_way.c
index 01e6041..f4096d2 100644
--- a/examples/lexbor/selectors/easy_way.c
+++ b/examples/lexbor/selectors/easy_way.c
@@ -49,7 +49,7 @@ main(int argc, const char *argv[])

     /* CSS Data. */

-    static const lxb_char_t slctrs[] = ".x, div:has(p[id=Y i])";
+    static const lxb_char_t slctrs[] = "div";

     /* Create HTML Document. */

@@ -95,7 +95,7 @@ main(int argc, const char *argv[])

     printf("Found:\n");

-    status = lxb_selectors_find(selectors, body, list, find_callback, &count);
+    status = lxb_selectors_find_reverse(selectors, body->first_child->first_child, list, find_callback, &count);
     if (status != LXB_STATUS_OK) {
         return EXIT_FAILURE;
     }

Expected output:

Selectors: div
Found:
1) <div>

Actual output:

Selectors: div
Found:
lexborisov commented 1 year ago

@phoerious

I have a feeling that you don't understand how the function lxb_selectors_find_reverse() works. It (the function) checks if the specified selectors match or not by node (html element).

HTML

<body><div><span>blah</span></div></body>

We take a node/element <span> and check which selectors will fit it.

Selectors:

div span -- match
body span -- match
span div -- not match
phoerious commented 1 year ago

It's quite possible that misinterpreted how the function is supposed to work, since there is neither documentation nor an example and I didn't have to time to fully reverse-engineer it.

So from what you are saying, this seems to be more like matches(), but from the name I would expect it to behave like closest().

phoerious commented 1 year ago

All right, I have now fixed my closest() implementation and migrated my matches() implementation to also use Lexbor's reverse match:

https://github.com/chatnoir-eu/chatnoir-resiliparse/blob/87ebf8197e5705b80d243cb4eff585381b234cd3/resiliparse/lib/parse/html/dom/node.rs#L692

This works well, thanks for the help. Some API documentation would still be appreciated, though. ;-)