serpapi / nokolexbor

High-performance HTML5 parser for Ruby based on Lexbor, with support for both CSS selectors and XPath.

Fix the css selector #15

Open Krugloff opened 3 weeks ago

Krugloff commented 3 weeks ago

The ~ selector is not working as expected.

I'm trying to select only the .newscard blocks that appear before the .more-news element. The selector below works in the browser, but in Nokolexbor it matches all four blocks instead of the first two.

Additional context

test_string = <<-STR
<div>
<div class="newscard position1"></div>
<div class="newscard position2"></div>
<div class="more-news"></div>
<div class="newscard position3"></div>
<div class="newscard position4"></div>
<div>
STR

require 'nokolexbor'

doc = Nokolexbor::HTML(test_string)
doc.css(".newscard:not(.more-news ~ .newscard)").count # => 4 (should be 2)

lexborisov commented 3 weeks ago

@Krugloff

I think we should just update the lexbor sources in nokolexbor.

In lexbor:

<div><div class="newscard position1"></div><div class="newscard position2"></div><div class="more-news"></div><div class="newscard position3"></div><div class="newscard position4"></div><div></div></div>

Selectors: .newscard:not(.more-news ~ .newscard)

1) <div class="newscard position1">
2) <div class="newscard position2">
Count: 2
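
For readers wondering why 2 is the expected count, here is a minimal pure-Ruby sketch of what `.newscard:not(.more-news ~ .newscard)` means for a single list of siblings. It does not call Nokolexbor or lexbor. The names `SIBLINGS`, `preceded_by?` and `before_marker` are hypothetical, and each element is reduced to its class list, which is an assumption made only for illustration.

```ruby
# Sibling elements from the issue's test string, modeled only by their classes.
# The last entry is the trailing <div> that has no class attribute.
SIBLINGS = [
  %w[newscard position1],
  %w[newscard position2],
  %w[more-news],
  %w[newscard position3],
  %w[newscard position4],
  []
].freeze

# True when some earlier sibling carries the class `marker`, i.e. the element
# at `index` would be matched by `.marker ~ *`.
def preceded_by?(siblings, index, marker)
  siblings.first(index).any? { |classes| classes.include?(marker) }
end

# Class lists of the elements matched by `.target:not(.marker ~ .target)`.
def before_marker(siblings, target, marker)
  siblings.each_with_index.filter_map do |classes, index|
    classes if classes.include?(target) && !preceded_by?(siblings, index, marker)
  end
end

MATCHES = before_marker(SIBLINGS, "newscard", "more-news")
puts MATCHES.map { |classes| classes.join(" ") }
puts "Count: #{MATCHES.size}"
```

Only the two `.newscard` elements with no earlier `.more-news` sibling survive the `:not()` filter, which agrees with the lexbor output above.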