philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License

Plus (+) in find does not work as expected #79

Closed Eiji7 closed 7 years ago

Eiji7 commented 7 years ago

For this code:

<span>1</span>&nbsp;<a>2</a>

I run:

Floki.find(html, "span + a")

Expected results:

[{"a", [], ["2"]}]

Actual results:

[]

Note: With this html:

<span>1</span> <a>2</a>

and this:

<span>1</span><a>2</a>

the find function works as expected.

So the problem is that Floki treats the &nbsp; text node as a regular node. As a result, the next sibling of the span element (as Floki sees it) is the text node instead of the a element.
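To illustrate the behaviour I expected, here is a minimal sketch of adjacent-sibling lookup that skips text nodes. `SiblingSketch` and `next_element_sibling/2` are hypothetical names, not part of Floki, and the node list is my assumption of what the parsed fragment looks like (with &nbsp; decoded to U+00A0), based on the tuple shape shown in the expected results above.

```elixir
defmodule SiblingSketch do
  # Hypothetical helper, not part of Floki: returns the first element node
  # that follows the node at `index` among `siblings`, skipping text nodes.
  # Returns nil when no element follows.
  def next_element_sibling(siblings, index) do
    siblings
    |> Enum.drop(index + 1)
    |> Enum.find(&element?/1)
  end

  # Element nodes are {tag, attributes, children} tuples; text nodes are binaries.
  defp element?({_tag, _attrs, _children}), do: true
  defp element?(_other), do: false
end

# Assumed parse result for the fragment from this report.
nodes = [{"span", [], ["1"]}, "\u00A0", {"a", [], ["2"]}]

SiblingSketch.next_element_sibling(nodes, 0)
# => {"a", [], ["2"]}
```

With this approach, "span + a" would match the a element regardless of any text between the two tags.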

Note2: As a workaround I tried:

Floki.find(html, "span + * + a") # any element between span and link

but I got:

** (MatchError) no match of right hand side value: " "
                lib/floki/finder.ex:137: Floki.Finder.traverse_sibling/3
                lib/floki/finder.ex:69: Floki.Finder.traverse/4
                lib/floki/finder.ex:73: Floki.Finder.traverse/4
                lib/floki/finder.ex:47: Floki.Finder.find_selectors/2

philss commented 7 years ago

@Eiji7 Good catch! I think this problem is not limited to the adjacent sibling combinator ("+"). The general sibling combinator ("~") and the "nth-child" pseudo-class selector should have the same problem.
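As a rough sketch of why "nth-child" is likely affected as well: positions should be counted among element siblings only, so text nodes must be filtered out before indexing. `ChildIndexSketch` and `element_positions/1` are hypothetical names for illustration, not Floki's API, and the children list is an assumed parse result for the fragment in the report.

```elixir
defmodule ChildIndexSketch do
  # Hypothetical helper, not Floki's API: pairs each element child with its
  # 1-based position among element siblings, ignoring text nodes (binaries).
  def element_positions(children) do
    children
    |> Enum.filter(&is_tuple/1)
    |> Enum.with_index(1)
  end
end

# Assumed parse result, with &nbsp; decoded to U+00A0.
children = [{"span", [], ["1"]}, "\u00A0", {"a", [], ["2"]}]

ChildIndexSketch.element_positions(children)
# => [{{"span", [], ["1"]}, 1}, {{"a", [], ["2"]}, 2}]
```

Counted this way, the a element is the second child. If the text node were counted too, it would be treated as the third.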

I'll look into fixing this soon. Thank you for the report!