philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License
2.07k stars, 156 forks

Unexpected order using `find` with comma separated selectors in v0.35.3 #539

danschultzer closed this issue 8 months ago

danschultzer commented 8 months ago

After upgrading from 0.35.2 to 0.35.3, `find` no longer returns nodes in document order for comma-separated selectors.

This is what 0.35.2 produces:

iex(1)> Floki.find("<div class=\"first\"></div><p class=\"second\"></p><div class=\"third\"></div>", "p,div")

[
  {"div", [{"class", "first"}], []},
  {"p", [{"class", "second"}], []},
  {"div", [{"class", "third"}], []}
]

This is what 0.35.3 produces:

iex(1)> Floki.find("<div class=\"first\"></div><p class=\"second\"></p><div class=\"third\"></div>", "p,div")

[
  {"p", [{"class", "second"}], []},
  {"div", [{"class", "first"}], []},
  {"div", [{"class", "third"}], []}
]
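
The order expected here is plain document order: a pre-order, depth-first walk of the tree that emits a node as soon as any of the comma-separated selectors matches it. Below is a minimal sketch of that property, assuming only tag-name selectors and the `{tag, attrs, children}` tuple shape shown above. `DocOrder` and `find_tags/2` are hypothetical names used for illustration; they are not part of Floki's API and this is not how Floki implements `find`.

```elixir
defmodule DocOrder do
  # Hypothetical helper, not Floki's implementation: returns every element
  # whose tag is in `tags`, in document (pre-order, depth-first) order.
  def find_tags(nodes, tags) when is_list(nodes) do
    Enum.flat_map(nodes, &walk(&1, tags))
  end

  # Element node: emit it first if it matches, then its matching descendants.
  defp walk({tag, _attrs, children} = node, tags) when is_binary(tag) do
    rest = find_tags(children, tags)
    if tag in tags, do: [node | rest], else: rest
  end

  # Text nodes, comments, and any other node shape never match.
  defp walk(_other, _tags), do: []
end

tree = [
  {"div", [{"class", "first"}], []},
  {"p", [{"class", "second"}], []},
  {"div", [{"class", "third"}], []}
]

# The order of the selectors ("p" before "div") does not affect result order:
# the result equals `tree`, i.e. first, second, third.
DocOrder.find_tags(tree, ["p", "div"])
```

Because the walk visits each node once and checks it against the whole selector list, the result order depends only on the document, which is the behaviour 0.35.2 showed.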

philss commented 8 months ago

Good call! I think the problem was introduced in #518 (d0be5510707212f9b501303ceee54851c5870f96 according to bisect).

@ypconstante would you mind taking a look? I wrote a small test case:

defmodule Floki.FindWithBug do
  use ExUnit.Case, async: true

  @tag only_parser: Floki.HTMLParser.Mochiweb
  test "correct order" do
    assert Floki.find(
             "<div class=\"first\"></div><p class=\"second\"></p><div class=\"third\"></div>",
             "p,div"
           ) ==
             [
               {"div", [{"class", "first"}], []},
               {"p", [{"class", "second"}], []},
               {"div", [{"class", "third"}], []}
             ]
  end
end

ypconstante commented 8 months ago

Yeah, the stack change is causing this. I did some checks, and there are more cases where we don't return the nodes in the right order, unrelated to this change:

iex> Floki.find(
  Floki.parse_fragment!("""
  <div id="1">
    <div id="2">
      <p id="3"></p>
    </div>
    <p id="4"></p>
  </div>
  """),
  "div p"
)

[{"p", [{"id", "4"}], []}, {"p", [{"id", "3"}], []}]
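
For comparison, document order for `div p` would put `id="3"` before `id="4"`. Below is a minimal sketch of a descendant-combinator match that preserves that order, assuming tag-name selectors only and a hand-built tree with whitespace text nodes left out. `DescendantOrder` and `find/3` are hypothetical names for illustration, not Floki's code.

```elixir
defmodule DescendantOrder do
  # Hypothetical sketch, not Floki's implementation: finds `descendant`
  # elements that have an `ancestor` element somewhere above them,
  # returned in document (pre-order, depth-first) order.
  def find(nodes, ancestor, descendant) do
    walk_all(nodes, ancestor, descendant, false)
  end

  defp walk_all(nodes, ancestor, descendant, inside?) do
    Enum.flat_map(nodes, &walk(&1, ancestor, descendant, inside?))
  end

  # `inside?` is true once any enclosing element matched `ancestor`.
  # Using a flag instead of one traversal per matching ancestor means each
  # node is emitted at most once, even under nested ancestors.
  defp walk({tag, _attrs, children} = node, ancestor, descendant, inside?)
       when is_binary(tag) do
    rest = walk_all(children, ancestor, descendant, inside? or tag == ancestor)
    if inside? and tag == descendant, do: [node | rest], else: rest
  end

  # Text nodes, comments, and any other node shape never match.
  defp walk(_other, _ancestor, _descendant, _inside?), do: []
end

tree = [
  {"div", [{"id", "1"}],
   [
     {"div", [{"id", "2"}], [{"p", [{"id", "3"}], []}]},
     {"p", [{"id", "4"}], []}
   ]}
]

DescendantOrder.find(tree, "div", "p")
# => [{"p", [{"id", "3"}], []}, {"p", [{"id", "4"}], []}]
```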