philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License

Unexpected order of elements using &Floki.find/2 #98

Closed Eiji7 closed 7 years ago

Eiji7 commented 7 years ago

Example code:

html = "<table summary=\"License Detail\" cellspacing=\"0\" cellpadding=\"0\" border=\"0\" width=\"100%\">\n\t\t\t<colgroup>\n\t\t\t\t<col width=\"30%\">\n\t\t\t\t<col width=\"70%\">\n\t\t\t</colgroup>\n\t\t\t\n\t\t\t<thead>\n\t\t\t\t<tr class=\"listingHeader\">\n\t\t\t\t\t<th id=\"hdr1\" align=\"left\">License Number: 375</th>\n\t\t\t\t\t<th id=\"hdr2\" colspan=\"3\" align=\"right\"><i>Current Date:  03/01/2017 10:06 AM</i></th>\n\t\t\t\t</tr>\n\t\t\t</thead>\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<tbody><tr>\n\t\t\t\t\t<th headers=\"hdr1\" scope=\"row\">\n\t\t\t\t\t\t<input name=\"selindex\" value=\"0\" type=\"hidden\">\n\t\t\t\t\t\t<span class=\"labelLeft\">Name:</span>\n\t\t\t\t\t</th>\n\t\t\t\t\t<td class=\"dataView\">\n\t\t\t\t\t\tAchuff, Jeanne Ann\n\t\t\t\t\t</td>\n\t\t\t\t</tr>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t<tr>\n\t\t\t\t\t<th headers=\"hdr1\" scope=\"row\">\n\t\t\t\t\t\t<span class=\"labelLeft\">License Type:</span>\n\t\t\t\t\t</th>\n\t\t\t\t\t<td class=\"dataView\">\n\t\t\t\t\t\tNaturopathic Doctor\n\t\t\t\t\t\t\n\t\t\t\t\t</td>\n\t\t\t\t</tr>\n\t\t\t\t\n\t\t\t\t<tr>\n\t\t\t\t\t<th headers=\"hdr1\" scope=\"row\">\n\t\t\t\t\t\t<span class=\"labelLeft\">License Status:</span>\n\t\t\t\t\t</th>\n\t\t\t\t\t<td class=\"dataView\">\n\t\t\t\t\t\tLicense Renewed &amp; Current\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t<a class=\"toolTipLink\" href=\"#\" alt=\"Licensee meets requirements for the practice of medicine in California.\n\" onfocus=\"toolTipLink(this);\" onblur=\"toolTipLinkBlur(this);\"><span class=\"toolTip_big\">&nbsp;</span></a>\n\t\t\t\t\t\t\n\t\t\t\t\t</td>\n\t\t\t\t</tr>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\n\n\n\n\n\n\t\t\n\t\t\t<tr>\n\t\t\t\t<th headers=\"hdr1\" scope=\"row\">\n\t\t\t\t\t<span class=\"labelLeft\">Secondary Status: </span>\n\t\t\t\t</th>\n\t\t\t\t<td class=\"dataCell\">\n\t\t\t\t\t<span class=\"dataView\"> NDF Qualified 
</span>\n\t\t\t\t\t\n\t\t\t\t</td>\n\t\t\t</tr>\n\t\t\t\n\t\t\n\n\n\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t<tr>\n\t\t\t\t\t\t<th headers=\"hdr1\" scope=\"row\">\n\t\t\t\t\t\t\t<span class=\"labelLeft\">Expiration Date:</span>\n\t\t\t\t\t\t</th>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t<td class=\"dataView\">\n\t\t\t\t\t\t\t\t06/30/2017\n\t\t\t\t\t\t\t</td>\n\t\t\t\t\t\t\n\t\t\t\t\t</tr>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<tr>\n\t\t\t\t\t<th headers=\"hdr1\" scope=\"row\">\n\t\t\t\t\t\t<span class=\"labelLeft\">Original Issuance Date:</span>\n\t\t\t\t\t</th>\n\t\t\t\t\t\n\t\t\t\t\t\t<td class=\"dataView\">\n\t\t\t\t\t\t\t09/28/2009\n\t\t\t\t\t\t</td>\n\t\t\t\t\t\n\t\t\t\t</tr>\n\t\t\t\t\n\t\t\t\t\n\t\t\t</tbody></table>"
Floki.find(html, "table[summary='License Detail'] td.dataView")

Expected behaviour:

&Floki.find/2 should return the matched elements in document order:

[{"td", [{"class", "dataView"}], ["\n\t\t\t\t\t\tAchuff, Jeanne Ann\n\t\t\t\t\t"]},
 {"td", [{"class", "dataView"}], ["\n\t\t\t\t\t\tNaturopathic Doctor\n\t\t\t\t\t\t\n\t\t\t\t\t"]},
 {"td", [{"class", "dataView"}],
   ["\n\t\t\t\t\t\tLicense Renewed & Current\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t",
    {"a",
     [{"class", "toolTipLink"}, {"href", "#"},
      {"alt", "Licensee meets requirements for the practice of medicine in California.\n"},
      {"onfocus", "toolTipLink(this);"}, {"onblur", "toolTipLinkBlur(this);"}],
     [{"span", [{"class", "toolTip_big"}], [" "]}]}]},
 {"td", [{"class", "dataView"}], ["\n\t\t\t\t\t\t\t\t06/30/2017\n\t\t\t\t\t\t\t"]},
 {"td", [{"class", "dataView"}], ["\n\t\t\t\t\t\t\t09/28/2009\n\t\t\t\t\t\t"]}]

Current behaviour:

The list of elements is returned in reverse order:

[{"td", [{"class", "dataView"}], ["\n\t\t\t\t\t\t\t09/28/2009\n\t\t\t\t\t\t"]},
 {"td", [{"class", "dataView"}], ["\n\t\t\t\t\t\t\t\t06/30/2017\n\t\t\t\t\t\t\t"]},
 {"td", [{"class", "dataView"}],
  ["\n\t\t\t\t\t\tLicense Renewed & Current\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t",
   {"a",
    [{"class", "toolTipLink"}, {"href", "#"},
     {"alt", "Licensee meets requirements for the practice of medicine in California.\n"},
     {"onfocus", "toolTipLink(this);"}, {"onblur", "toolTipLinkBlur(this);"}],
    [{"span", [{"class", "toolTip_big"}], [" "]}]}]},
 {"td", [{"class", "dataView"}], ["\n\t\t\t\t\t\tNaturopathic Doctor\n\t\t\t\t\t\t\n\t\t\t\t\t"]},
 {"td", [{"class", "dataView"}], ["\n\t\t\t\t\t\tAchuff, Jeanne Ann\n\t\t\t\t\t"]}]

philss commented 7 years ago

@Eiji7 Thank you for the report!

I think this is related to combinators. It seems that the order is not reversed when finding inside combinators. I'm investigating this.

philss commented 7 years ago

Closed by #99.

philss commented 7 years ago

@Eiji7 The fix was released in version 0.15.0. Thank you again! :)
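
For anyone checking the ordering after upgrading, a reduced version of the example above might look like the following. This is only a sketch: the shortened HTML and the `texts` variable are illustrative and not part of the original report, and it assumes a Floki version in which `Floki.find/2` accepts a raw HTML string, as it does in the report.

```elixir
# Reduced, illustrative input: two cells matched through a descendant combinator.
html = """
<table summary="License Detail">
  <tr><td class="dataView">first</td></tr>
  <tr><td class="dataView">second</td></tr>
</table>
"""

# Same selector as in the report; Floki.text/1 extracts the text of each match.
texts =
  html
  |> Floki.find("table[summary='License Detail'] td.dataView")
  |> Enum.map(&Floki.text/1)
  |> Enum.map(&String.trim/1)

# With the fix from 0.15.0, "first" should come before "second".
IO.inspect(texts)
```

If upgrading is not an option, piping the result through `Enum.reverse/1` may restore document order for selectors like this one, but that depends on the affected version reversing the whole list, as it did in the output shown in the report.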