mischov / meeseeks

An Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.
MIT License

Tuple tree parser accepting invalid input #74

Closed by mischov 4 years ago

mischov commented 5 years ago

@pclewis recently reported the following problem, which shows that the tuple tree parser wrongly accepts atoms as attribute names. Parsing succeeds, and the failure only surfaces later when the document is rendered to HTML:

{"a", [b: "c"], []} |> Meeseeks.parse(:tuple_tree) |> Meeseeks.html()
** (ArgumentError) argument error
    :erlang.iolist_to_binary([[["<", "a", [[" ", :b, "=\"", ["c"], "\""]], ">"], [], ["</", "a", ">"]]])
    (meeseeks) lib/meeseeks/document.ex:80: Meeseeks.Document.html/1

My guess is that this isn't the only case where the tuple tree parser is too permissive about the input it accepts.
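To make "locked down" concrete, here is a rough sketch of a stricter shape check for tuple tree nodes. It is not Meeseeks code: the `StrictTupleTree` module and its `valid_node?/1` function are hypothetical names made up for illustration, and it assumes only three node shapes (text as a binary, `{:comment, binary}`, and `{tag, attributes, children}` elements), so a real validator would likely need to cover more.

```elixir
defmodule StrictTupleTree do
  # Hypothetical helper, not part of Meeseeks: checks the shape of a
  # tuple tree node before it is handed to a parser.

  # Text nodes are plain binaries.
  def valid_node?(text) when is_binary(text), do: true

  # Comments are tagged tuples holding a binary.
  def valid_node?({:comment, comment}) when is_binary(comment), do: true

  # Elements need a binary tag, a list of {binary, binary} attributes,
  # and a list of children that are themselves valid nodes.
  def valid_node?({tag, attributes, children})
      when is_binary(tag) and is_list(attributes) and is_list(children) do
    Enum.all?(attributes, &valid_attribute?/1) and
      Enum.all?(children, &valid_node?/1)
  end

  # Anything else is rejected.
  def valid_node?(_other), do: false

  # Both the attribute name and its value must be binaries, so a keyword
  # list entry such as {:b, "c"} fails here.
  defp valid_attribute?({name, value})
       when is_binary(name) and is_binary(value),
       do: true

  defp valid_attribute?(_other), do: false
end

StrictTupleTree.valid_node?({"a", [{"b", "c"}], []})
# => true

StrictTupleTree.valid_node?({"a", [b: "c"], []})
# => false
```

With a check along these lines, the input from the report would be rejected at parse time instead of raising an `ArgumentError` later in `Meeseeks.html/1`.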

mischov commented 4 years ago

Fixed in v0.15.0.