When trying to add a child to a node with Floki.traverse_and_update/2, previous children of the HTML are being added recursively, and then closed all at once in the end #357
Consider the basic HTML below, where I want to find each <h1> header, add anchors and generate a table of contents. For each <h1>, I add an id attribute and prepend a child <a href/> anchor:
I'm using Floki.traverse_and_update/2 and matching on {"h1", attrs, children}. I extract the inner <h1> text from children, generate the id from it, add the id to attrs, and prepend an <a/> to children.
Result
The problem is that each h1 ends up nested inside the previous one, and so forth, so that in the end they are all nested and closed at once. For the example above, the HTML was closed with 3x </h1> at the end:
Notice how the ids and anchors were added, but the headers are not closed in their correct positions. It also turned the text line "Content content", which is not a header, into an anchor.
To Reproduce
Steps to reproduce the behavior:
Using Floki ~> 0.31.0
Using Elixir 1.12.2
Using Erlang OTP 24
With this code:
Floki.parse_fragment!(html)
|> Floki.traverse_and_update(fn
  {"h1", attrs, children} = el ->
    case find_node_text(children) do
      nil ->
        el

      text ->
        id = Slug.slugify(text)
        attrs = [{"id", id} | attrs]
        anchor = {"a", [{"href", "#" <> id}, {"class", "anchor-link"}], []}
        {"h1", attrs, [anchor | children]}
    end

  el ->
    el
end)
|> Floki.raw_html()

# Find the header text
defp find_node_text([child | children]) when is_binary(child) and child != "",
  do: if(String.match?(child, ~r/[<>]+/), do: find_node_text(children), else: child)

defp find_node_text([_ | children]), do: find_node_text(children)
defp find_node_text(_), do: nil
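
One way to narrow this down is to run the same transformation on a hand-built tree, without parsing or rendering, and check whether the callback itself introduces any nesting. The following is only a sketch under assumptions: the module name AnchorSketch, its slugify/1 helper (a dependency-free stand-in for Slug.slugify/1) and the sample tree are hypothetical, and the text lookup is simplified compared to find_node_text/1 (it does not skip strings containing < or >).

```elixir
defmodule AnchorSketch do
  # Hypothetical stand-in for Slug.slugify/1, so the sketch needs no dependencies.
  def slugify(text) do
    text
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9]+/, "-")
    |> String.trim("-")
  end

  # Same shape as the callback passed to Floki.traverse_and_update/2:
  # takes one {tag, attrs, children} node and returns the updated node.
  def add_anchor({"h1", attrs, children} = el) do
    # Simplified lookup: first non-blank string among the direct children.
    case Enum.find(children, &(is_binary(&1) and String.trim(&1) != "")) do
      nil ->
        el

      text ->
        id = slugify(text)
        anchor = {"a", [{"href", "#" <> id}, {"class", "anchor-link"}], []}
        {"h1", [{"id", id} | attrs], [anchor | children]}
    end
  end

  def add_anchor(el), do: el
end

# Hand-built tree in the tuple format Floki uses (assumed sample data).
tree = [
  {"h1", [], ["First Header"]},
  {"p", [], ["Content content"]},
  {"h1", [], ["Second Header"]}
]

result = Enum.map(tree, &AnchorSketch.add_anchor/1)
IO.inspect(result)
```

If the siblings stay siblings here, the callback is not what nests the headers, and the next things to inspect would be the tree returned by Floki.parse_fragment!/1 and the string produced by Floki.raw_html/1.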
Expected behavior