philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License

Manipulate text content from HTML tags #207

Closed by victormuramoto 5 years ago

victormuramoto commented 5 years ago

I am new to Elixir and Floki. I saw that it is possible to manipulate tags and attributes using Floki.map/2, but I am trying to manipulate text content and have not had any success. Is there a way to do that with Floki?

E.g., the transformation below is supposed to change the Google text content to Victor, but nothing happens.

Note: I am not sure whether the anonymous function is correct for this case.

html = """
  <html>
  <head>
  <title>Test</title>
  </head>
  <body>
    <div class="content">
      <a href="http://google.com" class="js-google js-cool">Google</a>
      <a href="http://elixir-lang.org" class="js-elixir js-cool">Elixir lang</a>
      <a href="http://java.com" class="js-java">Java</a>
    </div>
  </body>
  </html>
  """

transformation = fn
  {"a", [{"href", "http://google.com"}, {"class", "js-google js-cool"}], ["Google"]} ->
    {"a", [{"href", "http://google.com"}, {"class", "js-google js-cool"}], ["Victor"]}

  x ->
    x
end

parsed_html = Floki.parse(html)

Floki.map(parsed_html, transformation)

{"html", [],
 [
   {"head", [], [{"title", [], ["Test"]}]},
   {"body", [],
    [
      {"div", [{"class", "content"}],
       [
         {"a", [{"href", "http://google.com"}, {"class", "js-google js-cool"}],
          ["Google"]},
         {"a",
          [{"href", "http://elixir-lang.org"}, {"class", "js-elixir js-cool"}],
          ["Elixir lang"]},
         {"a", [{"href", "http://java.com"}, {"class", "js-java"}], ["Java"]}
       ]}
    ]}
 ]}

philss commented 5 years ago

@victormuramoto I think the problem is the signature of the transformation function. The function passed to Floki.map/2 receives only the tag name and attributes, not the children, so a clause that matches a three-element tuple never applies. I don't think this is possible right now; we would have to change the way the transformation works so that it also receives the children nodes.

I haven't had time to go deeper into this topic, but something similar to what Finder.map/2 does should work for you.

philss commented 5 years ago

@victormuramoto we added a new function called Floki.traverse_and_update/2 that works the way you expected. Please check version 0.22.0 of Floki.
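
For readers who want to see the idea without installing anything, here is a minimal sketch in plain Elixir of a traversal that hands each tag node to the function together with its children. The module name TreeSketch and its update/2 function are hypothetical and are not part of Floki; the tree is a trimmed copy of the parsed output shown in the question, and visiting children before their parent is an assumption of this sketch.

```elixir
defmodule TreeSketch do
  # Hypothetical helper, NOT part of Floki: walks a tree of
  # {tag, attrs, children} tuples and applies fun to every tag node.
  # Children are updated first, so fun sees the already-updated children.
  def update(nodes, fun) when is_list(nodes) do
    Enum.map(nodes, &update(&1, fun))
  end

  def update({tag, attrs, children}, fun) do
    fun.({tag, attrs, update(children, fun)})
  end

  # Anything else (for example a text node, which is a binary) is kept as is.
  def update(other, _fun), do: other
end

# A trimmed version of the parsed tree from the question.
tree =
  {"div", [{"class", "content"}],
   [
     {"a", [{"href", "http://google.com"}, {"class", "js-google js-cool"}], ["Google"]},
     {"a", [{"href", "http://java.com"}, {"class", "js-java"}], ["Java"]}
   ]}

# Because the function receives the children, it can match on the text.
transformation = fn
  {"a", attrs, ["Google"]} -> {"a", attrs, ["Victor"]}
  other -> other
end

updated = TreeSketch.update(tree, transformation)
IO.inspect(updated)
```

With Floki 0.22.0 or later, a function of this shape should be usable as Floki.traverse_and_update(parsed_html, transformation) instead of the hand-written helper; check the Floki documentation for how it treats non-tag nodes such as comments.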