philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License
2.07k stars 156 forks source link

Revisit Floki.map API #286

Closed josevalim closed 4 years ago

josevalim commented 4 years ago

The newtraverse_and_update has been quite helpful, thank you, we are making extensive use of it in LiveView to optimize some traversals!

Given its introduction, it may be worth even visiting other parts of the API because I believe Floki.map can be completely replaced by Floki.traverse_and_update:

  def map(html_tree_list, fun) do
    traverse_and_update(html_tree_list, fn
      {tag, attrs, children} ->
        {tag, attrs} = fun.({tag, attrs})
        {tag, attrs, children}

      other ->
        other
    end)
  end

The difference is that traverse_and_update is quite more efficient as it doesn't need to build the HTML tree. Given map has a shorter name, people may be more inclined towards using it, while using traverse_and_update would be wiser.

Another possible approach to take here is to deprecate map in favor of a find_and_update(tree, selector, fun) function, which works like map but updates only the nodes given by the selector, keeping the others intact. This would leverage the power of the HTML Tree and introduce a feature set not covered today. In fact, if there is a find_and_update, one could easily implement something like Floki.attr/4 on top of find_and_update.

josevalim commented 4 years ago

Oh, another cool idea. If find_and_update allows fun to return nil for removals (like traverse_and_update), then filter_out can be implemented as: find_and_update(node, select, fn _ -> nil end). So it can really be a function used as building block for a bunch of other functionality. :)

philss commented 4 years ago

@josevalim thank you for the issue and ideas! :purple_heart: Replacing Floki.map/2 with a Floki.find_and_update/3 seems the better way to go. I don't like very much the Floki.map/2 function and I wanted to deprecate it before.

We can do this in some steps I think:

What do you think?

josevalim commented 4 years ago

If the plan is to remove map, i wouldn't change it to use traverse_and_update/2, because what if it introduces minor incompatibilities? For example, traverse_and_update/2 and find traverse in different order (find does parent first, traverse children first). While that should not be a problem for map, you never know what people are doing, like using map for side-effects.

philss commented 4 years ago

It makes sense. So let's go for the deprecation/removal without the refactoring o/

philss commented 4 years ago

I'm considering this done since we deprecated the function and introduced Floki.find_and_update/3. I plan to remove Floki.map/2 in future versions. Thank you, @josevalim ! :purple_heart: