Add async version of traverse_and_update

thiagomajesk commented 3 years ago

Hi! I'm using Floki to traverse and update a document where I'm generating URL previews. Since I'm unfurling those URLs while traversing the document, this operation can get quite expensive if done synchronously. Because of that, I'd like to propose making this process asynchronous by introducing a traverse_and_update_async function that would allow processing the matched nodes in parallel. Something like this:

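# Proposed API (does not exist in Floki): the callback returns a Task per node.
Floki.traverse_and_update_async(html_tree, fn
  {"a", [{"href", href}], _children} ->
    Task.async(fn -> {"div", [], unfurl(href)} end)

  other ->
    # Nodes that don't need work are still wrapped so every result is a Task.
    Task.async(fn -> other end)
end)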

The function traverse_and_update_async would expect a Task to be returned from the callback, and then we could call Task.await_many(tasks) at the end to collect the results.

PS: I think that, depending on how we want to treat nested nodes, we would have to preemptively evaluate/await some of the tasks, because the modified value doesn't exist yet.

philss commented 3 years ago

Hi @thiagomajesk, thanks for opening the issue! :purple_heart:

I'm inclined to not add this feature because you don't have control of how many processes it would create, and also because I want to keep Floki without processes in order to keep it simple.

Considering that we would have to traverse the tree again to await the modifications, what about this approach for your case:

- first, traverse the tree gathering the URLs and adding an attribute with the identifier of that URL - it could be a class with the hash of that URL;
- you would spawn a Task for each URL and send its pid and the URL id/hash to another process;
- in the end, for each task you would return the preview and the URL id that you assigned to an attribute;
- you would traverse the modified tree and update the nodes with their previews.
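
For illustration, the steps above could be sketched roughly as follows. This is only a sketch under some assumptions, not a tested implementation: `PreviewSketch`, `unfurl/1`, and the `data-preview-id` attribute are hypothetical names, the id comes from `:erlang.phash2/1` (a non-cryptographic hash, so collisions are possible in principle), and `Task.async_stream/3` is swapped in for the "send the pid to another process" step because its `max_concurrency` option bounds how many processes run at once. It assumes Floki is already available as a dependency.

```elixir
defmodule PreviewSketch do
  # Hypothetical stand-in for the real unfurling call (HTTP fetch, parsing, ...).
  def unfurl(href), do: [{"p", [], ["Preview of " <> href]}]

  def run(html_tree, max_concurrency \\ 4) do
    # Pass 1: tag every link with an id derived from its URL and collect
    # the URLs in the accumulator as %{id => href}.
    {tagged, urls} =
      Floki.traverse_and_update(html_tree, %{}, fn
        {"a", attrs, children} = node, acc ->
          case List.keyfind(attrs, "href", 0) do
            {"href", href} ->
              id = url_id(href)
              {{"a", [{"data-preview-id", id} | attrs], children}, Map.put(acc, id, href)}

            nil ->
              {node, acc}
          end

        node, acc ->
          {node, acc}
      end)

    # Unfurl the collected URLs concurrently, with a bounded number of processes.
    previews =
      urls
      |> Task.async_stream(fn {id, href} -> {id, unfurl(href)} end,
        max_concurrency: max_concurrency,
        timeout: 15_000
      )
      |> Map.new(fn {:ok, {id, preview}} -> {id, preview} end)

    # Pass 2: replace each tagged link with its preview.
    Floki.traverse_and_update(tagged, fn
      {"a", attrs, _children} = node ->
        case List.keyfind(attrs, "data-preview-id", 0) do
          {"data-preview-id", id} -> {"div", [], Map.fetch!(previews, id)}
          nil -> node
        end

      node ->
        node
    end)
  end

  # Short, non-cryptographic id for a URL (an assumption made for this sketch).
  defp url_id(href), do: href |> :erlang.phash2() |> Integer.to_string(16)
end
```

Usage would be something like `html |> Floki.parse_document!() |> PreviewSketch.run() |> Floki.raw_html()`. Floki itself stays process-free here; all concurrency lives in the caller's code.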

WDYT?

thiagomajesk commented 3 years ago

Hi @philss!

> I'm inclined to not add this feature because you don't have control of how many processes it would create, and also because I want to keep Floki without processes in order to keep it simple.

Humm, I see... I'll close the issue then.

BTW, thanks for trying to help. I'll test your suggestion. Cheers!

philss / floki

Add async version of traverse_and_update #325