philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License

Proposal: optimized find for simple cases #515

Closed. Valian closed this issue 4 months ago.

Valian commented 9 months ago

Feature goal

Currently, Floki.find always creates an HTMLTree, which is not always necessary. Consider finding all links, or all elements with a particular id or class. This covers basically every case where we don't need to consider the hierarchy of nodes and it's enough to look at the nodes one by one with a simple traversal.

In a project I'm involved in, I noticed slowdowns in that area. A quick traversal is an order of magnitude faster and avoids the memory allocation of building the tree.
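To illustrate the idea, here is a minimal sketch in Python (not Elixir, and not Floki's API). It assumes a document shaped roughly like Floki's parsed nodes: each element is a `(tag, attrs, children)` tuple with `attrs` as a list of `(name, value)` pairs, and text nodes are plain strings. The helper names `matches` and `simple_find` are hypothetical. A single depth-first walk answers tag, `#id`, and `.class` selectors without building any tree index.

```python
from typing import Iterator, Optional

# Assumed node model (hypothetical, loosely mirroring Floki's tuples):
#   element = (tag, [(attr_name, attr_value), ...], [children...])
#   text    = plain string

def matches(node: tuple, tag: Optional[str] = None,
            id_: Optional[str] = None, cls: Optional[str] = None) -> bool:
    """Check one element against a simple tag / #id / .class selector."""
    name, attrs, _children = node
    if tag is not None and name != tag:
        return False
    attr_map = dict(attrs)
    if id_ is not None and attr_map.get("id") != id_:
        return False
    if cls is not None and cls not in attr_map.get("class", "").split():
        return False
    return True

def simple_find(nodes: list, **selector) -> Iterator[tuple]:
    """Depth-first walk in document order; no tree index is built."""
    stack = list(reversed(nodes))
    while stack:
        node = stack.pop()
        if not (isinstance(node, tuple) and len(node) == 3):
            continue  # text (and other non-element) nodes never match
        if matches(node, **selector):
            yield node
        # Push children reversed so they are popped in document order.
        stack.extend(reversed(node[2]))

# Usage: "find all links" needs no knowledge of ancestors or siblings.
doc = [("html", [], [
    ("body", [], [
        ("a", [("href", "/one"), ("class", "nav main")], ["One"]),
        ("p", [("id", "intro")], [
            "text",
            ("a", [("href", "/two")], ["Two"]),
        ]),
    ]),
])]

hrefs = [dict(n[1])["href"] for n in simple_find(doc, tag="a")]
print(hrefs)  # ['/one', '/two']
```

The walk still uses a small stack, so it is not allocation-free; the saving comes from skipping the per-node tree structure that a general selector engine needs for combinators.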

Would you be interested in a PR covering such an optimization? I can try to work on this.

PS: Sorry for the lack of examples, I'm writing from mobile 😅

philss commented 9 months ago

@Valian I think it's worth exploring that path, yes! I can't see a problem upfront. But like you said, this may work only for simple class and ID selectors.

So please go ahead! Also, take a look at the latest merges - some of them were performance optimizations made by @ypconstante: https://github.com/philss/floki/commits/main/?author=ypconstante

Valian commented 8 months ago

Thanks, I'll look into this! I know it might work only for simple checks like IDs, tag names and classes, but I have a feeling that this is the most common usage. E.g., finding all links is one such case, and I believe it's extremely common.

I'll try to find some time in the evenings over the next week or two to tackle this.

EDIT: I just checked @ypconstante's commits and WOW, there are so many improvements! The speedup from my suggested optimization might not be that big, but I'll try to deliver some benchmarks anyway 👍

philss commented 7 months ago

@ypconstante @Valian do you think this can be closed?

Valian commented 7 months ago

@philss Sorry, my family recently got bigger 👦 and I haven't really had time to work on this... But I see @ypconstante did a great job! Are your recent PRs enough to solve this, or is there still some work needed?

ypconstante commented 7 months ago

There are still some cases where we could enable this optimization without too many changes, such as the remaining combinators (depending on the selector) and pseudo-classes that don't require the tree.
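As a rough illustration of how a combinator or pseudo-class might avoid the tree, here is a minimal Python sketch (not Floki's implementation) under the same assumed `(tag, attrs, children)` tuple model, with text as plain strings. The hypothetical helpers `walk_with_context` and `find_child` carry the parent element and the element's index among its element siblings down the walk, which is enough to answer a child combinator (`parent > child`) and `:first-child` in one pass.

```python
from typing import Iterator

# Assumed node model (hypothetical): element = (tag, attrs, children),
# text = plain string. Not Floki's API.

def walk_with_context(nodes, parent=None) -> Iterator[tuple]:
    """Yield (element, parent_element, element_sibling_index) in document order."""
    index = 0
    for node in nodes:
        if not (isinstance(node, tuple) and len(node) == 3):
            continue  # text nodes don't count as element siblings
        yield node, parent, index
        index += 1
        # Recurse with this element as the parent context.
        yield from walk_with_context(node[2], node)

def find_child(nodes, parent_tag: str, child_tag: str,
               first_child_only: bool = False) -> list:
    """Evaluate 'parent_tag > child_tag', optionally with :first-child."""
    result = []
    for node, parent, index in walk_with_context(nodes):
        if node[0] != child_tag or parent is None or parent[0] != parent_tag:
            continue
        if first_child_only and index != 0:
            continue
        result.append(node)
    return result

# Usage: "ul > li" and "ul > li:first-child" without a prebuilt tree.
doc = [("ul", [], [
    "\n",
    ("li", [], ["a"]),
    ("li", [], [("ul", [], [("li", [], ["nested"])])]),
])]

all_items = find_child(doc, "ul", "li")
first_items = find_child(doc, "ul", "li", first_child_only=True)
print(len(all_items))                   # 3
print([n[2] for n in first_items])      # [['a'], ['nested']]
```

Selectors that look at later siblings (for example `:last-child` or `~`) would need more context than a single forward walk provides, which is one reason this only works "depending on the selector".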

philss commented 4 months ago

@ypconstante I'm closing this now. Once again, thank you for the great work you did to bring these performance gains! ❤️