Background
While searching for places in the code where I could contribute, I came across this FIXME comment, which seemed like an easy win:
// FIXME: linear time search, do we care?
(line 511 in html5ever/src/tokenizer/mod.rs)
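For context, the search the comment refers to is, as far as I can tell, a duplicate-attribute check: before a newly parsed attribute is added to the tag being built, the names collected so far are scanned so duplicates can be dropped. Below is a minimal sketch of that pattern; the types and function names are illustrative, not the actual html5ever code.

```rust
/// Illustrative attribute type; not the real html5ever definition.
struct Attribute {
    name: String,
    value: String,
}

/// Adds an attribute unless one with the same name was already collected.
/// The duplicate check is the linear scan the FIXME comment points at:
/// O(n) in the number of attributes already gathered for this tag.
fn push_attribute(attrs: &mut Vec<Attribute>, name: String, value: String) {
    let is_duplicate = attrs.iter().any(|a| a.name == name);
    if !is_duplicate {
        attrs.push(Attribute { name, value });
    }
    // A duplicate attribute is simply discarded; the first occurrence wins.
}

fn main() {
    let mut attrs = Vec::new();
    push_attribute(&mut attrs, "class".into(), "a".into());
    push_attribute(&mut attrs, "class".into(), "b".into()); // dropped as a duplicate
    assert_eq!(attrs.len(), 1);
}
```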
Fix attempt
I tried using a HashSet for the attribute names so the search would be O(1) instead of O(n), as can be seen in this commit. Once I ran some benchmarks on multiple websites (by modifying the noop-tokenizer example), I realized I had only made things worse: parsing took a tiny bit longer than before. This is probably due to the overhead of maintaining a HashSet of the attribute names, and because most elements do not have enough attributes for the HashSet to improve performance.
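To make the trade-off concrete, here is a toy micro-benchmark, separate from the noop-tokenizer benchmarks mentioned above, that pits a linear Vec scan against a HashSet for a realistic attribute count. The names and numbers are made up for illustration; the point is that with only a few attributes per element, hashing every name tends to cost more than the short linear scan it replaces.

```rust
use std::collections::HashSet;
use std::time::Instant;

fn main() {
    // A typical element only carries a handful of attributes.
    let names: Vec<String> = (0..4).map(|i| format!("attr{}", i)).collect();
    let iterations = 1_000_000u32;

    // Variant 1: linear scan over a Vec (the existing approach).
    let start = Instant::now();
    let mut total = 0usize;
    for _ in 0..iterations {
        let mut attrs: Vec<&str> = Vec::new();
        for name in &names {
            if !attrs.iter().any(|a| *a == name.as_str()) {
                attrs.push(name.as_str());
            }
        }
        total += attrs.len();
    }
    println!("Vec linear scan: {:?} ({})", start.elapsed(), total);

    // Variant 2: HashSet membership test. Lookups are O(1), but every
    // insert has to hash the name, and the set must be built per tag.
    let start = Instant::now();
    let mut total = 0usize;
    for _ in 0..iterations {
        let mut seen: HashSet<&str> = HashSet::new();
        for name in &names {
            seen.insert(name.as_str());
        }
        total += seen.len();
    }
    println!("HashSet: {:?} ({})", start.elapsed(), total);
}
```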
Conclusion
I think it's a good idea to remove this comment so that no one else tries to optimize this section for no reason.