petdance opened 2 years ago
How do you envision the cache? An in-memory LRU, or is this a hash that can grow quite large? Have you tried seeing what kind of speed gains can be had via `Memoize`?
Just a hash that could theoretically, but not likely, grow quite large, as I showed in the code above. I didn't bother with `Memoize` because I figured we wouldn't want to add another dependency, but now I see that `Memoize` has been core since 5.7.3, so that's not an issue.
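For reference, memoizing `num_entity` with the core module would be just a couple of lines. This sketch assumes `num_entity` is essentially the `sprintf`-based implementation in HTML::Entities:

```perl
use strict;
use warnings;
use Memoize;  # core since Perl 5.7.3

# num_entity as in HTML::Entities: a character to its hex numeric entity.
sub num_entity { sprintf "&#x%X;", ord($_[0]) }

memoize('num_entity');  # repeat calls for the same character now hit a cache

print num_entity('<'), "\n";  # &#x3C;
```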
I think we could get some speedups in `encode_entities` by caching some common operations. The examples below would most benefit users encoding lots of data that is heavy on non-named, numeric entities. We may have similar benefits elsewhere.

The first is to cache the results of the `sprintf` in `num_entity`. This could be done with no effect on behavior, in exchange for some hash entries. Here are my experiments, which give these results:
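A minimal sketch of the shape of that experiment, comparing the `sprintf` against a pre-filled hash lookup (hypothetical names; core `Benchmark` only):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $char = "\x{263A}";

# Pre-fill the cache with the value num_entity would compute.
my %cache = ( $char => sprintf "&#x%X;", ord($char) );

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    'sprintf' => sub { my $e = sprintf "&#x%X;", ord($char) },
    'hash'    => sub { my $e = $cache{$char} },
} );
```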
Bottom line: the hash lookup is faster than the `sprintf`, so let's cache it.

The other tweak would be to cache the call to `num_entity` inside the main regex in `encode_entities`, swapping the plain lookup-or-compute expression for one that stores the computed entity back into the hash.
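Sketched concretely, assuming the replacement expression is the `$char2entity{$1} || num_entity($1)` found in HTML::Entities, the change is `||` to `||=`. The standalone wrapper and the tiny `%char2entity` seed here are illustrative; in the module, the substitution lives inside `encode_entities` and `%char2entity` is the full entity map:

```perl
use strict;
use warnings;

# Illustrative subset of the module's character-to-entity map.
my %char2entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );

sub num_entity { sprintf "&#x%X;", ord($_[0]) }

sub encode_entities_caching {
    my $string = shift;
    # Was: $char2entity{$1} || num_entity($1)
    # Now: ||= stores the computed numeric entity back into %char2entity,
    # so later occurrences of the same character are a plain hash lookup.
    $string =~ s/([^\n\r\t !\#\$%\(-;=?-~])/$char2entity{$1} ||= num_entity($1)/ge;
    return $string;
}

print encode_entities_caching("caf\x{E9} & bar"), "\n";  # caf&#xE9; &amp; bar
```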
This would have the side effect of modifying the `%char2entity` hash, which is visible to the outside world. If that weren't OK, we could keep a private copy of the hash specifically so it would be modifiable. The potential downside (or upside?) of that would be that if someone outside the module modified `%char2entity`, it would have no effect on `encode_entities`.

For benchmarking `encode_entities`, I used this, with these results:
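A harness of roughly this shape exercises `encode_entities` the same way (the sample text is hypothetical; requires HTML::Entities from CPAN). The rates below presumably come from running the same harness against each variant in turn:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use HTML::Entities qw(encode_entities);

# Hypothetical sample: a mix of named-entity and numeric-entity characters.
my $text = "Caf\x{E9} & croissants \x{263A} " x 200;

# Run a fixed number of iterations and print the rate per second.
cmpthese( 1000, {
    encode => sub { my $out = encode_entities($text) },
} );
```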
* 42,281/s for the original, unmodified `encode_entities`.
* 52,746/s if `encode_entities` uses the caching `num_entity` first mentioned, but the main regex is unchanged.
* 64,769/s if the main conversion regex caches the results of calls to `num_entity` in `%char2entity`. Changing this to call the caching `num_entity` gave no noticeable improvement.

I hope these give some ideas. `encode_entities` is an absolute workhorse at my job (we generate everything with Template Toolkit), and I'm sure for many, many others. Any speedup would have wide-ranging benefits.