bnoordhuis closed this issue 13 years ago.
Hey Ben! That's a very nice patch -- but there's quite an issue here. Yesterday I was also trying to optimize the SmartyPants method myself (not as aggressively as you did, though ;D), and I found there's a pretty serious problem in the design of the method.
Because of the way Upskirt handles rendering callbacks, every time it encounters an "active character" it breaks out of the render_normal callback to try to render that active character -- and this happens even if the character doesn't end up generating any special HTML. What this means is that most of the time, the smartypants method gets strings split in half (e.g. "this is a t" followed by "est string"), which makes a lot of multi-character substitutions impossible.
So, after giving it a little bit of thought, I've decided to do the same thing as the official SmartyPants implementation: do the substitutions as a post-processing pass on top of the rendered HTML, instead of embedding them in the rendering process. You can check the implementation in 789bcd98e19eabcc4c36eca256370cdc978838a3.
Sorry for the wasted work -- I only found out yesterday that doing SmartyPants substitutions at that level just isn't viable, because we don't really use SmartyPants here at GitHub.
Hopefully you can take a look at the new SmartyPants code and improve its performance. I'm sure there's still plenty of room for speed. Cheers!
That seems sensible enough, Vicent. Don't worry about any wasted work - if it weren't for our mutual friend yesterday, I wouldn't have submitted this at all.
Unrelated: would you be so kind as to share what benchmark code and input you are using? I'd like to test the new code and see how slow it is.
Cheers man!
Sure thing, but it's nothing fancy. Define INPUT as whatever you want to test. I used a bunch of concatenated README.md's.
https://gist.github.com/958207
PS: It's threaded because I wanted to profile malloc() lock contention. Compile with -DN_THREADS=n where n is the number of CPUs in your system.
EDIT: I just updated the gist to use the new post-processing function and I'm seeing some pretty hefty differences. time ./benchmark 1000 on cfa4e0934de0d1318c179f4b226b48b49b0dfb65 gives me this:
real 0m1.082s
user 0m2.030s
sys 0m0.070s
And this for 7cb1a9e0270952ab6e4d8f4aa7d8b92ba10ee12e, the current master:
real 0m17.755s
user 0m32.800s
sys 0m0.950s
That can't be right, can it? Am I invoking the function wrong?
Hi Vicent, here is the patch I mentioned, rebased against the current master.
rndr_smartypants() is optimized by dropping the O(n) loop over the entity table. With a table of n entries and an input of length m, that loop made the whole pass O(nm), which is what made it so expensive.
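As an illustration of the kind of change involved -- the table contents and function names below are hypothetical, not the actual rndr_smartypants() code -- the slow shape scans the whole entity table at every input position, while the fast shape bails out immediately unless the position can possibly start an entity:

```c
#include <string.h>

/* Illustrative entity table (placeholder entries, not Upskirt's). */
static const char *const entities[] = {
    "&amp;", "&lt;", "&gt;", "&quot;", "&ndash;", "&mdash;",
};
#define N_ENTITIES (sizeof(entities) / sizeof(entities[0]))

/* Slow shape: scan the whole table at every position, giving O(n)
 * table work per input character and O(n*m) overall. */
static int is_entity_slow(const char *p)
{
    for (size_t i = 0; i < N_ENTITIES; i++)
        if (strncmp(p, entities[i], strlen(entities[i])) == 0)
            return 1;
    return 0;
}

/* Fast shape: every table entry starts with '&', so reject all other
 * positions in O(1) and only fall back to the scan on '&'. */
static int is_entity_fast(const char *p)
{
    if (*p != '&')
        return 0;
    return is_entity_slow(p);
}
```

Since '&' is rare in typical input, the table scan almost never runs, which is the same effect as deleting the per-character loop.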
Performance statistics, median of three. Before:
After:
So that's a pretty big jump in performance.
The second commit micro-optimizes the loop some more, but it clutters the code and doesn't make an appreciable difference, so you probably shouldn't take it.