vmg / sundown

Standards compliant, fast, secure markdown processing library in C

Optimize rndr_smartypants() #15

Closed bnoordhuis closed 13 years ago

bnoordhuis commented 13 years ago

Hi Vicent, here is the patch I mentioned, rebased against the current master.

rndr_smartypants() is optimized by dropping the O(n) linear scan over the entity table that ran for every input byte. With an input of length m, the total cost was O(n·m), so that made it pretty expensive.
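A minimal sketch of the idea (not the actual sundown code; the table entries and names here are hypothetical): instead of scanning the whole substitution table for every input byte, index the table by first character once at startup, so each input position costs an O(1) lookup plus at most a few comparisons.

```c
#include <stddef.h>
#include <string.h>

struct subst { const char *from; const char *to; };

/* Entries that share a first byte must be adjacent in the table. */
static const struct subst table[] = {
    { "---", "&mdash;" },
    { "--",  "&ndash;" },
    { "...", "&hellip;" },
};

/* first_idx[c] is the index of the first table entry starting with byte c,
 * or -1 if no entry starts with c; built once at startup. */
static int first_idx[256];

static void build_index(void)
{
    size_t i;
    for (i = 0; i < 256; i++)
        first_idx[i] = -1;
    /* walk backwards so first_idx ends up pointing at the earliest entry */
    for (i = sizeof(table)/sizeof(table[0]); i-- > 0; )
        first_idx[(unsigned char)table[i].from[0]] = (int)i;
}

/* Return the matching entry for the text at p (length n), or NULL. */
static const struct subst *lookup(const char *p, size_t n)
{
    int i = first_idx[(unsigned char)p[0]];
    if (i < 0)
        return NULL;
    for (; (size_t)i < sizeof(table)/sizeof(table[0]) &&
           table[i].from[0] == p[0]; i++) {
        size_t len = strlen(table[i].from);
        if (len <= n && memcmp(p, table[i].from, len) == 0)
            return &table[i];
    }
    return NULL;
}
```

With longer patterns listed before their prefixes ("---" before "--"), the first match found is the longest one, which is what a SmartyPants-style substitution wants.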

Performance statistics, median of three. Before:

$ time ./benchmark 10000
real    0m12.221s
user    0m22.910s
sys     0m0.210s

After:

$ time ./benchmark 10000
real    0m6.158s
user    0m11.400s
sys     0m0.180s

So that's a pretty big jump in performance.

The second commit micro-optimizes the loop some more but it clutters the code and doesn't make an appreciable difference so you probably shouldn't take it.

vmg commented 13 years ago

Hey Ben! That's a very nice patch -- but there's quite an issue here. Yesterday I was also trying to optimize the SmartyPants method myself (not as aggressively as you did, though ;D), and I found there's a pretty serious problem in the design of the method.

Because of the way Upskirt handles rendering callbacks, every time it encounters an "active character" it cuts the normal-text callback short to try to render that character -- and this happens even when the character doesn't end up generating any special HTML. What this means is that most of the time the SmartyPants method gets strings split in half (e.g. "this is a t" "est string"), which makes a lot of multi-character substitutions impossible.

So, after giving it a little bit of thought, I've decided to do the same thing as the official SmartyPants implementation: do the substitutions as a post-processing pass on top of the rendered HTML, instead of embedding them in the rendering process. You can check the implementation in 789bcd98e19eabcc4c36eca256370cdc978838a3.
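For illustration, here is a toy version of a post-processing pass (this is not sundown's code; the function name and the two substitutions are made up). Because the pass sees the finished HTML as one buffer, a pattern like "--" can never be split across callback chunks; the only thing the pass must do is leave the insides of tags alone:

```c
#include <stdio.h>
#include <string.h>

/* Toy SmartyPants-style pass: rewrites "..." to &hellip; and "--" to
 * &ndash; in text, skipping anything between '<' and '>'. The out buffer
 * must be large enough for the worst-case expansion. */
static size_t smartypants_pass(const char *in, size_t n, char *out)
{
    size_t i = 0, o = 0;
    int in_tag = 0;

    while (i < n) {
        char c = in[i];
        if (c == '<') in_tag = 1;
        else if (c == '>') in_tag = 0;

        if (!in_tag) {
            if (i + 3 <= n && memcmp(in + i, "...", 3) == 0) {
                o += (size_t)sprintf(out + o, "&hellip;");
                i += 3;
                continue;
            }
            if (i + 2 <= n && memcmp(in + i, "--", 2) == 0) {
                o += (size_t)sprintf(out + o, "&ndash;");
                i += 2;
                continue;
            }
        }
        out[o++] = c;
        i++;
    }
    out[o] = '\0';
    return o;
}
```

A real pass would also need to skip `<pre>`/`<code>` contents and handle quotes with left/right context, but the single-buffer structure is the point of the design change.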

Sorry for the wasted work -- I didn't really find out until yesterday that doing SmartyPants subs at that level was just not viable, because we don't really use SmartyPants here at GitHub.

Hopefully you can take a look at the new SmartyPants code and improve its performance. I'm sure there's still plenty of room for speed. Cheers!

bnoordhuis commented 13 years ago

That seems sensible enough, Vicent. Don't worry about any wasted work - if it weren't for our mutual friend yesterday, I wouldn't have submitted this at all.

vmg commented 13 years ago

Unrelated: would you be so kind as to share what benchmark code and input you are using? I'd like to test the new code and see how slow it is.

Cheers man!

bnoordhuis commented 13 years ago

Sure thing, but it's nothing fancy. Define INPUT to whatever you want to test. I used a bunch of concatenated README.md's.

https://gist.github.com/958207

PS: It's threaded because I wanted to profile malloc() lock contention. Compile with -DN_THREADS=n where n is the number of CPUs in your system.
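Something along these lines should build it (the file names and layout here are assumptions, not taken from the gist):

```shell
# Hypothetical build line for the gist's benchmark; adjust paths to taste.
# Set N_THREADS to the number of CPUs on your machine, e.g. 4 here.
cc -O2 -pthread -DN_THREADS=4 benchmark.c sundown/*.c -Isundown -o benchmark
./benchmark 10000
```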

EDIT: I just updated the gist to use the new post-processing function and I'm seeing some pretty hefty differences. time ./benchmark 1000 on cfa4e0934de0d1318c179f4b226b48b49b0dfb65 gives me this:

real    0m1.082s
user    0m2.030s
sys     0m0.070s

And this for 7cb1a9e0270952ab6e4d8f4aa7d8b92ba10ee12e, the current master:

real    0m17.755s
user    0m32.800s
sys     0m0.950s

That can't be right, can it? Am I invoking the function wrong?