joshids / markdownsharp

Automatically exported from code.google.com/p/markdownsharp
0 stars 0 forks source link

Escaping Performance #25

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Something that is called very often are EncodeBoldItalics and brethren, 
which escape sensitive characters. Therefore they seemed to lend themselves 
to a little performance nudge.

Seizing the opportunity, I also encapsulated the whole escape mechanisms 
into their own (internal static) class.

Since _escapeTable accesses happen all over the place, I attached both 
Markdown.cs and the (new) Escapes.cs.

Now, the juicy bits are about 10% big:

Before changes:
input string length: 475
8000 iterations in 5222 ms (0,65275 ms per iteration)
input string length: 2356
2000 iterations in 5003 ms (2,5015 ms per iteration)
input string length: 27737
180 iterations in 4944 ms (27,4666666666667 ms per iteration)
input string length: 11075
375 iterations in 4852 ms (12,9386666666667 ms per iteration)
input string length: 88607
45 iterations in 4693 ms (104,288888888889 ms per iteration)
input string length: 354431
12 iterations in 5125 ms (427,083333333333 ms per iteration)

After changes:
input string length: 475
8000 iterations in 4761 ms (0,595125 ms per iteration)
input string length: 2356
2000 iterations in 4558 ms (2,279 ms per iteration)
input string length: 27737
180 iterations in 4403 ms (24,4611111111111 ms per iteration)
input string length: 11075
375 iterations in 4461 ms (11,896 ms per iteration)
input string length: 88607
45 iterations in 4280 ms (95,1111111111111 ms per iteration)
input string length: 354431
12 iterations in 4520 ms (376,666666666667 ms per iteration)

Original issue reported on code.google.com by Shio...@gmail.com on 13 Jan 2010 at 3:31

Attachments:

GoogleCodeExporter commented 9 years ago
while I supported consolidating Normalize() since that really is the same 
logical
operation, I'm worried that we're getting into the realm of micro-optimization 
here.
While I agree you can save *some* time by doing..

            code = code.Replace(@"\", _escapeTable[@"\"]);
            code = code.Replace("*", _escapeTable["*"]);
            code = code.Replace("_", _escapeTable["_"]);
            code = code.Replace("{", _escapeTable["{"]);
            code = code.Replace("}", _escapeTable["}"]);
            code = code.Replace("[", _escapeTable["["]);
            code = code.Replace("]", _escapeTable["]"]);

as a one pass operation, it's also a much less *readable* one pass operation -- 
lots
more code for a rather small performance increase.

Not sure if it's even worth it in all cases. If I comment out 
EscapeBoldItalic() and
make it a no-op (invalid), I don't get any appreciable performance benefit for 
those
15+ additional lines of code, either.

While I definitely appreciate the contribution, and it IS faster, I am not sure 
it's
fast enough to justify adding a second class and a bunch more lines of code. I'd
rather streamline and simplify what we have whenever possible.

Original comment by wump...@gmail.com on 13 Jan 2010 at 11:58

GoogleCodeExporter commented 9 years ago
Although I definitely see your point about how much encoding and escaping there 
is in
the file -- I have grouped it all in a region for now while we think about this.

I am open to hard-coding the escape and hashes of the escape characters, 
because I do
not feel these are things that are going to realistically change; they're part 
of the
markdown "spec". We could include a unique partial marker in each hash that we 
could
index off of, rather than treating each one as an individual thing.

I wonder if there is a simpler way to do this, with that in mind... 

anyway, I will check in a re-organized ver of the code in a bit, let's try this 
again.

Original comment by wump...@gmail.com on 13 Jan 2010 at 12:45

GoogleCodeExporter commented 9 years ago
Just a quaint thought: Why do we not simply use html escape codes for anything 
that we 
wish to escape - and LEAVE them in the result?

Original comment by Shio...@gmail.com on 14 Jan 2010 at 1:24

GoogleCodeExporter commented 9 years ago
that seems like the wrong approach.. and it'd break all the unit test suites.

Original comment by wump...@gmail.com on 14 Jan 2010 at 1:42

GoogleCodeExporter commented 9 years ago
Did you want to take another run at this based on the current version?

I'm not against "unrolling loops" as long as the benefit is fairly clear, 
compared to
the increase in code size.

I appreciate all your contributions to date -- they've been quite helpful!

Original comment by wump...@gmail.com on 17 Jan 2010 at 7:08

GoogleCodeExporter commented 9 years ago
Probably, I will - when I get the time ;)

Original comment by Shio...@gmail.com on 17 Jan 2010 at 4:51

GoogleCodeExporter commented 9 years ago
no worries. I'll close this one as it's based on the 1.10 code which has 
changed.

You *definitely* had the right idea to put all the Encoding/Escaping code 
together. I
had no idea how much of that there was until you pointed it out.

Original comment by wump...@gmail.com on 18 Jan 2010 at 3:35