input normalization - Githubissues

GoogleCodeExporter commented 9 years ago

The attached diff combines newline normalization and the Detab function into a 
single function, "NormalizeInput". It also ensures that the input ends in at 
least two newlines.
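
For readers without the diff at hand, here is a minimal sketch of the idea in Python. It is not the code from Normalize.diff (MarkdownSharp is written in C#); the function name `normalize_input` and the default tab width of 4 are assumptions made for illustration. It shows how newline normalization, tab expansion, and the trailing-newline guarantee can share one pass over the input.

```python
def normalize_input(text: str, tab_width: int = 4) -> str:
    """Single-pass sketch: unify newlines, expand tabs, pad trailing newlines.

    Hypothetical illustration only; tab_width=4 is an assumed default.
    """
    out = []
    column = 0  # current column on the output line, used for tab stops
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\r":
            # Treat "\r\n" as one newline; a lone "\r" also becomes "\n".
            if i + 1 < n and text[i + 1] == "\n":
                i += 1
            out.append("\n")
            column = 0
        elif ch == "\n":
            out.append("\n")
            column = 0
        elif ch == "\t":
            # Pad with spaces up to the next tab stop.
            spaces = tab_width - (column % tab_width)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
        i += 1

    result = "".join(out)
    # Make sure the text ends in at least two newlines.
    trailing = len(result) - len(result.rstrip("\n"))
    if trailing < 2:
        result += "\n" * (2 - trailing)
    return result
```

The point of the single pass is that the input is walked once instead of once per replacement step, which is where the (modest) gain would come from.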

As expected, this change improves performance, although not by much:

Benchmark before changes:
input string length: 475
7000 iterations in 4816 ms (0,688 ms per iteration)
input string length: 2356
2000 iterations in 5040 ms (2,52 ms per iteration)
input string length: 27737
180 iterations in 5131 ms (28,5055555555556 ms per iteration)
input string length: 11075
300 iterations in 3951 ms (13,17 ms per iteration)
input string length: 88607
40 iterations in 4288 ms (107,2 ms per iteration)
input string length: 354431
10 iterations in 4260 ms (426 ms per iteration)

Benchmark after changes:
input string length: 475
7000 iterations in 4688 ms (0,669714285714286 ms per iteration)
input string length: 2356
2000 iterations in 4968 ms (2,484 ms per iteration)
input string length: 27737
180 iterations in 4953 ms (27,5166666666667 ms per iteration)
input string length: 11075
300 iterations in 3840 ms (12,8 ms per iteration)
input string length: 88607
40 iterations in 4226 ms (105,65 ms per iteration)
input string length: 354431
10 iterations in 4243 ms (424,3 ms per iteration)
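
To put "not by much" into numbers, the totals above can be compared directly, since each input size uses the same iteration count before and after. The following Python sketch only recomputes from the figures posted in this comment; the helper name `relative_gain` is made up for illustration.

```python
# (input length, iterations, total ms before, total ms after),
# copied from the two benchmark listings above.
RUNS = [
    (475, 7000, 4816, 4688),
    (2356, 2000, 5040, 4968),
    (27737, 180, 5131, 4953),
    (11075, 300, 3951, 3840),
    (88607, 40, 4288, 4226),
    (354431, 10, 4260, 4243),
]


def relative_gain(before_ms: float, after_ms: float) -> float:
    """Percentage of time saved relative to the 'before' measurement."""
    return (before_ms - after_ms) / before_ms * 100.0


if __name__ == "__main__":
    for length, iterations, before_ms, after_ms in RUNS:
        gain = relative_gain(before_ms, after_ms)
        print(f"length {length}: {gain:.2f}% less time over {iterations} iterations")
```

Differences this small can be within run-to-run noise, so the percentages are best read as a rough indication.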

Original issue reported on code.google.com by Shio...@gmail.com on 11 Jan 2010 at 11:21

Attachments:

Normalize.diff

GoogleCodeExporter commented 9 years ago

excellent, checked in as r100 -- there is another opportunity to combine the 
_blankLines regex with this Normalize() routine as well, I think. But you'll 
have to use a line-oriented approach instead of the chunked way you're adding 
stuff to the stringbuilder now.

Original comment by wump...@gmail.com on 12 Jan 2010 at 3:57

GoogleCodeExporter commented 9 years ago

Ate the _blankLines regex as well.

Well, the chunked way did indeed get a bit cumbersome with all the cases in 
which stuff is added to the stringbuilder now ;)

Performance did improve, but not a lot:

Performance before changes:
input string length: 475
4000 iterations in 2642 ms (0,6605 ms per iteration)
input string length: 2356
1000 iterations in 2480 ms (2,48 ms per iteration)
input string length: 27737
100 iterations in 2740 ms (27,4 ms per iteration)
input string length: 11075
200 iterations in 2535 ms (12,675 ms per iteration)
input string length: 88607
30 iterations in 3090 ms (103 ms per iteration)
input string length: 354431
10 iterations in 4095 ms (409,5 ms per iteration)

Performance after changes:
input string length: 475
4000 iterations in 2597 ms (0,64925 ms per iteration)
input string length: 2356
1000 iterations in 2460 ms (2,46 ms per iteration)
input string length: 27737
100 iterations in 2709 ms (27,09 ms per iteration)
input string length: 11075
200 iterations in 2526 ms (12,63 ms per iteration)
input string length: 88607
30 iterations in 3056 ms (101,866666666667 ms per iteration)
input string length: 354431
10 iterations in 4044 ms (404,4 ms per iteration)

P.S.: Would you consider upping the default number of executions, especially 
for the last 3 tests? They are done rather quickly otherwise ;)

Original comment by Shio...@gmail.com on 12 Jan 2010 at 1:06

Attachments:

blanklines.diff

GoogleCodeExporter commented 9 years ago

checked in as r102

I have to incorporate your .diff files by hand because they are not in a 
format that Tortoise understands. **I am not sure I got it right this time, 
can you check?**

As for the benchmark, the last 3 benchmark calls are to measure the cost of 
calling the whole thing once. The previous 3 benchmark calls are loops of many 
thousands. We need both.
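
The two kinds of measurement can be sketched with a small timing helper. This is a Python illustration, not the project's actual benchmark code; `benchmark` and `report` are hypothetical names, and the output format only imitates the listings earlier in this thread.

```python
import time
from typing import Callable, Tuple


def benchmark(func: Callable[[str], object], text: str,
              iterations: int) -> Tuple[float, float]:
    """Run func(text) `iterations` times.

    Returns (total milliseconds, milliseconds per iteration).
    """
    start = time.perf_counter()
    for _ in range(iterations):
        func(text)
    total_ms = (time.perf_counter() - start) * 1000.0
    return total_ms, total_ms / iterations


def report(func: Callable[[str], object], text: str, iterations: int) -> str:
    """Format one measurement in the style of the listings in this thread."""
    total_ms, per_iteration = benchmark(func, text, iterations)
    return (f"input string length: {len(text)}\n"
            f"{iterations} iterations in {total_ms:.0f} ms "
            f"({per_iteration:.4g} ms per iteration)")


if __name__ == "__main__":
    # str.upper stands in for the function under test.
    small = "some *markdown* text\n" * 20
    large = small * 1000
    print(report(str.upper, small, 5000))  # looped: averages out timer noise
    print(report(str.upper, large, 1))     # single call on a large input
```

Looping over a small input averages out timer resolution and noise, while a low-iteration run on a large input shows what one full call costs, which is why the two are not interchangeable.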

Original comment by wump...@gmail.com on 12 Jan 2010 at 10:56

GoogleCodeExporter commented 9 years ago

Normalize looks right to me.

Original comment by Shio...@gmail.com on 13 Jan 2010 at 12:34

GoogleCodeExporter commented 9 years ago

ok, very good -- closing this as fixed then. Thanks again for the contribution!

Original comment by wump...@gmail.com on 13 Jan 2010 at 12:44

Changed state: Fixed

locopablo / markdownsharp

input normalization #23