jalaluddin / markdownsharp

Automatically exported from code.google.com/p/markdownsharp

RegexOptions.Multiline expression changes #5

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Instead of using the lookahead and end-of-string comparison of

(?=\n+|\Z)

I changed it to the plain old

$

which, when used in combination with RegexOptions.Multiline, does the same thing as
the expression above, and seems to cut execution time by roughly 14-20%.
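A quick way to sanity-check the "same thing" claim is to list every position where each pattern matches. This sketch uses Python's `re` module rather than .NET's `Regex`, so treat it as an approximation. In both engines, a multiline `$` matches just before each `\n` and at the end of the string. `\Z` does differ slightly: .NET also lets it match before a final `\n`, but the `\n+` branch already covers that position. The sample strings are made up for illustration.

```python
import re

# Original end-of-block test: a zero-width lookahead that succeeds when one
# or more newlines follow the current position, or at the end of the string.
LOOKAHEAD = re.compile(r"(?=\n+|\Z)")

# Proposed replacement: a bare '$' with the multiline flag, which matches
# just before every '\n' and at the very end of the string.
MULTILINE_DOLLAR = re.compile(r"$", re.MULTILINE)


def match_positions(pattern: re.Pattern, text: str) -> list[int]:
    """Return every index at which the zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(text)]


# Made-up inputs covering empty text, no newline, a trailing newline,
# runs of blank lines, and a CRLF line ending.
SAMPLES = [
    "",
    "one line",
    "trailing newline\n",
    "para one\n\npara two\n\n\n",
    "a\nb\r\nc",
]

for sample in SAMPLES:
    lookahead_hits = match_positions(LOOKAHEAD, sample)
    dollar_hits = match_positions(MULTILINE_DOLLAR, sample)
    print(repr(sample), lookahead_hits == dollar_hits, lookahead_hits)
```

On these samples both patterns report the same positions. That includes `"a\nb\r\nc"`, where each stops before the `\n` rather than before the `\r`.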

patched:
{{{
input string length: 475
performed 1000 iterations in 1025 (1.025 ms per iteration)
input string length: 2356
performed 500 iterations in 2267 (4.534 ms per iteration)
input string length: 27737
performed 100 iterations in 6246 (62.46 ms per iteration)
}}}

un-patched:
{{{
input string length: 475
performed 1000 iterations in 1289 (1.289 ms per iteration)
input string length: 2356
performed 500 iterations in 2755 (5.51 ms per iteration)
input string length: 27737
performed 100 iterations in 7236 (72.36 ms per iteration)
}}}
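The percentage range can be recomputed from the per-iteration times in the two runs above. This is plain arithmetic on the posted numbers. "Decrease" is taken here to mean the reduction relative to the un-patched time.

```python
# (input length, patched ms per iteration, un-patched ms per iteration),
# copied from the two benchmark runs above.
RUNS = [
    (475, 1.025, 1.289),
    (2356, 4.534, 5.51),
    (27737, 62.46, 72.36),
]


def percent_decrease(new: float, old: float) -> float:
    """Relative reduction from old to new, as a percentage of old."""
    return (old - new) / old * 100.0


for length, patched, unpatched in RUNS:
    saving = percent_decrease(patched, unpatched)
    print(f"{length:>6} chars: {saving:.1f}% less time per iteration")
```

The three inputs work out to about 20.5%, 17.7% and 13.7%, so on these runs the relative gain shrinks as the input grows.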

Original issue reported on code.google.com by nberardi on 28 Dec 2009 at 4:53

Attachments:

GoogleCodeExporter commented 9 years ago
Hmm. Given the nature of some of these changes, I am not sure they are correct. Did
you verify that the unit tests still pass (no new failures were introduced)?

For example, you changed

(?=\n+|\Z)

to 

$

The above is a positive lookahead, whereas the below is a simple "match end of line".
At the very least, I think it would need to be:

(?=$)

And even then, deviating too far from the regexes in markdown.pl v1.0.1 and
markdown.pl v1.0.2b8 is, IMHO, not a good idea, at least not without a VERY good
reason. (See my DeTab() changes for one case where that can happen.)

Original comment by wump...@gmail.com on 28 Dec 2009 at 6:49

GoogleCodeExporter commented 9 years ago
Get latest: I added a CRC-16 to the output HTML file names generated by
GenerateTestOutput("mdtest-1.1"). Confirm that no *.xxxx.actual.html files change
(the xxxx is the CRC-16 of that file) between the version you get from source
control and the version with your proposed regex changes.
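For anyone reproducing this check outside the test harness, here is a minimal sketch of the idea. It fingerprints each generated HTML output with a CRC-16 and compares the fingerprints between two runs. The comment doesn't say which CRC-16 variant MarkdownSharp uses or how the hex digits are formatted. This sketch therefore assumes CRC-16/CCITT-FALSE and lowercase hex. `crc16_ccitt_false`, `fingerprinted_name` and `changed_outputs` are hypothetical helpers, not part of GenerateTestOutput.

```python
# Assumption: CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF, no reflection,
# no final XOR). The variant MarkdownSharp actually uses is not stated.
def crc16_ccitt_false(data: bytes) -> int:
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            # Shift left; if the top bit fell off, fold in the polynomial.
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def fingerprinted_name(stem: str, html: str) -> str:
    """Hypothetical naming scheme: <stem>.<crc16 as 4 lowercase hex digits>.actual.html"""
    return f"{stem}.{crc16_ccitt_false(html.encode('utf-8')):04x}.actual.html"


def changed_outputs(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Stems whose fingerprinted file name differs (or is missing) between two runs."""
    changed = []
    for stem in sorted(before.keys() | after.keys()):
        old, new = before.get(stem), after.get(stem)
        if old is None or new is None:
            changed.append(stem)
        elif fingerprinted_name(stem, old) != fingerprinted_name(stem, new):
            changed.append(stem)
    return changed


# Usage: compare HTML produced before and after a regex change (made-up data).
before = {"Tabs": "<p>a</p>\n", "Links": "<p><a href=\"/\">x</a></p>\n"}
after = {"Tabs": "<p>b</p>\n", "Links": "<p><a href=\"/\">x</a></p>\n"}
print(changed_outputs(before, after))  # prints ['Tabs']
```

A CRC-16 is only a cheap change detector. A matching fingerprint makes an unchanged file very likely, but it does not strictly prove it.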

Original comment by wump...@gmail.com on 28 Dec 2009 at 8:52

GoogleCodeExporter commented 9 years ago
Can you send a new patch of *just* the changes you are proposing here?

The other patch has a bunch of unrelated changes.

Original comment by wump...@gmail.com on 29 Dec 2009 at 2:47

GoogleCodeExporter commented 9 years ago
Hi Jeff,

This change isn't as dramatic as it was before the compiled flags were added, but
I think it cleans up some of the regex logic and relies on the built-in handling of
RegexOptions.Multiline instead of matching line starts and ends manually.

Nick

Original comment by nberardi on 29 Dec 2009 at 3:01

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks. I grabbed the patch, applied it, and tested.

Declined, for two reasons:

1) That block is about to change significantly when I pull across the new HTML
block detection from markdown.pl 1.0.2b8 (which fixes the two test failures in
/mdtest-1.1-alt).

2) I'm not a big fan of changing the regexes from the original Perl versions unless
there is a really COMPELLING reason to do so. I don't see any performance benefit
in the benchmarks, or additional correctness, from making these changes, so they're
just change for the sake of change in an area where it's a bad idea to have change,
because we want to be able to fully sync with the Perl version whenever we can.

Original comment by wump...@gmail.com on 29 Dec 2009 at 4:14

GoogleCodeExporter commented 9 years ago
Agreed. There seemed to be a performance gain, but it has been eroded by changes
made since I submitted this.

Also, I was unaware that Markdown (the Perl version) was still being actively
developed, so (2) makes a lot of sense to me now. I will keep that in mind for any
future patches I submit.

Original comment by nberardi on 29 Dec 2009 at 4:39