Closed GoogleCodeExporter closed 9 years ago
I wanted this too and created a patch for it. The problems I saw were:
1. Constants used for options => No way to use this as a library without
compiling
from source.
2. Needed to be able to set options from configuration file.
So what this patch does is the following:
1. New class, MarkdownOptions, which contains all the options that were
previously
constants. The class is immutable, the options can be set explicitly at
construction
time, or it can use the defaults that were used before, or use the defaults +
overrides from configuration files. The config options are placed in
appSettings and
take the form Markdown.PropertyName, e.g. Markdown.AutoHyperlink. (Maybe later
this
should use its own config section but that seemed overkill for now).
Having the options per instance creates some other changes as well. The regular
expressions that use the options are now per instance, not static anymore. The
ones
that don't use any options are still static.
2. Realistically for most cases you only need a single instance, and probably
most of
the time you'd want defaults + config overrides. So there is a static property,
Default, on the Markdown class so you can just go Markdown.Default.Transform(s)
if
that's all you want.
3. The Markdown class is immutable so if you want to change options later you
must
create a new instance. This avoids having to keep track of option changes to
recompile regular expressions. Immutable == good :)
And that's about it. Added bin/obj to the ignore list as well.
I'll probably look at the failing unit tests later this week if I have a chance.
Original comment by hviturha...@gmail.com
on 4 Jan 2010 at 11:26
Attachments:
Whoops, accidentally left some debugging code in there. Here's a clean version
of the
patch. Sorry about that.
Original comment by hviturha...@gmail.com
on 4 Jan 2010 at 12:19
Attachments:
I appreciate the contribution, but after applying the patch locally I am seeing
a 15%
increase in benchmark times.
I assume this is because the affected regular expressions are not static?
I had always intended this to be used as a library, compiled from source code
.. what
specific scenario did you have where you cannot do that?
Original comment by wump...@gmail.com
on 5 Jan 2010 at 1:14
[deleted comment]
ouch -- on top of the 15% overall performance decline, the single call
performance
got 10 times worse:
before patch:
input string length: 11075
1 iteration in 16 ms
input string length: 88607
1 iteration in 196 ms
input string length: 354431
1 iteration in 1635 ms
after patch:
1 iteration in 174 ms
input string length: 88607
1 iteration in 345 ms
input string length: 354431
1 iteration in 2000 ms
Original comment by wump...@gmail.com
on 5 Jan 2010 at 1:27
Performance aside, I think this is valuable because it would allow people to
use the
library without needing to compile it. We could conceivably distribute a .dl
users link
into their project instead. This is a pretty standard model for distributing
library
code.
Second, and this is what actually triggered me to log this bug, is that we
could
actually test the functionality of the options flags. Right now it's
impossible to test
whether they work without recompiling the code. I would hate to have to
recompile
the code a bunch of times as part of an automated test suite.
All that said, a 15% performance degration is pretty serious. But I think some
of that
may be an artifact of how the tests are executed (do the tests create many
instances
of the markdown class? do we expect users to?).
Original comment by kevin.ba...@gmail.com
on 5 Jan 2010 at 3:10
I am not sure a dinky little markdown.cs justifies an entire DLL.. does it?
I understand the desire for configuration, but a 15% (and up to 10x, depending
on how
you call it) performance penalty seems like an extremely steep cost for
configurability not everyone will need.
Original comment by wump...@gmail.com
on 5 Jan 2010 at 3:31
Yes, I agree.
We should not do this unless we find a way to do it without sacrificing
performance.
The .dll bit I'm not overly concerned about, but in order to write unit tests
for the
options functionality, we need this.
Original comment by kevin.ba...@gmail.com
on 5 Jan 2010 at 3:38
The performance issue is weird, since the regular expressions themselves didn't
change at all, they only went from being static to instance variables. And they
are
compiled in the constructor which is executed outside the benchmark itself.
All in all I'm seeing a slightly worse performance on most tests (0.0x ms per
iteration) except for the really weard single call case. I'll look at that and
see if
I can see what the issue is.
As for whether this makes sense or not, well, personally I like to pull in
dependencies for my projects as .dll's, not source code. But yes, maybe it's
overkill
for a single Markdown.cs.
Original comment by hviturha...@gmail.com
on 5 Jan 2010 at 8:08
Hmmm, after running some tests it seems that the single call performance is only
significantly better in the static case when the other benchmarks have been run
before. If I comment out the first three benchmarks (that run thousands of
times) and
only run the last three (single iteration ones) I get these results:
WITH OPTIONS:
input string length: 11075
1 iteration in 134 ms
ORIGINAL:
input string length: 11075
1 iteration in 132 ms
DIFFERENCE: 1,52%
-----------------------------------------------
WITH OPTIONS:
input string length: 88607
1 iteration in 445 ms
ORIGINAL:
input string length: 88607
1 iteration in 399 ms
DIFFERENCE: 11,53%
-----------------------------------------------
WITH OPTIONS:
input string length: 354431
1 iteration in 4134 ms
ORIGINAL:
input string length: 354431
1 iteration in 4028 ms
DIFFERENCE: 2,63%
-----------------------------------------------
As far as I can tell this has something to do with how the Regex implementation
caches commonly used patterns, perhaps even compiling them behind the scenes?
Most of
the regexes used are compiled but not all.
Anyway, I can accept that the change is not worth it from a performance
standpoint.
But I would still argue that if the class only has static members then the class
should be static as well. Creating many instances of a class that has no
instance
data seems useless. And perhaps config options could be read in the static
initialization?
Original comment by hviturha...@gmail.com
on 5 Jan 2010 at 12:06
[deleted comment]
let's try this again.. I made the entire class static, can we tackle the
configuration once more? new patch?
Original comment by wump...@gmail.com
on 6 Jan 2010 at 1:21
ok, I reverted the all-static change -- as you pointed out, this isn't
thread-safe
Original comment by wump...@gmail.com
on 6 Jan 2010 at 11:19
Well, here is a new patch for this. The difference is that the configuration is
static, that is, you configure the options with a static method, the regular
expressions are all static so essentially every instance you create of the
Markdown
class will use the same options. It still gives you the ability to read the
options
from a config file, and reconfigure it later through code
(Markdown.Configure(new
MarkdownOptions(...)), which will just recompile all the static regular
expressions.
As for performance, it seems comparable with the older code. I ran the
benchmarks a
few times, sometimes the newer is a bit slower, sometimes a bit faster. It's a
couple
of milliseconds here and there.
Original comment by hviturha...@gmail.com
on 7 Jan 2010 at 8:08
Attachments:
[deleted comment]
[deleted comment]
speed wise it looks OK (excellent!), but I'm not crazy about the way the
regexes are
moved so far away from the code they are relevant to.
It kind of utterly destroys the physical ordering and logical locality of the
code..
and makes it a PITA to work on. Because when you're troubleshooting, say,
DoAutoLinks() .. and you want to check _autolinkBare, it's wayyyyyy up at the
top of
the file. In a big jumble of code that has no relationship to each other.
It kind of stops being enjoyable to work on, for me, at that point.
Original comment by wump...@gmail.com
on 7 Jan 2010 at 8:44
[deleted comment]
[deleted comment]
[deleted comment]
ok, I made changes in r88 that should move us closer -- we just need a way to
rebuild
all the regular expressions when those properties change .. perhaps reflection
or
something?
we think only .TabWidth will be problematic. (I made .NestingDepth private, as
to me
that is 100% an implementation detail and should never REALLY need to be
changed.. we
might even be able to use the proprietary .NET regex stack balancing extensions
to
remove it entirely, eventually..)
The rest of the options, with the sole exception of .TabWidth, should work, as
they
don't rely on being compiled into the regular expressions .. take a look and
see what
you think?
Original comment by wump...@gmail.com
on 7 Jan 2010 at 10:18
Well I would say the easiest way to get around that would be to just have a
single
CompileXXX method for each of the regexes that need the options. Just have the
method
be where the static initialization used to be, and then have one main
CompileRegularExpressions that calls all the others. Then you can just as
easily look
at the regex when you need to.
Original comment by hviturha...@gmail.com
on 7 Jan 2010 at 11:07
Ok, I just totally missed your point that everything except _tabwidth and
_nestDepth
doesn't rely on the regexes being compiled. Then I'd just say leave those two as
constants and have the rest as properties, that's fine. _tabWidth is arguably
more
appropriate as a constant anyway.
Original comment by hviturha...@gmail.com
on 7 Jan 2010 at 11:28
yes, I'd say go ahead and adapt your cool load-config-from-file for this
version -- I
think all the config properties should work with the exception of TabWidth
Original comment by wump...@gmail.com
on 7 Jan 2010 at 12:49
Ok, I've created a new patch. Actually, since removing nestDepth and tabWidth
from
the equation the whole thing just became a lot simpler. All the other config
things
were only used in instance methods, so I could just make them normal properties
not
static ones. So, you can create a new object, change its properties and do what
you
want without affecting anything else.
Also added a constructor that takes in a bool loadOptionsFromConfigFile, the
default
is still not to do it so no changes have to be made to existing tests or code.
And finally added unit tests to make sure the configuration is read from
appsettings
and that changing the properties on an instance has the desired effect.
No performance problems compared to a clean copy from SVN.
Original comment by hviturha...@gmail.com
on 8 Jan 2010 at 8:46
Attachments:
[deleted comment]
ok checked in as r95
Original comment by wump...@gmail.com
on 11 Jan 2010 at 2:10
Original issue reported on code.google.com by
kevin.ba...@gmail.com
on 4 Jan 2010 at 2:41