progsource / maddy

C++ Markdown to HTML header-only parser library
MIT License

[Enhancement] replace std::regex with a faster regex engine (like re2). #36

Open saile515 opened 1 year ago

saile515 commented 1 year ago

I would suggest replacing std::regex with a faster regex engine, for example re2. According to this Stack Overflow benchmark, a library like re2 can be almost 40x faster than std::regex. Since maddy is a little slow at the moment, I think this would greatly increase performance with minimal effort.

progsource commented 1 year ago

That sounds like a good idea. re2 also seems to have a suitable license. maddy has to keep working on the three major platforms and should not require something like boost, but as far as I can see, https://github.com/google/re2 meets these requirements.

If we switch to a different library to improve performance, it also has to be proven faster (for example with Google Benchmark or something similar). I don't know yet when I will find the time to work on this, but I'm always open to PRs for that matter ;)
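As a rough illustration of what "proven faster" could mean before a full Google Benchmark setup exists, here is a minimal timing sketch that uses only the standard library. The names averageMicroseconds and replaceStrong are hypothetical and not part of maddy, and the pattern is a simplified stand-in rather than maddy's actual regex. A candidate engine would be timed through the same helper on the same input.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of maddy): runs `fn` `iterations` times and
// returns the average wall-clock time per call in microseconds.
template <typename Fn>
double averageMicroseconds(Fn&& fn, std::size_t iterations)
{
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i)
  {
    fn();
  }
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::micro> elapsed = end - start;
  return elapsed.count() / static_cast<double>(iterations);
}

// Illustrative stand-in for one regex-based replacement step.
// Assumption: a simplified pattern, not the one maddy ships.
std::string replaceStrong(const std::string& line)
{
  static const std::regex re(R"(\*\*([^*]+)\*\*)");
  return std::regex_replace(line, re, "<strong>$1</strong>");
}
```

Calling averageMicroseconds with a lambda that invokes replaceStrong on a representative document gives a baseline number. A hand-rolled timer like this is noisy and the compiler may optimize away unused results, so it only hints at a trend; a dedicated benchmark framework would be the more reliable check.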

progsource commented 1 year ago

I tried out adding re2 and currently see two main issues with it:

  1. It does not support negative lookahead: (?!re), "before text not matching re", is listed as NOT SUPPORTED (see https://github.com/google/re2/wiki/Syntax). This means it does not support the way the regexes for strong, italic, strikethrough and emphasized are written. There might be a regex combination that still makes it work, but my main problem is that re2 supports less regex functionality, which could limit future development.
  2. It is not header-only, while maddy is a header-only library. Therefore it blows up the library quite a bit, especially considering that it also requires abseil. I did try putting it behind a compiler flag, which worked, so it could be offered as an option. This way one could still have the header-only version.
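To make the first point concrete, here is a small sketch of a negative lookahead in std::regex (ECMAScript grammar). The function name strongOutsideCode and the pattern are simplified assumptions for illustration, not maddy's actual regex: `**text**` is wrapped in strong tags unless a backtick appears later on the line, which is a crude guard against rewriting text inside inline code.

```cpp
#include <regex>
#include <string>

// Hypothetical example (not maddy's real pattern): wrap **text** in <strong>
// unless a backtick follows somewhere later on the same line.
std::string strongOutsideCode(const std::string& line)
{
  // (?![^`]*`) is a negative lookahead: the match is rejected when the rest
  // of the line still contains a backtick.
  static const std::regex re(R"(\*\*([^*]+)\*\*(?![^`]*`))");
  return std::regex_replace(line, re, "<strong>$1</strong>");
}
```

Because the re2 syntax page lists (?!re) as not supported, a pattern of this shape would have to be rewritten or replaced by extra logic outside the regex.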

In my tests the speed improvement was not significantly visible in the benchmarks. This might have multiple reasons and does not necessarily mean that it is impossible to gain more speed with re2 in the end.

I understand that it would be nice if maddy were faster, and I still have an idea for how to make that happen (an own lexer/parser without regex), but that will basically require a rewrite, and I am considering it for maddy 2.0 at some point. I don't know yet when I can get to that, though.
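For readers wondering what "without regex" might look like, here is a minimal sketch of a hand-written pass for a single inline element. It is an assumption for illustration only: the function renderStrong is hypothetical, it handles only `**strong**` and backtick code spans on one line, and it does not reflect any actual design for maddy 2.0.

```cpp
#include <cstddef>
#include <string>

// Hypothetical regex-free pass: wraps **text** in <strong> tags and copies
// backtick-delimited code spans through unchanged.
std::string renderStrong(const std::string& line)
{
  std::string out;
  out.reserve(line.size());
  std::size_t i = 0;
  while (i < line.size())
  {
    if (line[i] == '`')
    {
      // Copy an inline code span verbatim, including both backticks.
      const std::size_t close = line.find('`', i + 1);
      if (close != std::string::npos)
      {
        out.append(line, i, close - i + 1);
        i = close + 1;
        continue;
      }
    }
    else if (line.compare(i, 2, "**") == 0)
    {
      // Look for the closing marker; require non-empty content in between.
      const std::size_t close = line.find("**", i + 2);
      if (close != std::string::npos && close > i + 2)
      {
        out += "<strong>";
        out.append(line, i + 2, close - (i + 2));
        out += "</strong>";
        i = close + 2;
        continue;
      }
    }
    // Anything else, including unmatched markers, is copied as-is.
    out += line[i];
    ++i;
  }
  return out;
}
```

The code-span check that needs a lookahead in the regex version becomes an ordinary branch here, which is one reason a hand-written scanner can be easier to extend. Whether it is actually faster would still have to be measured.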