progsource / maddy

C++ Markdown to HTML header-only parser library
MIT License

Much slower than MD4C #50

Open nuttyartist opened 1 year ago

nuttyartist commented 1 year ago

Hello! Thanks for this library. I was wondering why, for the same text, I got such a large difference in performance:

Maddy took 5304 milliseconds
Qt took 5 milliseconds

Maddy code:

std::stringstream markdownInput("some text...");
m_markdownParser->Parse(markdownInput);

Qt code:

QString markdownInput("some text...");
QTextDocument textDoc;
textDoc.setMarkdown(markdownInput);
textDoc.toHtml();

EDIT: By mistake I set it as a feature request.

progsource commented 1 year ago

When it comes to performance tests there are certain things that play into results, for example:

So currently it is difficult to know the exact reasons for your results.
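One way to take some of the noise out of such a comparison is to time each parser with a monotonic clock, discard a warm-up run, and report the median of several runs. The following is only a sketch under those assumptions: `timeRunsMs` and `medianMs` are hypothetical helpers, not part of maddy, Qt, or MD4C.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of maddy, Qt, or MD4C): calls `work` once as
// an unrecorded warm-up, then `runs` more times, and returns the duration of
// each recorded run in milliseconds.
inline std::vector<double> timeRunsMs(const std::function<void()>& work,
                                      std::size_t runs)
{
  assert(static_cast<bool>(work));  // `work` must hold a callable

  work();  // warm-up run, not recorded

  std::vector<double> samples;
  samples.reserve(runs);

  for (std::size_t i = 0; i < runs; ++i) {
    // steady_clock is monotonic, so wall-clock adjustments cannot skew it.
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
      std::chrono::duration<double, std::milli>(stop - start).count());
  }
  return samples;
}

// Hypothetical helper: the median is less sensitive to a single slow outlier
// than the mean. Takes the samples by value because it sorts them.
inline double medianMs(std::vector<double> samples)
{
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t mid = samples.size() / 2;
  return samples.size() % 2 == 1 ? samples[mid]
                                 : (samples[mid - 1] + samples[mid]) / 2.0;
}
```

Each parser call would be wrapped in a lambda and passed as `work`. For maddy, the `std::stringstream` should be constructed inside the lambda, because a stream that has already been read to the end yields nothing on a second pass. Building both sides with the same compiler and optimization level keeps the comparison fair.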

Besides that, maddy's regex-based approach might currently slow down Markdown processing. In version 2 I plan to remove the use of regex and go with another approach, which will hopefully speed maddy up (and which I will, of course, benchmark). But until then, maddy might not be the fastest solution.

I work on version 2 every now and then, but cannot commit to a release date yet, due to real life and maddy being a side project.

Of course - if somebody finds a way to speed things up a little in the meantime - I'm always happy for contributions.

nuttyartist commented 1 year ago

Excuse my late reply. Here's a reproducible test with the first chapter of Moby Dick in Markdown: https://gist.github.com/nuttyartist/cb0053ccda823ac98a7ce58f296269cc

I got somewhat consistent results, as follows.

During Debug mode:

Maddy took 84380 milliseconds
MD4C took 0 milliseconds

During Release mode:

Maddy took 17552 milliseconds
MD4C took 0 milliseconds

EDIT: I edited the title after realizing Qt is using MD4C underneath.

vedderb commented 10 months ago

I ran into the performance issue too, and for me it makes maddy almost unusable. After some profiling and testing I found that the culprits are the following parsers:

EMPHASIZED_PARSER, ITALIC_PARSER, STRIKETHROUGH_PARSER, STRONG_PARSER

What they have in common is a long regex that seems to take a long time to evaluate. I don't know if this breaks anything, but I replaced them with the following loops:

EmphasizedParser

void
Parse(std::string& line) override
{
    const std::string pattern = "_";
    const std::string newPattern = "em";
    const std::size_t patlen = pattern.size();

    for (;;) {
        // Find the opening delimiter; stop when none is left.
        auto pos1 = line.find(pattern);
        if (pos1 == std::string::npos) {
            break;
        }

        // Find the closing delimiter; an unpaired opener is left untouched.
        auto pos2 = line.find(pattern, pos1 + patlen);
        if (pos2 == std::string::npos) {
            break;
        }

        // Replace "_word_" with "<em>word</em>".
        std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
        line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
    }
}

ItalicParser

void
Parse(std::string& line) override
{
    const std::string pattern = "*";
    const std::string newPattern = "i";
    const std::size_t patlen = pattern.size();

    for (;;) {
        // Find the opening delimiter; stop when none is left.
        auto pos1 = line.find(pattern);
        if (pos1 == std::string::npos) {
            break;
        }

        // Find the closing delimiter; an unpaired opener is left untouched.
        auto pos2 = line.find(pattern, pos1 + patlen);
        if (pos2 == std::string::npos) {
            break;
        }

        // Replace "*word*" with "<i>word</i>".
        std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
        line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
    }
}

StrikeThroughParser

void
Parse(std::string& line) override
{
    const std::string pattern = "~~";
    const std::string newPattern = "s";
    const std::size_t patlen = pattern.size();

    for (;;) {
        // Find the opening delimiter; stop when none is left.
        auto pos1 = line.find(pattern);
        if (pos1 == std::string::npos) {
            break;
        }

        // Find the closing delimiter; an unpaired opener is left untouched.
        auto pos2 = line.find(pattern, pos1 + patlen);
        if (pos2 == std::string::npos) {
            break;
        }

        // Replace "~~word~~" with "<s>word</s>".
        std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
        line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
    }
}

StrongParser

void
Parse(std::string& line) override
{
    std::string pattern = "**";
    const std::string newPattern = "strong";

    // First pass: replace "**word**" with "<strong>word</strong>".
    for (;;) {
        const std::size_t patlen = pattern.size();

        auto pos1 = line.find(pattern);
        if (pos1 == std::string::npos) {
            break;
        }

        auto pos2 = line.find(pattern, pos1 + patlen);
        if (pos2 == std::string::npos) {
            break;
        }

        std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
        line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
    }

    // Second pass: same replacement for the "__word__" spelling.
    pattern = "__";

    for (;;) {
        const std::size_t patlen = pattern.size();

        auto pos1 = line.find(pattern);
        if (pos1 == std::string::npos) {
            break;
        }

        auto pos2 = line.find(pattern, pos1 + patlen);
        if (pos2 == std::string::npos) {
            break;
        }

        std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
        line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
    }
}

I didn't measure how much faster this is, but my application went from being very laggy when parsing Markdown files to having no lag that I can notice.

This is just a quick fix and I don't have time at the moment to clean it up and test it more, otherwise I would make a pull request. Just sharing it hoping that it is useful.
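The four loops above differ only in the delimiter and the HTML tag, so they could plausibly share one helper. The sketch below assumes a free function named `replaceDelimited`, which is hypothetical and not part of maddy. Unlike the loops above, it resumes searching after the inserted HTML instead of rescanning the line from the start; for the delimiters and tags used here the output should be the same.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of maddy: replaces each `delim`...`delim`
// pair in `line` with <tag>...</tag>, scanning left to right. An unpaired
// delimiter is left untouched.
inline void replaceDelimited(std::string& line,
                             const std::string& delim,
                             const std::string& tag)
{
  assert(!delim.empty() && !tag.empty());

  const std::string open = "<" + tag + ">";
  const std::string close = "</" + tag + ">";
  std::size_t from = 0;

  for (;;) {
    // Opening delimiter; stop when none is left.
    const std::size_t pos1 = line.find(delim, from);
    if (pos1 == std::string::npos) {
      break;
    }

    // Closing delimiter; stop if the opener has no partner.
    const std::size_t pos2 = line.find(delim, pos1 + delim.size());
    if (pos2 == std::string::npos) {
      break;
    }

    const std::string word =
      line.substr(pos1 + delim.size(), pos2 - pos1 - delim.size());
    const std::string html = open + word + close;

    line.replace(pos1, pos2 + delim.size() - pos1, html);
    from = pos1 + html.size();  // resume after the inserted HTML
  }
}
```

With this helper, each `Parse` override would reduce to one or two calls, for example `replaceDelimited(line, "**", "strong")` followed by `replaceDelimited(line, "__", "strong")`. One caveat applies equally to the loops above: a single-character delimiter also matches inside its doubled form, so running the `*` pass before the `**` pass turns `**x**` into `<i></i>x<i></i>`. Whether that matters depends on the order in which maddy runs its line parsers, which this thread does not state. Neither version skips delimiters inside inline code spans.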