Much slower than MD4C

nuttyartist commented 1 year ago

Hello! Thanks for this library. I was wondering why I get such a large difference in performance for the same text:

Maddy took 5304 milliseconds
Qt took 5 milliseconds

Maddy code:

std::stringstream markdownInput("some text...");
m_markdownParser->Parse(markdownInput);

Qt code:

QString markdownInput("some text...");
QTextDocument textDoc;
textDoc.setMarkdown(markdownInput);
textDoc.toHtml();

EDIT: By mistake I set it as a feature request.

progsource commented 1 year ago

When it comes to performance tests, several factors affect the results, for example:

Operating system
Other apps and processes running at the same time, which can slow down a test
How many times the tests were run

So currently it is difficult to know the exact reasons for your results.
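To make a comparison like this more repeatable, one option is to run each parser several times and report the median, with finer than millisecond resolution. A minimal sketch, assuming C++11 or later; `medianMicroseconds` is a hypothetical helper, not part of maddy or Qt:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs `work` `runs` times and returns the median
// wall-clock time in microseconds. Taking the median of repeated runs
// dampens noise from other processes running on the system, and
// microsecond resolution keeps very fast runs from being reported as 0.
long long medianMicroseconds(const std::function<void()>& work, std::size_t runs)
{
    assert(runs > 0);  // the median of zero samples is undefined
    std::vector<long long> samples;
    samples.reserve(runs);

    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count());
    }

    // Partially sort so the middle element is the median (upper median if even).
    const auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

To time a parser with it, the parse call goes inside the lambda. Since parsing presumably reads a std::stringstream to its end, the input stream should be recreated inside the lambda on every run rather than shared between runs.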

Besides that, maddy's regex-based approach may currently slow down Markdown processing. In version 2 I plan to drop regex in favor of another approach, which will hopefully speed maddy up (and which I will, of course, benchmark). Until then, maddy might not be the fastest solution.
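To illustrate the difference between a regex-based rule and a non-regex one, here is a sketch of one inline rule (strikethrough) written both ways. The expression `~~(.*?)~~` is a simplified stand-in, not maddy's actual regex, and both function names are hypothetical. std::regex is widely reported to be slow in common standard library implementations, whereas the find() version makes a single left-to-right pass over the line:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Regex version: simplified stand-in for a regex-based inline rule
// (NOT maddy's actual expression). Wraps ~~text~~ in <s>...</s>.
std::string strikeThroughRegex(const std::string& line)
{
    // Compiled once; constructing a std::regex on every call would add cost.
    static const std::regex re(R"(~~(.*?)~~)");
    return std::regex_replace(line, re, "<s>$1</s>");
}

// Scan version: the same transformation using std::string::find,
// copying the line into the output in a single left-to-right pass.
std::string strikeThroughScan(const std::string& line)
{
    const std::string delim = "~~";
    std::string out;
    out.reserve(line.size());

    std::size_t pos = 0;
    for (;;) {
        const std::size_t open = line.find(delim, pos);
        if (open == std::string::npos) {
            break;
        }
        const std::size_t close = line.find(delim, open + delim.size());
        if (close == std::string::npos) {
            break;
        }
        out.append(line, pos, open - pos);  // text before the opening marker
        out += "<s>";
        out.append(line, open + delim.size(), close - open - delim.size());
        out += "</s>";
        pos = close + delim.size();  // resume after the closing marker
    }
    out.append(line, pos, std::string::npos);  // unmatched tail, copied as-is
    return out;
}
```

For simple input such as "a ~~b~~ c" both functions return the same string; measuring both on a real document would be needed to confirm any actual speed-up.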

I work on version 2 every now and then, but I cannot commit to a release date yet, since real life comes first and maddy is a side project.

Of course, if somebody finds a way to speed things up in the meantime, I'm always happy to accept contributions.

nuttyartist commented 1 year ago

Excuse my late reply. Here's a reproducible test with the first chapter of Moby Dick in Markdown: https://gist.github.com/nuttyartist/cb0053ccda823ac98a7ce58f296269cc

I got fairly consistent results.

In Debug mode:

Maddy took 84380 milliseconds
MD4C took 0 milliseconds

In Release mode:

Maddy took 17552 milliseconds
MD4C took 0 milliseconds

EDIT: I changed the title after realizing that Qt uses MD4C under the hood.

vedderb commented 10 months ago

I ran into the performance issue too, and for me it almost makes maddy unusable. After some profiling and testing, I found that the culprits are the following parsers:

EMPHASIZED_PARSER, ITALIC_PARSER, STRIKETHROUGH_PARSER, STRONG_PARSER

What they have in common is a long regular expression that seems to be slow to evaluate. I don't know whether this breaks anything, but I replaced them with the following loops:

EmphasizedParser

void
  Parse(std::string& line) override
  {
      const std::string pattern = "_";
      const std::string newPattern = "em";
      const std::size_t patlen = pattern.size();

      // Repeatedly wrap the first remaining pair of "_" markers in <em>...</em>
      for (;;) {
          const auto pos1 = line.find(pattern);
          if (pos1 == std::string::npos) {
              break;
          }

          const auto pos2 = line.find(pattern, pos1 + patlen);
          if (pos2 == std::string::npos) {
              break;
          }

          const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
          line.replace(pos1, pos2 + patlen - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
      }
  }

ItalicParser

void
  Parse(std::string& line) override
  {
      const std::string pattern = "*";
      const std::string newPattern = "i";
      const std::size_t patlen = pattern.size();

      // Repeatedly wrap the first remaining pair of "*" markers in <i>...</i>
      for (;;) {
          const auto pos1 = line.find(pattern);
          if (pos1 == std::string::npos) {
              break;
          }

          const auto pos2 = line.find(pattern, pos1 + patlen);
          if (pos2 == std::string::npos) {
              break;
          }

          const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
          line.replace(pos1, pos2 + patlen - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
      }
  }

StrikeThroughParser

void
  Parse(std::string& line) override
  {
      const std::string pattern = "~~";
      const std::string newPattern = "s";
      const std::size_t patlen = pattern.size();

      // Repeatedly wrap the first remaining pair of "~~" markers in <s>...</s>
      for (;;) {
          const auto pos1 = line.find(pattern);
          if (pos1 == std::string::npos) {
              break;
          }

          const auto pos2 = line.find(pattern, pos1 + patlen);
          if (pos2 == std::string::npos) {
              break;
          }

          const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
          line.replace(pos1, pos2 + patlen - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
      }
  }

StrongParser

void
  Parse(std::string& line) override
  {
      const std::string newPattern = "strong";

      // Handle both "**" and "__"; each pair of markers becomes <strong>...</strong>
      for (const std::string pattern : {"**", "__"}) {
          const std::size_t patlen = pattern.size();

          for (;;) {
              const auto pos1 = line.find(pattern);
              if (pos1 == std::string::npos) {
                  break;
              }

              const auto pos2 = line.find(pattern, pos1 + patlen);
              if (pos2 == std::string::npos) {
                  break;
              }

              const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
              line.replace(pos1, pos2 + patlen - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
          }
      }
  }

I didn't measure how much faster this is, but my application went from being very laggy when parsing Markdown files to having no noticeable lag at all.

This is just a quick fix, and I don't have time at the moment to clean it up and test it further; otherwise I would open a pull request. I'm sharing it in the hope that it is useful.
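The four parsers above differ only in the delimiter and the HTML tag, so they could share one helper. A minimal sketch, assuming C++11; `replaceDelimited` and `parseInline` are hypothetical names, not maddy's API. It resumes each search after the previous replacement instead of restarting at the beginning of the line, and it runs the two-character delimiters before the single-character ones so that `**bold**` is not split by the `*` rule. Like the original loops, it does not skip inline code or check word boundaries, so `snake_case_name` would become `snake<em>case</em>name`; whether that is acceptable for maddy is untested.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper generalizing the four loops: wraps every pair of
// `delim` markers in <tag>...</tag>, scanning the line left to right.
void replaceDelimited(std::string& line, const std::string& delim, const std::string& tag)
{
    assert(!delim.empty());  // an empty delimiter would never advance
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";

    std::size_t pos = 0;
    for (;;) {
        const std::size_t start = line.find(delim, pos);
        if (start == std::string::npos) {
            break;
        }
        const std::size_t end = line.find(delim, start + delim.size());
        if (end == std::string::npos) {
            break;
        }

        // Replace the closing marker first so `start` stays valid.
        line.replace(end, delim.size(), close);
        line.replace(start, delim.size(), open);

        // Continue searching right after the inserted closing tag.
        pos = end - delim.size() + open.size() + close.size();
    }
}

// Hypothetical driver: order matters, so "**" and "__" are handled
// before "*" and "_".
void parseInline(std::string& line)
{
    replaceDelimited(line, "**", "strong");
    replaceDelimited(line, "__", "strong");
    replaceDelimited(line, "_", "em");
    replaceDelimited(line, "*", "i");
    replaceDelimited(line, "~~", "s");
}
```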

progsource / maddy

Much slower than MD4C #50