kzykhys / Ciconia

A New Markdown parser for PHP5.4
http://ciconia.kzykhys.com/
MIT License
355 stars, 31 forks

Improve performance #13

Closed: kzykhys closed this issue 10 years ago

kzykhys commented 10 years ago
$ php bin/markbench benchmark --profile=github-sample
Runtime: PHP5.5.3
Host:    Linux vm1 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64
Profile: Sample content from Github (http://github.github.com/github-flavored-markdown/sample_content.html) / 1000 times
Class:   Markbench\Profile\GithubSampleProfile

+----------------------+---------+---------+---------------+---------+--------------+
| package              | version | dialect | duration (MS) | MEM (B) | PEAK MEM (B) |
+----------------------+---------+---------+---------------+---------+--------------+
| erusev/parsedown     | 0.4.6   |         | 10819         | 6291456 | 6553600      |
| michelf/php-markdown | 1.3     |         | 36887         | 6815744 | 6815744      |
| michelf/php-markdown | 1.3     | extra   | 49626         | 6815744 | 7340032      |
| kzykhys/ciconia      | v0.1.4  |         | 64959         | 7340032 | 7602176      |
| kzykhys/ciconia      | v0.1.4  | gfm     | 68987         | 7077888 | 7602176      |
+----------------------+---------+---------+---------------+---------+--------------+
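
For a rough sense of scale, the durations in this table can be turned into per-conversion times and ratios against parsedown. This sketch assumes that "duration (MS)" is the total for all 1000 iterations of the profile; `relativeCost()` is a hypothetical helper, not part of markbench.

```php
<?php
// Hypothetical helper (not part of markbench): converts total durations
// into per-run averages and ratios against a baseline package.
// Assumption: "duration (MS)" is the total over all iterations.
function relativeCost(array $durations, $baseline, $iterations)
{
    $result = [];
    foreach ($durations as $package => $totalMs) {
        $result[$package] = [
            'perRunMs' => $totalMs / $iterations,           // average ms per conversion
            'ratio'    => $totalMs / $durations[$baseline],  // 1 == same speed as baseline
        ];
    }
    return $result;
}

// Durations (ms) copied from the first benchmark table, 1000 iterations.
$durations = [
    'parsedown 0.4.6'        => 10819,
    'php-markdown 1.3'       => 36887,
    'php-markdown 1.3 extra' => 49626,
    'ciconia v0.1.4'         => 64959,
    'ciconia v0.1.4 gfm'     => 68987,
];

foreach (relativeCost($durations, 'parsedown 0.4.6', 1000) as $package => $row) {
    printf("%-24s %7.3f ms/run  %.2fx\n", $package, $row['perRunMs'], $row['ratio']);
}
```

Under that assumption, Ciconia v0.1.4 takes about 6x as long as parsedown 0.4.6, roughly 65 ms versus 11 ms per conversion.
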

evert commented 10 years ago

If I understand the source correctly, every extension runs its own regular expressions on either the full text or on every line.
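
To make the multi-pass model concrete, here is a hypothetical sketch (it is not Ciconia's actual code): each extension is a pattern/callback pair applied with `preg_replace_callback()` over the whole text, so N extensions mean N full scans. `transform()` and the three patterns are illustrative assumptions.

```php
<?php
// Hypothetical multi-pass pipeline (not Ciconia's real code): every
// extension rescans the whole, already partially transformed, text.
function transform($text, array $extensions)
{
    foreach ($extensions as $ext) {
        list($pattern, $callback) = $ext;
        // One full pass over the text per extension.
        $text = preg_replace_callback($pattern, $callback, $text);
    }
    return $text;
}

$extensions = [
    // Order matters: ** must be handled before a single *.
    ['/\*\*([^*]+)\*\*/', function ($m) { return '<strong>' . $m[1] . '</strong>'; }],
    ['/\*([^*]+)\*/',     function ($m) { return '<em>' . $m[1] . '</em>'; }],
    ['/`([^`]+)`/',       function ($m) { return '<code>' . $m[1] . '</code>'; }],
];

echo transform('**bold** and *em* and `code`', $extensions);
// prints: <strong>bold</strong> and <em>em</em> and <code>code</code>
```

Besides the repeated scans, later passes see the output of earlier ones: with this ordering, a code span containing *x* comes out as <code><em>x</em></code>, because the emphasis pass runs before the code-span pass.
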

If performance is one of your goals, I think you should not spend too much time improving the current architecture; instead, try to build a system that first tokenizes the input and then transforms the tokens into HTML.

An entire tokenizer could be written with far fewer regular expressions, and each individual extension could add its own pattern to one big regex. There may be some exceptions to that rule if Markdown has any non-regular syntax, but I'd imagine you'd get quite far with just that...

Just a thought, anyway...
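
The single-pass design evert describes can be sketched roughly as follows. This is an illustration under assumptions, not Ciconia's API: `tokenize()`, `render()` and the three inline patterns are hypothetical, and block-level syntax is ignored. Each "extension" contributes one pattern wrapped in a named group, the groups are joined into one alternation so the input is scanned once, and a separate step turns the token list into HTML.

```php
<?php
// Hypothetical single-pass tokenizer + renderer (not Ciconia's API).
function tokenize($text, array $patterns)
{
    // Join every extension's pattern: (?P<code>...)|(?P<strong>...)|...
    $parts = [];
    foreach ($patterns as $name => $pattern) {
        $parts[] = '(?P<' . $name . '>' . $pattern . ')';
    }
    $regex = '~' . implode('|', $parts) . '~';

    // One scan over the input collects every match with its offset.
    preg_match_all($regex, $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $tokens = [];
    $pos = 0;
    foreach ($matches as $m) {
        list($raw, $offset) = $m[0];
        if ($offset > $pos) {
            // Plain text between the previous match and this one.
            $tokens[] = ['text', substr($text, $pos, $offset - $pos)];
        }
        // Find which named group matched; unmatched groups have offset -1 or are absent.
        foreach (array_keys($patterns) as $name) {
            if (isset($m[$name]) && $m[$name][1] !== -1) {
                $tokens[] = [$name, $raw];
                break;
            }
        }
        $pos = $offset + strlen($raw);
    }
    if ($pos < strlen($text)) {
        $tokens[] = ['text', substr($text, $pos)];
    }
    return $tokens;
}

function render(array $tokens, array $renderers)
{
    $html = '';
    foreach ($tokens as $token) {
        list($type, $value) = $token;
        $render = $renderers[$type]; // invoke the closure directly
        $html .= $render($value);
    }
    return $html;
}

$patterns = [
    'code'   => '`[^`]+`',
    'strong' => '\*\*[^*]+\*\*', // listed before 'em' so ** wins
    'em'     => '\*[^*]+\*',
];

$renderers = [
    'text'   => function ($s) { return htmlspecialchars($s); },
    'code'   => function ($s) { return '<code>' . htmlspecialchars(substr($s, 1, -1)) . '</code>'; },
    'strong' => function ($s) { return '<strong>' . htmlspecialchars(substr($s, 2, -2)) . '</strong>'; },
    'em'     => function ($s) { return '<em>' . htmlspecialchars(substr($s, 1, -1)) . '</em>'; },
];

echo render(tokenize('Use `a < b` for **bold** and *em*.', $patterns), $renderers);
// prints: Use <code>a &lt; b</code> for <strong>bold</strong> and <em>em</em>.
```

Pattern order is significant because PCRE tries the alternatives left to right at each position. Nested constructs, such as lists inside blockquotes, are where the non-regular exceptions mentioned above would start to appear.
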

kzykhys commented 10 years ago

I agree. The current architecture relies on too many regular expressions, which might actually be what slows things down. The same goes for call_user_func.

I tried to implement Ciconia using the lexer/parser pattern before, but failed...
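
One way to check the call_user_func concern is to time the same callback invoked through `call_user_func()` and called directly as `$callback($arg)`, which PHP 5.4 supports for closures. This harness is a hypothetical sketch: `timeCalls()` and the iteration count are arbitrary choices, and results differ between PHP versions and machines, so no figures are claimed here.

```php
<?php
// Hypothetical micro-benchmark: the same callback invoked via
// call_user_func() and via a direct closure call.
function timeCalls($label, $fn, $iterations)
{
    $start = microtime(true);
    $fn($iterations);
    printf("%-16s %.1f ms\n", $label, (microtime(true) - $start) * 1000);
}

$callback = function ($s) { return strtoupper($s); };
$n = 200000;

timeCalls('call_user_func', function ($n) use ($callback) {
    for ($i = 0; $i < $n; $i++) {
        call_user_func($callback, 'x'); // indirect dispatch
    }
}, $n);

timeCalls('direct call', function ($n) use ($callback) {
    for ($i = 0; $i < $n; $i++) {
        $callback('x'); // direct invocation of the closure
    }
}, $n);
```
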

kzykhys commented 10 years ago
$ php bin/markbench benchmark --profile=github-sample
Runtime: PHP5.5.3
Host:    Linux vm1 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64
Profile: Sample content from Github (http://github.github.com/github-flavored-markdown/sample_content.html) / 1000 times
Class:   Markbench\Profile\GithubSampleProfile

+----------------------+------------+------------+---------------+---------+--------------+
| package              | version    | dialect    | duration (MS) | MEM (B) | PEAK MEM (B) |
+----------------------+------------+------------+---------------+---------+--------------+
| erusev/parsedown     | 0.4.7      |            | 12095         | 6291456 | 6553600      |
| michelf/php-markdown | 1.3        |            | 38704         | 6815744 | 7077888      |
| michelf/php-markdown | 1.3        | extra      | 51304         | 6815744 | 7340032      |
| kzykhys/ciconia      | dev-master |            | 64837         | 7340032 | 7602176      |
| kzykhys/ciconia      | dev-master | gfm        | 68255         | 7340032 | 7602176      |
+----------------------+------------+------------+---------------+---------+--------------+