icyleaf / markd

Yet another markdown parser, Compliant to CommonMark specification, written in Crystal.
MIT License
109 stars 30 forks source link

Optimizations #19

Closed asterite closed 4 years ago

asterite commented 5 years ago

This PR optimizes Markd code. Each commit has one optimization. The most controversial one is about not indexing strings with indices like string[index] but instead using string.byte_at(index). Because Markdown is an ASCII format (well, there's unicode but the control characters like *, _, [, etc. are all ASCII) we can go byte per byte. But all optimizations here are good.

I used two benchmarks:

require "./src/markd"
require "benchmark"
require "markdown"

text = "hello world"
# text = File.read("README.md")

Benchmark.ips do |x|
  x.report("std") { Markdown.to_html(text) }
  x.report("markd") { Markd.to_html(text) }
end

The first one is the code above. The second one is using the uncommented line, where "README.md" is this repo's README.md (I actually wanted to use the "sample markdown file" mentioned in the README but it seems it doesn't exist anymore).

In the code above require "markdown" is just the std's (incomplete, buggy) markdown.

Results

Hello world

Before this PR

  std   3.49M (286.71ns) (± 2.95%)    384B/op        fastest
markd 342.15k (  2.92µs) (± 4.93%)  4.09kB/op  10.19× slower

After this PR

  std   3.59M (278.37ns) (± 0.81%)    384B/op        fastest
markd 962.57k (  1.04µs) (± 1.79%)  1.46kB/op   3.73× slower

README.md

Before this PR

  std  25.44k ( 39.31µs) (± 1.39%)  46.1kB/op        fastest
markd   2.22k (449.56µs) (± 1.48%)   408kB/op  11.44× slower

After this PR

  std  25.92k ( 38.58µs) (± 0.56%)  46.1kB/op        fastest
markd   5.22k (191.64µs) (± 1.64%)   175kB/op   4.97× slower

Conclusion

It seems time and memory went down by 3 times.

I tried to optimize more but it's harder and harder without me trying to understand more the code. But I think this is good enough: 3~4 times slower than std but without bugs and feature complete is really good! And I'm tempted to ask you to maybe move markd to the standard library so we have a really good doc generate and a good Markdown library in the standard library, but we'll have to discuss that in the Crystal repo.

asterite commented 5 years ago

Another fact: I tried benchmarking rendering markd's README.md with markd and crystal-cmark, which is just a wrapper around a C library, and these are the results:

      markd   5.18k (193.20µs) (± 3.62%)   170kB/op   2.42× slower
common_mark  12.53k ( 79.78µs) (± 4.53%)  6.48kB/op        fastest

So markd is pretty good, considering it wasn't written with optimizations in mind (I think there's still room for improvement). Just 2.42 times slower is actually pretty good. The memory consumption there isn't real because the benchmark just shows memory Crystal knows about, not memory consumed by C, so we can't compare that.