vmg / sundown

Standards compliant, fast, secure markdown processing library in C

markdown analysis instead of parsing #69

Closed kballenegger closed 12 years ago

kballenegger commented 12 years ago

Hey,

This is not a bug in Sundown per se, more of a design issue... I wanted to get some thoughts on this.

I'm working on a project that requires not parsing Markdown to HTML, but rather rendering Markdown to the Cocoa text system via character ranges. I'm trying to use Sundown to extract information about the Markdown data.

Since Sundown is designed around the idea of transforming the input text into a different output text, it clashes with my use case. The ideal library would call the renderer to describe the source, and assembling output would be the renderer's (trivial) job whenever output is what's wanted.
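To make the idea concrete, here is a rough sketch of what "describing the source" could look like. None of this is Sundown's API: `span_kind`, `span_cb`, `analyze_spans`, and `collect_span` are hypothetical names, and the scanner is a toy that only recognizes single-delimiter `*emphasis*` and backtick code spans. It is nowhere near a real Markdown parser. The point is only that the callback receives character ranges into the source instead of rendered text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical span kinds; not part of Sundown. */
enum span_kind { SPAN_EMPHASIS, SPAN_CODE };

/* Hypothetical callback: gets a half-open byte range [start, end) into the
 * source, delimiters included, rather than any rendered output. */
typedef void (*span_cb)(enum span_kind kind, size_t start, size_t end,
                        void *opaque);

/* Toy scanner (assumption: only *x* and `x` spans, no nesting or escapes).
 * Reports each span through `cb` and returns how many were reported. */
size_t analyze_spans(const char *src, size_t len, span_cb cb, void *opaque)
{
    size_t i = 0, count = 0;

    assert(src != NULL && cb != NULL);
    while (i < len) {
        char c = src[i];
        if (c == '*' || c == '`') {
            /* Look for the matching closing delimiter after position i. */
            const char *close = memchr(src + i + 1, c, len - i - 1);
            /* Require at least one character between the delimiters. */
            if (close != NULL && close > src + i + 1) {
                size_t end = (size_t)(close - src) + 1;
                cb(c == '*' ? SPAN_EMPHASIS : SPAN_CODE, i, end, opaque);
                count++;
                i = end;
                continue;
            }
        }
        i++;
    }
    return count;
}

/* Example consumer: records ranges so a caller could later apply text
 * attributes to them (e.g. in a text system that works on ranges). */
struct span { enum span_kind kind; size_t start, end; };
struct span_list { struct span items[16]; size_t count; };

void collect_span(enum span_kind kind, size_t start, size_t end, void *opaque)
{
    struct span_list *list = opaque;

    if (list->count < 16) {   /* drop extra spans; fine for a sketch */
        list->items[list->count].kind = kind;
        list->items[list->count].start = start;
        list->items[list->count].end = end;
        list->count++;
    }
}
```

A caller would pass `collect_span` and a zeroed `struct span_list` to `analyze_spans`, then walk the recorded ranges. A renderer that wants HTML could use the same ranges to assemble its output.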

I had a couple questions:

Cheers, -Kenneth

PS: If you're interested, my use case is rewriting the core of getmacchiato.com, to use a properly built and solid markdown library instead of the incredibly inefficient regular expression mess I'm using at the moment.

vmg commented 12 years ago

Hey, this is a pretty interesting topic. Let's see what we can make of it:

The thing with Markdown is that it's a "text markup language", and as such there are very few corner cases (such as yours) where the output of the rendering is not also going to be text. That shows in every single MD parser out there: they are basically designed to output HTML and very little else.

The kind of approach you require (parsing the MD, building an AST and then rendering through the AST) offers very little benefit for a language like MD; in fact, for 99% of the use cases, this two-phase rendering is just going to make the process slower.

If you want my brotip, what I'd do is spend ~30 minutes writing a Sundown renderer that transforms MD to XML, and then I'd use a Cocoa-based XML parser to build a tree that could potentially be used as an AST for whatever you need. This may be rather slow, depending on which XML parser you use, but I think it's a viable option. (Feel free to replace XML with any other text-based, tree-structured data markup language.)
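A very rough sketch of the shape such an MD-to-XML renderer could take. To keep it compilable on its own, it does not use Sundown's headers: `xml_out`, `xml_puts`, `xml_escape`, and the `render_*` functions are all hypothetical stand-ins, and the element names are made up. A real renderer would register its callbacks through Sundown's callback structure and use the library's own buffer type, so take the actual signatures from `markdown.h` rather than from this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size output buffer, standing in for the library's
 * growable buffer type. Always kept NUL-terminated. */
struct xml_out { char data[256]; size_t len; };

/* Append n bytes. On overflow the data is silently truncated, which could
 * cut an entity in half; a real renderer needs a growable buffer. */
void xml_put(struct xml_out *ob, const char *s, size_t n)
{
    size_t room;

    assert(ob != NULL && s != NULL);
    room = sizeof ob->data - 1 - ob->len;
    if (n > room)
        n = room;
    memcpy(ob->data + ob->len, s, n);
    ob->len += n;
    ob->data[ob->len] = '\0';
}

void xml_puts(struct xml_out *ob, const char *s)
{
    xml_put(ob, s, strlen(s));
}

/* Escape the characters that are special in XML character data. */
void xml_escape(struct xml_out *ob, const char *text, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        switch (text[i]) {
        case '<': xml_puts(ob, "&lt;");  break;
        case '>': xml_puts(ob, "&gt;");  break;
        case '&': xml_puts(ob, "&amp;"); break;
        default:  xml_put(ob, &text[i], 1); break;
        }
    }
}

/* Leaf callback: raw source text, so it must be escaped. */
void render_normal_text(struct xml_out *ob, const char *text, size_t n)
{
    xml_escape(ob, text, n);
}

/* Container callbacks: `content` is assumed to be already-rendered XML
 * from inner callbacks, so it is wrapped without escaping again. */
void render_emphasis(struct xml_out *ob, const char *content, size_t n)
{
    xml_puts(ob, "<emphasis>");
    xml_put(ob, content, n);
    xml_puts(ob, "</emphasis>");
}

void render_paragraph(struct xml_out *ob, const char *content, size_t n)
{
    xml_puts(ob, "<paragraph>");
    xml_put(ob, content, n);
    xml_puts(ob, "</paragraph>");
}
```

Rendering the text `a < b` through `render_normal_text`, then wrapping the result with `render_emphasis` and `render_paragraph`, produces a small nested XML fragment that any tree-building XML parser can load. The tree it builds is what stands in for the AST.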

Your mileage may vary, sorry I can't be more helpful. Oh, and as far as I know, there are no other MD parsers that do what you're trying to accomplish. Sorry again!