xoofx / markdig

A fast, powerful, CommonMark compliant, extensible Markdown processor for .NET
BSD 2-Clause "Simplified" License

Feature wishlist #362

Open generateui opened 5 years ago

generateui commented 5 years ago

Hi all! I'm developing a library where standardized extensions can be written. The idea is to enable easy development of block and inline elements, which themselves have an internal syntax. I have built a layer on top of Markdig. In developing this library (which I will make available once I'm satisfied and licensing issues are fixed), I stumbled upon some things that I'd like to mention/discuss. These are:

  1. Immutable AST
  2. Easy AST creation
  3. Renderers with DI-injected dependencies
  4. AST transformers
  5. JS/CSS embedding

Details:

  1. Immutable AST: While having a mutable AST is nice for performance, an immutable one helps prevent mistakes and enables larger-scale performance gains (multi-threading, async, reusability of AST parts).
  2. Easy AST creation: This one is nice for testing and for "AST renderers". Some of my extensions don't render HTML but render Markdown instead. This way normalized Markdown rendering still works, and the target HTML can reuse the styling of the overall document stylesheet. To implement AST renderers, it's nice to provide an AST directly instead of a Markdown string which is then parsed. It's faster, less error-prone and also typed.
  3. Some renderers/transformers for certain extensions I created need data from outside sources (e.g. a database). Current implementations simply get these dependencies injected. This works, but adding extensions to the pipeline involves creating the extension instances up front. Passing a Func to AddIfNotAlready() would help here.
  4. With an immutable AST, we can build AST transformers. They take an AST and produce a new AST in which items from the input AST are reused in the output AST. This is fast and memory efficient.
  5. In my library, I have a way for extensions to provide JavaScript and CSS. For example, I have an extension which produces a BPMN 2.0 graph using the excellent bpmn-js lib. For this to work, I inject JS and CSS into the rendered HTML document. While it's not hard to do so, and maybe out of place in a library like Markdig, it may be worth tinkering with.
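
To make points 1 and 4 a bit more concrete, here is a minimal, language-neutral sketch in Python. It is not Markdig code: `Node`, `transform` and `upper_emphasis` are hypothetical names made up for illustration. It only shows the general idea of an immutable tree whose transformer rebuilds the path to a changed node and reuses every untouched subtree by reference.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable AST node (not a Markdig type)."""
    kind: str
    text: str = ""
    children: Tuple["Node", ...] = ()


def transform(node: Node, rewrite: Callable[[Node], Node]) -> Node:
    """Apply `rewrite` bottom-up and return the resulting tree.

    Subtrees that `rewrite` leaves alone are returned as the very same
    objects, so the output shares them with the input.
    """
    new_children = tuple(transform(child, rewrite) for child in node.children)
    # Identity check: only allocate a new parent if some child was replaced.
    if all(new is old for new, old in zip(new_children, node.children)):
        candidate = node
    else:
        candidate = replace(node, children=new_children)
    return rewrite(candidate)


def upper_emphasis(node: Node) -> Node:
    """Example rewrite rule: upper-case the text of emphasis nodes."""
    if node.kind == "emphasis":
        return replace(node, text=node.text.upper())
    return node


intro = Node("paragraph", children=(Node("text", "plain"),))
body = Node("paragraph", children=(Node("emphasis", "loud"),))
doc = Node("document", children=(intro, body))

result = transform(doc, upper_emphasis)
```

In this sketch `result.children[0]` is the same object as `intro`, while `doc` itself is left unchanged. Only the document node, the second paragraph and the emphasis node are allocated anew.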

I want to express my gratitude for publishing Markdig. This post is meant to take the existing solution and ponder about ways to move it forward.

generateui commented 5 years ago

Oh, and I don't want to leave it at words: I'm willing to invest time in implementing it.

xoofx commented 5 years ago

Hi

Thanks for sharing your ideas. PRs are welcome, with the following considerations:

For 1., I'm not very inclined to make this kind of change to Markdig. C# is poorly equipped to provide immutability out of the box without falling back to a laborious design (like Roslyn) where you keep a mutable internal tree alongside an immutable public tree. It would just kill the performance of Markdig: double allocations, forcing extensions and plugins to reallocate things as well... so I would prefer that we don't bring that in.

For 2. sure, but no SyntaxFactory. I believe that a few constructors could be added... if we can rely as much as possible on list initializers as well.

For 3. why not, assuming that we keep existing extensions working as they are. No dependency on an external DI system. Func<> could be enough.
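
As a rough illustration of the factory idea for 3., here is a language-neutral sketch in Python. Everything in it is hypothetical (`PipelineBuilder`, `add_if_not_already`, `TableExtension`, `GlossaryExtension`); it is not Markdig's API. It assumes only that the builder stores a zero-argument factory per extension type, which plays the role of a Func<>, and falls back to the parameterless constructor so existing registrations keep working.

```python
from typing import Callable, Dict, List, Optional


class PipelineBuilder:
    """Hypothetical builder used for illustration; not Markdig's API."""

    def __init__(self) -> None:
        # One factory per extension type, kept in registration order.
        self._factories: Dict[type, Callable[[], object]] = {}

    def add_if_not_already(
        self,
        ext_type: type,
        factory: Optional[Callable[[], object]] = None,
    ) -> "PipelineBuilder":
        # Without a factory, the type itself is the factory (parameterless
        # constructor). A second registration of the same type is ignored.
        self._factories.setdefault(ext_type, factory or ext_type)
        return self

    def build(self) -> List[object]:
        # Extensions are only instantiated here, when the pipeline is built.
        return [factory() for factory in self._factories.values()]


class TableExtension:
    """Stand-in for an extension without dependencies."""


class GlossaryExtension:
    """Stand-in for an extension that needs an outside data source."""

    def __init__(self, lookup: Optional[Callable[[str], Optional[str]]]) -> None:
        self.lookup = lookup


glossary = {"AST": "abstract syntax tree"}  # stand-in for a database

builder = (
    PipelineBuilder()
    .add_if_not_already(TableExtension)
    .add_if_not_already(GlossaryExtension, lambda: GlossaryExtension(glossary.get))
    .add_if_not_already(GlossaryExtension, lambda: GlossaryExtension(None))  # ignored
)
extensions = builder.build()
```

The caller decides how dependencies are resolved inside the lambda, so no DI container has to be referenced by the builder itself.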

For 4. not sure we need something like that for now. We could have a formalized process that runs on the MarkdownDocument at the end, but it would process the AST in place.

For 5. Markdig has been staying away from that, but maybe an extension providing this out of the box could be possible; you can try.

generateui commented 5 years ago

  1. If you want to keep an internal mutable AST and an external immutable AST, it is indeed a pain. I've implemented a design with only an immutable AST though, which is very possible. It does incur some cost in orchestrating the AST and needs some non-obvious workarounds to make it happen. However, at a later stage I clearly saw the benefit for my use cases - YMMV.
  2. This currently works, but requires some ctors here and there, as you mention.
  3. OK
  4. Use cases I currently have:
    • Inserting blocks at a target list indentation or target heading level. For example, I'd import an h2 with h3s and h4s as children under an h3 block, transforming the h2 to h4, h3 to h5 and h4 to h6.
  5. You probably want this in the core. I'm developing this still, and when I feel it is stable enough I'd like to tinker around with adding something on top, which imho should always be optional. You quickly run into generating a JS/CSS app, where you need to resolve name collisions, dependency checking, efficient tree shaking et cetera. There are probably good libs to achieve this; maybe it should even be built at the npm level (using npm/babel as a lib).
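
The heading use case in 4. boils down to a single offset. A minimal sketch in Python, again language-neutral and with hypothetical names (`rebase_headings`, `parent_level`); headings are modelled as plain `(level, title)` pairs rather than real AST nodes, and clamping at level 6 is my own assumption, since HTML has no h7:

```python
from typing import List, Tuple

Heading = Tuple[int, str]  # (level, title); a stand-in for a heading node


def rebase_headings(
    headings: List[Heading],
    parent_level: int,
    max_level: int = 6,
) -> List[Heading]:
    """Shift imported headings so the shallowest one sits one level
    below `parent_level`, keeping their relative nesting."""
    if not headings:
        return []
    top = min(level for level, _ in headings)
    offset = parent_level + 1 - top
    # Assumption: levels deeper than max_level are clamped, not dropped.
    return [(min(level + offset, max_level), title) for level, title in headings]


imported = [(2, "Overview"), (3, "Setup"), (4, "Details")]
rebased = rebase_headings(imported, parent_level=3)
```

With the imported h2/h3/h4 placed under an h3, the offset is 2, so the levels become 4, 5 and 6.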

It currently looks like the lib I'm developing is a "higher-level thing". It simply uses Markdig as a library and builds on top of it, and I'm not sure whether the extensions to the Markdig core I'm building should be part of Markdig.