Implement converter from markdown to TeXmacs

mgubi commented 3 years ago

I've researched the topic a bit and come up with several realistic solutions.

Use a C/C++ parser. There are several good, feasible options; two of them:
- libsoldout (http://fossil.instinctive.eu/libsoldout/home): no dependencies
- peg-markdown (https://github.com/jgm/peg-markdown): uses peg/leg to generate the parser from a readable description of the markdown grammar. Con: depends on an external tool. Pro: we could perhaps modify the grammar description to directly generate C++ code that uses TeXmacs classes. In any case, it is easy to get the syntax tree and then generate a TeXmacs document from it.
Do it in Scheme.
- We could adapt a parser combinator library such as comparse (Chicken Scheme) [http://wiki.call-cc.org/eggref/5/comparse] and use the associated markdown parser, lowdown [http://wiki.call-cc.org/eggref/5/lowdown].
- Or we could adapt the packrat library (http://wiki.call-cc.org/eggref/5/packrat) and use one of the PEG grammars above.
Improve/use the TeXmacs packrat parser. We already have a parser (used for semantic editing) which is implemented in C++, while the grammars are described in Scheme. It is not clear to me yet whether there is a way to obtain a parse tree for a successful parse. If there is, we just have to adapt one of the above grammars and then transform the parse tree into an appropriate TeXmacs document.
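On the open question in the last option, whether a packrat parser can return a parse tree for a successful parse: in principle this is cheap, since every named rule that succeeds can record a node, and the memo table caches those nodes by (rule, position). Below is a minimal toy sketch in Python, not the TeXmacs C++ parser, so it says nothing about that parser's actual API. The grammar is plain data, loosely in the spirit of grammars written in Scheme, and covers only a hypothetical inline subset (`*em*`, `**strong**`, plain text). The operator names (`seq`, `alt`, `star`, `plus`, `not`, `lit`, `any`, `ref`) and the node shape are made up for illustration.

```python
# Toy packrat (memoizing PEG) parser that returns a parse tree.
# Not the TeXmacs parser: operator names and node shape are illustrative.

# Grammar as plain data. Each rule maps to an expression; a successful
# named rule yields a node (name, matched_text, child_nodes).
GRAMMAR = {
    # inline <- (strong / em / text)*
    "inline": ("star", ("alt", ("ref", "strong"), ("ref", "em"), ("ref", "text"))),
    # strong <- "**" text "**"
    "strong": ("seq", ("lit", "**"), ("ref", "text"), ("lit", "**")),
    # em <- "*" text "*"
    "em": ("seq", ("lit", "*"), ("ref", "text"), ("lit", "*")),
    # text <- (!"*" .)+
    "text": ("plus", ("seq", ("not", ("lit", "*")), ("any",))),
}


class Packrat:
    def __init__(self, grammar, src):
        self.grammar = grammar
        self.src = src
        self.memo = {}  # (rule, pos) -> (end, node), or None on failure

    def rule(self, name, pos):
        """Apply a named rule at pos, memoized (left recursion unsupported)."""
        key = (name, pos)
        if key not in self.memo:
            res = self.match(self.grammar[name], pos)
            if res is None:
                self.memo[key] = None
            else:
                end, kids = res
                self.memo[key] = (end, (name, self.src[pos:end], kids))
        return self.memo[key]

    def match(self, e, pos):
        """Return (new_pos, child_nodes) on success, None on failure."""
        op = e[0]
        if op == "lit":
            return (pos + len(e[1]), []) if self.src.startswith(e[1], pos) else None
        if op == "any":
            return (pos + 1, []) if pos < len(self.src) else None
        if op == "ref":
            r = self.rule(e[1], pos)
            return None if r is None else (r[0], [r[1]])
        if op == "seq":
            kids = []
            for sub in e[1:]:
                r = self.match(sub, pos)
                if r is None:
                    return None
                pos, k = r
                kids += k
            return pos, kids
        if op == "alt":  # ordered choice: first success wins
            for sub in e[1:]:
                r = self.match(sub, pos)
                if r is not None:
                    return r
            return None
        if op == "not":  # negative lookahead, consumes nothing
            return None if self.match(e[1], pos) is not None else (pos, [])
        if op in ("star", "plus"):
            kids, count = [], 0
            while True:
                r = self.match(e[1], pos)
                if r is None or r[0] == pos:  # stop on failure or no progress
                    break
                pos, k = r
                kids += k
                count += 1
            if op == "plus" and count == 0:
                return None
            return pos, kids
        raise ValueError("unknown operator: " + op)


def parse(grammar, start, src):
    """Parse the whole of src with rule `start`; None if it fails."""
    r = Packrat(grammar, src).rule(start, 0)
    if r is None or r[0] != len(src):
        return None
    return r[1]
```

For example, `parse(GRAMMAR, "inline", "a *b* **c**")` returns an `inline` node whose children are `text`, `em`, `text` and `strong` nodes, with `em` and `strong` each wrapping a `text` node; turning that into a TeXmacs document is then a plain tree walk.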

mdbenito commented 3 years ago

I think the path of least resistance is Scheme: a super-fast development cycle, no compilation toolchain to take care of (and thus easier multiplatform distribution), and decent-enough speed (maybe with improvements to come thanks to @mgubi ;)

Also, the internal representation as a "markdown scheme tree" could be shared by both directions of the converter. This means the current tm->md converter would benefit greatly, because it has, to put it mildly, grown rather "organically" from a few-days hack into an ugly monstrosity.
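To make the shared-representation idea concrete, here is a minimal sketch assuming a hypothetical intermediate tree of nested tuples `(tag, *children)`, with one function mapping it to a TeXmacs-style scheme tree (nested lists) and another mapping it back to markdown text. The tag names `document`, `concat`, `section`, `em` and `strong` are modeled on TeXmacs markup, but the exact mapping (for instance, a multi-part paragraph becoming `concat`) is an assumption for illustration, not how the existing tm->md converter works.

```python
# Hypothetical intermediate "markdown tree": nested tuples (tag, *children),
# where children are strings or further tuples. Both converter directions
# go through this one representation.


def to_texmacs(node):
    """Map a markdown-tree node to a TeXmacs-style scheme tree (nested lists)."""
    if isinstance(node, str):
        return node
    tag, *kids = node
    kids = [to_texmacs(k) for k in kids]
    if tag == "paragraph":
        # Assumed mapping: several inline pieces become a concat.
        return kids[0] if len(kids) == 1 else ["concat", *kids]
    # document, section, em, strong keep their names (assumed TeXmacs tags).
    return [tag, *kids]


def to_markdown(node):
    """Map the same markdown-tree node back to markdown source text."""
    if isinstance(node, str):
        return node
    tag, *kids = node
    if tag == "document":
        # Block-level children are separated by blank lines.
        return "\n\n".join(to_markdown(k) for k in kids)
    inner = "".join(to_markdown(k) for k in kids)
    if tag == "section":
        return "# " + inner
    if tag == "em":
        return "*" + inner + "*"
    if tag == "strong":
        return "**" + inner + "**"
    if tag == "paragraph":
        return inner
    raise ValueError("unsupported tag: " + tag)
```

With both directions reading the same tree, a new markdown construct needs one node type plus one case in each function, rather than two unrelated code paths.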

That being said, my second choice would be the parser currently in TeXmacs, were it not that this means upstream modifications which would have to be made very carefully. So instead I'd go for the simplest approach: libsoldout?

mgubi commented 3 years ago

It could be. The C/C++ route would be just a black box that converts a string into a TeXmacs or Scheme tree. Even if we use the peg/leg system, we only need to generate the parser once. A priori these parsers have been used in the wild and are fairly complete.
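Treating the C/C++ parser as a black box still leaves open how its result reaches TeXmacs. One hypothetical boundary is S-expression text that the Scheme side reads back as a tree. The sketch below, assuming trees are nested lists whose first element is the tag, serializes such a tree, escaping backslashes and double quotes inside strings. This is only one option; the parser could equally build the tree directly in C++.

```python
# Hypothetical black-box output format: serialize a nested-list tree
# (first element = tag, rest = children) to Scheme S-expression text.


def to_scheme(tree):
    """Return Scheme S-expression text for a nested-list tree."""
    if isinstance(tree, str):
        # Escape backslashes first, then double quotes.
        escaped = tree.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    tag, *kids = tree
    # Tags are emitted as bare symbols, children recursively.
    return "(" + " ".join([tag] + [to_scheme(k) for k in kids]) + ")"
```

Reading this text back on the Scheme side is a single call to the reader, so the boundary between the parser and TeXmacs stays a plain string.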

But having a home-grown solution is also attractive. Developing, or at least understanding how to use (:)), the internal packrat parser would be useful for other parsing tasks (like syntax highlighting) or for parsing other formats. Since it is implemented in C++, it is very fast. So maybe we can quickly obtain a Scheme tree from it, given a description of the grammar in Scheme. This would put most of the development on the Scheme side.

If we realise that the TeXmacs parser is still not versatile enough, we can adapt one of the Scheme libraries above to our brand of Scheme, TeXmacs Scheme :)


Implement converter from markdown to TeXmacs #15