jgm / pandoc

Universal markup converter
https://pandoc.org

Mediawiki Reader: handle templates #4404

Open tobiasBora opened 6 years ago

tobiasBora commented 6 years ago

Hello,

First, thank you for this great tool. I just found a bug with some MediaWiki content: the tag {{nowrap| <some text.....>}} seems to be dropped by pandoc. Here is an example:

$ ./pandoc -f mediawiki -t latex
{{nowrap|Hello}}

produces nothing. For a complete example, you can try this article and look at the section "LLL algorithm".

Thank you!

Pandoc version: pandoc 2.1.1

mb21 commented 6 years ago

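{{nowrap|...}} is a template, and it appears pandoc's mediawiki reader currently doesn't handle those: they are converted to RawBlock (Format "mediawiki"), and raw mediawiki content is omitted by writers for other formats such as LaTeX, which is why the output is empty.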
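Until the reader handles templates, one possible workaround is to unwrap the template before the text reaches pandoc. The following is only a minimal sketch, not something pandoc provides: the helper name unwrap_nowrap is hypothetical, and it assumes simple, non-nested {{nowrap|...}} calls with a single argument. Anything more complex is left untouched.

```python
import re

# Hypothetical pre-processing helper (not part of pandoc).
# Matches {{nowrap|text}} only when "text" contains no braces or pipes,
# i.e. no nested templates and no additional template arguments.
NOWRAP = re.compile(r"\{\{\s*nowrap\s*\|([^{}|]*)\}\}", re.IGNORECASE)


def unwrap_nowrap(wikitext: str) -> str:
    """Replace each simple {{nowrap|text}} with its bare text."""
    return NOWRAP.sub(lambda m: m.group(1).strip(), wikitext)


print(unwrap_nowrap("{{nowrap|Hello}}"))  # prints: Hello
```

The result can then be piped into pandoc -f mediawiki -t latex as usual. The non-breaking behaviour of the template is lost, so this only preserves the text.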

There's an outstanding TODO in the source code...

nichtich commented 6 years ago

Full support for MediaWiki syntax could benefit from a closer look at the official JavaScript parser, which was developed after the original MediaWiki parser in PHP. It includes a PEG grammar for the MediaWiki syntax tokenizer, which might be reusable in Haskell.

garfieldbanks commented 3 years ago

Have you looked into ExpandTemplates?

https://www.mediawiki.org/wiki/API:Expandtemplates
https://www.mediawiki.org/wiki/Help:ExpandTemplates

You just need to provide the source wiki, and then you should be able to get pandoc to properly convert everything.
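The ExpandTemplates suggestion could be sketched roughly as follows: ask the source wiki's API to expand the templates, then feed the expanded wikitext to pandoc. This is an illustration under assumptions, not a tested integration. The function names are hypothetical, the endpoint is whatever api.php URL the source wiki exposes (for English Wikipedia that would be https://en.wikipedia.org/w/api.php), and the response shape assumed here is the JSON form with an "expandtemplates" object containing a "wikitext" field.

```python
import json
import subprocess
import urllib.parse
import urllib.request


def build_params(wikitext: str) -> dict:
    """Query parameters for action=expandtemplates, requesting expanded wikitext."""
    return {
        "action": "expandtemplates",
        "format": "json",
        "prop": "wikitext",
        "text": wikitext,
    }


def extract_wikitext(response_body: str) -> str:
    """Pull the expanded wikitext out of the (assumed) JSON response shape."""
    return json.loads(response_body)["expandtemplates"]["wikitext"]


def expand_templates(api_endpoint: str, wikitext: str) -> str:
    """POST the wikitext to the wiki's api.php and return the expanded result."""
    data = urllib.parse.urlencode(build_params(wikitext)).encode("utf-8")
    request = urllib.request.Request(
        api_endpoint,
        data=data,  # supplying data makes this a POST, which avoids URL length limits
        headers={"User-Agent": "pandoc-expandtemplates-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_wikitext(response.read().decode("utf-8"))


def to_latex(wikitext: str) -> str:
    """Convert wikitext to LaTeX with the pandoc executable found on PATH."""
    result = subprocess.run(
        ["pandoc", "-f", "mediawiki", "-t", "latex"],
        input=wikitext,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A call such as to_latex(expand_templates(endpoint, source)) would chain the two steps. Note that expanded templates often produce HTML markup or further wiki syntax, so how cleanly the result converts depends on the template.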