outofcontrol / mediawiki-to-gfm

Converts Mediawiki format to Github Flavoured Markdown format

pandoc crashes with unexpected '<' #23

Closed by orzel 1 year ago

orzel commented 1 year ago

It crashes when I try the conversion. I don't know where the <table> comes from; I checked the MediaWiki dump, and it's not there. I couldn't check /tmp/pandoc638e46d7bac03 because it no longer exists.

I use pandoc 2.18 and mediawiki-to-gfm as a git checkout from today.

web-php ~/clones/mediawiki-to-gfm # ./convert.php --filename=/tmp/mediawiki.dump.xml --output=converted 
Error at "/tmp/pandoc638e46d7bac03" (line 3, column 1):
unexpected '<'
<table> <tr> <td>
^
Pandoc\PandocException: Pandoc could not convert successfully, error code: 65. Tried to run the following command: /usr/bin/pandoc --from=mediawiki --to=gfm /tmp/pandoc638e46d7bac03 in /root/clones/mediawiki-to-gfm/vendor/ryakad/pandoc-php/src/Pandoc/Pandoc.php:287
Stack trace:
#0 /root/clones/mediawiki-to-gfm/app/src/Convert.php(194): Pandoc\Pandoc->runWith()
#1 /root/clones/mediawiki-to-gfm/app/src/Convert.php(149): App\Convert->runPandoc()
#2 /root/clones/mediawiki-to-gfm/app/src/Convert.php(117): App\Convert->convertData()
#3 /root/clones/mediawiki-to-gfm/convert.php(50): App\Convert->run()
#4 {main}

outofcontrol commented 1 year ago

I believe this to be a question for the pandoc folks.

orzel commented 1 year ago

I don't know. I've narrowed the problem down to some actual HTML included in some MediaWiki pages. The pages were displayed OK, so I guess MediaWiki allows that. It's even handled nicely in their XML export. But then pandoc fails on it.
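
For anyone hitting the same error, here is a minimal Python sketch of one way to find which pages in a dump contain raw HTML table markup before running the conversion. It is not part of mediawiki-to-gfm: the helper names (`local_name`, `pages_with_raw_html`), the tag list in `RAW_HTML`, and the `SAMPLE` dump are all made up for illustration. It relies on the XML parser to unescape the page text, since an XML export stores a literal `<table>` inside the page text as `&lt;table&gt;`, which might be why a plain search of the dump for `<table>` turns up nothing.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: only table-related tags are of interest here; extend as needed.
RAW_HTML = re.compile(r"<\s*(table|tr|td|th)\b", re.IGNORECASE)


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to element names."""
    return tag.rsplit("}", 1)[-1]


def pages_with_raw_html(xml_text):
    """Return (title, line_number, line) for each wikitext line with a raw table tag."""
    hits = []
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = "", ""
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text or ""
            elif name == "text":
                # If a page has several revisions, the last one wins.
                text = node.text or ""
        for number, line in enumerate(text.splitlines(), start=1):
            if RAW_HTML.search(line):
                hits.append((title, number, line.strip()))
    return hits


# Hypothetical two-page dump; the second page embeds an escaped HTML table.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Plain</title>
    <revision><text>Just ''wikitext'' here.</text></revision>
  </page>
  <page>
    <title>WithTable</title>
    <revision><text>Intro

&lt;table&gt; &lt;tr&gt; &lt;td&gt;cell&lt;/td&gt; &lt;/tr&gt; &lt;/table&gt;</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for title, number, line in pages_with_raw_html(SAMPLE):
        print(f"{title}: line {number}: {line}")
```

On a real dump you would read the file contents and pass them to `pages_with_raw_html`; for very large exports a streaming parser such as `ET.iterparse` would probably be a better fit than loading everything at once.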