peterjc / mediawiki_to_git_md

Convert a MediaWiki export XML file into MarkDown as a series of git commits
MIT License

Fix for UTF-8 characters in wiki page names and contents #24

Closed scheibel closed 7 years ago

scheibel commented 7 years ago

This is a quick fix I needed while using your script. I think the repeated UTF-8 conversion could be eliminated, but that was outside the scope of this fix.

peterjc commented 7 years ago

Were you running this on Python 2 or Python 3?

(I was probably using Python 2 as I don't recall this being an issue for me)

scheibel commented 7 years ago

I'm running Python 2.7.12; Python 3 crashes way too early on your script. My wiki pages contained non-ASCII characters encoded as UTF-8 (ä, ö, ü, ß, and à) in both titles and contents.
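
For readers hitting the same problem, here is a minimal sketch of the usual way to avoid repeated UTF-8 conversion: decode to text once on input, work with text throughout, and encode once on output. This is not the code from this pull request or from the script itself; the names `to_text`, `make_filename`, and `write_page` are hypothetical, and the filename rule (spaces to underscores, `.md` suffix) is an assumption made only for illustration.

```python
import io
import os


def to_text(value, encoding="utf-8"):
    """Return a text (unicode) string, decoding byte strings exactly once.

    Hypothetical helper: values that are already text pass through
    unchanged, so calling it twice never decodes twice.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def make_filename(title):
    """Build a Markdown filename from a page title (illustrative rule only)."""
    return to_text(title).replace(u" ", u"_") + u".md"


def write_page(directory, title, content):
    """Write page content as UTF-8, encoding only at the file boundary."""
    path = os.path.join(directory, make_filename(title))
    # io.open behaves the same on Python 2 and Python 3 and handles
    # the text-to-bytes encoding for us.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(to_text(content))
    return path
```

Because `to_text` leaves text untouched, it is safe to call at every entry point, whether the title arrives as a byte string (common on Python 2) or as text already.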

peterjc commented 7 years ago

This is ringing a bell now. I think I saw this on one user page, but in the end I didn't convert the user pages on that wiki. I thought I had filed an issue, but I can't find it.

Merged, since if it worked for you it will likely help someone else in future. Right now I am unlikely to need the script again myself.

Thank you!