Closed aschilling closed 6 years ago
Thanks for the kind words. To help work out the problem, could you provide:
On Mon, 16 Jan 2017 01:36:41 -0800 Andreas Schilling notifications@github.com wrote:
Hi,
first of all, congratulations on mammoth. It is really a great tool. Unfortunately, when I run mammoth with my document I get the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 42056: character maps to <undefined>
Do you have any idea what the issue could be and how I could fix it? I run mammoth on Windows 10.
Did you manage to solve your issue?
OK let's close or action this issue!
First, it seems that the encoding error is not caused by the Python code itself, but by the encoding that the console is using. So the way to fix it is to run this command (on Windows):
chcp 65001
which sets the console encoding to UTF-8, and then run Mammoth again. Or, if you are working in PyCharm, go to Settings > Editor > File Encodings and set the IDE and Project encodings accordingly.
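To illustrate why the console encoding matters, here is a minimal sketch using only the standard library. It assumes the console was using a legacy code page such as cp1252 (an assumption, the report does not say which one), and the helper names can_encode and save_text are made up for this example. It shows that the character from the error message cannot be encoded in that code page, and that writing the output to a file opened explicitly as UTF-8 sidesteps the console encoding altogether.

```python
from pathlib import Path

# The character named in the error message (RIGHTWARDS ARROW).
ARROW = "\u2192"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def save_text(text: str, path: str) -> None:
    """Write text to a file as UTF-8, independent of the console code page.

    Hypothetical helper: `text` would be whatever output you previously
    printed to the console.
    """
    Path(path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # cp1252 is assumed here as a typical legacy Windows code page.
    print("cp1252 can encode arrow:", can_encode(ARROW, "cp1252"))
    print("utf-8 can encode arrow:", can_encode(ARROW, "utf-8"))
```

If the output still has to go to the console, setting the PYTHONIOENCODING environment variable to utf-8 before running the command is another option that may help.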
Now to the issue of the symbols (which are not recognised by Mammoth).
Symbols are specified with the w:sym element within the w:r element. A symbol is a special character that does not use any of the run fonts specified in rFonts or in the style hierarchy. The character is determined by looking up the hexadecimal value given in the char attribute in the font given in the font attribute. The char attribute holds the hexadecimal code of the symbol's character value, which can be stored in either of the following formats:
its Unicode character value, or
a Unicode character value created by adding F000 to the character value, thereby shifting the value into the Unicode private use area.
The second format exists to allow interoperability with legacy word processing formats. So, if the value of the char attribute is F034, we obtain the character value by subtracting F000 from F034, which gives the character at hexadecimal value 0x34 (52 in decimal) in the Wingdings font.
Only Unicode characters are officially supported in HTML, and only those should be used, since not all browsers will have fonts such as Wingdings, so handling such font-specific symbols is outside the scope of Mammoth.
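The decoding rule described above can be sketched in a few lines of Python. This is only an illustration, not something Mammoth does: the function names are made up for this example, and treating 0xF000 to 0xF0FF as the shifted range is an assumption based on the F034 example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace bound to the "w:" prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def sym_code_point(char_attr: str) -> int:
    """Turn the hexadecimal w:char value into an index into the symbol font.

    Values shifted into the private use area (assumed here to be
    0xF000..0xF0FF) have F000 subtracted; other values are returned as-is.
    """
    value = int(char_attr, 16)
    if 0xF000 <= value <= 0xF0FF:
        value -= 0xF000
    return value


def read_sym(run_xml: str):
    """Return (font, code point) for the first w:sym in a w:r element, or None."""
    run = ET.fromstring(run_xml)
    sym = run.find(f"{{{W_NS}}}sym")
    if sym is None:
        return None
    font = sym.get(f"{{{W_NS}}}font")
    char = sym.get(f"{{{W_NS}}}char")
    return font, sym_code_point(char)


# Hand-written example run, mirroring the F034 case discussed above.
EXAMPLE_RUN = (
    f'<w:r xmlns:w="{W_NS}">'
    '<w:sym w:font="Wingdings" w:char="F034"/>'
    "</w:r>"
)

if __name__ == "__main__":
    print(read_sym(EXAMPLE_RUN))
```

Note that the resulting value is a position in the named font, not a Unicode character; turning it into a real Unicode symbol would still need a per-font lookup table, which is not shown here.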
It sounds like we can do two things.
Closing since I don't think there's anything further to investigate without more details of the unicode error.
Hi,
first of all, congratulations on mammoth. It is really a great tool. Unfortunately, when I run mammoth with my document I get the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 42056: character maps to <undefined>
Do you have any idea what the issue could be and how I could fix it? I run mammoth on Windows 10.
Update: In particular, the issue occurs if you use the "Wingdings" font with the "§" symbol.
Moreover, I found that symbols such as arrows are not exported correctly. Here I get the message: An unrecognised element was ignored: w:sym