scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

UnicodeEncodeError on Win10, Python 3.6.6 #471

Closed: miek770 closed this issue 5 years ago

miek770 commented 5 years ago

Hi,

I followed these steps to handle a PowerPoint file in Git: https://www.ficonsulting.com/filabs/MSOfficeGit

When I run git diff after modifying an MS Office 365 ProPlus PowerPoint file, I get the following error:

$ git diff
Traceback (most recent call last):
  File "C:/Users/michela.lavoie/gits/formation_etap/formation_etap/pptx-textconv.py", line 19, in <module>
    print(text_runs)
  File "C:\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u221a' in position 19069: character maps to <undefined>
fatal: unable to read files to diff

It appears to be related to #139, but I'm using version 0.6.17 of python-pptx, installed from PyPI.

I'm running git bash in mintty 2.6.2 (x86_64-pc-msys).

The offending character '\u221a' appears to be a square root.
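
A quick way to check both points is the minimal sketch below, which uses only the standard library. It looks up the character's Unicode name and tries to encode it with cp1252, the codec named in the traceback. The variable names are illustrative and not taken from pptx-textconv.py.

```python
import unicodedata

ch = "\u221a"

# Official Unicode name of the character reported in the traceback.
char_name = unicodedata.name(ch)

# cp1252 has no mapping for this character, so encoding it raises.
try:
    ch.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# UTF-8 can represent the character without trouble.
utf8_bytes = ch.encode("utf-8")

print(char_name, cp1252_ok, utf8_bytes)
```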

Apart from the PowerPoint file itself, would you need additional information to point me in the right direction? Should I use another terminal?

Thank you,

Michel

scanny commented 5 years ago

Well, that's a question for the script author I suppose, but I expect the problem is rooted in a mismatch of the default unicode encoding.

python-pptx uses UTF-8, and it looks like your default encoding is CP1252 (Windows).

Maybe print([run.encode("UTF-8") for run in text_runs]) will work, but that's just an educated guess. print() has to encode its output to bytes using the console's encoding, and text_runs looks like a list of str, which in Python 3 is Unicode.
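
Another possible approach is to leave the strings as str and make the output stream itself encode as UTF-8. The sketch below shows this under two assumptions: the consumer of the output (git, in this case) accepts UTF-8, and the helper name make_utf8_writer is hypothetical, not part of python-pptx or the original script. It is demonstrated against an in-memory buffer rather than the real console.

```python
import io

def make_utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")

# Demonstration against an in-memory buffer instead of the real console.
buf = io.BytesIO()
out = make_utf8_writer(buf)
print("x = \u221a2", file=out)
out.flush()
written = buf.getvalue()

# In a script, the same wrapper could be applied to the real stdout
# (assumption: whatever reads the output expects UTF-8):
#   import sys
#   sys.stdout = make_utf8_writer(sys.stdout.buffer)
```

Setting the PYTHONIOENCODING environment variable to utf-8 before running the script should have a similar effect without changing the code, though that has not been tried in this thread.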

In any case, there's no evidence that python-pptx isn't behaving as expected.

Btw, support questions like this are best asked on StackOverflow, using the "python-pptx" tag. They get more attention there and don't clog up the issues list.

miek770 commented 5 years ago

Indeed, this worked: print([run.encode("UTF-8") for run in text_runs]). Thank you.
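
For context, here is a small sketch of why this form avoids the error. Printing a list of bytes objects writes their repr, in which non-ASCII bytes appear as \x escapes, so the output is pure ASCII and any console encoding can handle it. The trade-off is that the diff shows escapes rather than the original characters. The sample data is made up for illustration.

```python
text_runs = ["x = \u221a2", "plain text"]  # made-up sample data

encoded = [run.encode("UTF-8") for run in text_runs]

# print() on a list writes str(list), which uses the repr of each item.
line = str(encoded)

print(line)
```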

scanny commented 5 years ago

Glad you got it working :)