Encoding when printing utf-8 to windows console

anlx-sw commented 6 years ago

I have encoding problems when using xkcdpass to create passphrases from a wordlist containing UTF-8 characters and printing them to the Windows console.

This works without problems on Linux; the problem occurs only on Windows.

Maybe the output has to be prepared somehow to work on the windows console: https://neurocline.github.io/dev/2016/10/13/python-utf8-windows.html

Environment:

Python 3.6.4
xkcdpass installed via pip (xkcdpass-1.14.3)

I tested it with the compiled C:\Python36\Scripts\xkcdpass.exe that pip installs. The problem is the same in the normal cmd.exe console and in the powershell.exe console.

Sample Output: Herrscher Silber fÃ¶rdern PlÃ¤doyer verstehe AblÃ¶sung

I think that should read: Herrscher Silber fördern Plädoyer verstehe Ablösung

Update: If I print the umlauts directly from python.exe started in the Windows console, I get no errors:

> python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Umlaute ÄÖÜäöÜ")
Umlaute ÄÖÜäöÜ
>>>

If I use xkcdpass as a module, the same error occurs:

> python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import xkcdpass
>>> from xkcdpass import xkcd_password as xp
>>> wordfile = xp.locate_wordfile("ger-anlx-sorted.txt")
>>> mywords = xp.generate_wordlist(wordfile=wordfile)
>>> print(xp.generate_xkcdpassword(mywords))
Georgios verlangte TÃ¶ne holte teilten unbekannt
>>>

This should read Georgios verlangte Töne holte teilten unbekannt

florianjacob commented 6 years ago

Since your test of printing Unicode directly succeeded, I suspect the cause is the open() call that reads the word file. I assume the word file is stored as UTF-8 on Windows as well, but open() uses the platform-dependent default encoding when none is given. On Linux that is usually UTF-8; on Windows it is the ANSI code page (cp1252 on Western European systems, I think, which is close to ISO-8859-1), which would explain your findings.

Can you try what happens when you change that line to this?

    with open(wordfile, encoding='utf-8') as wlf:
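
For anyone who wants to reproduce the suspected mechanism without Windows, here is a minimal sketch. It assumes cp1252 stands in for the Windows default encoding, and read_words is a hypothetical helper for illustration, not part of xkcdpass:

```python
import locale
import os
import tempfile

# The encoding open() falls back to when no encoding= is passed.
print(locale.getpreferredencoding(False))


def read_words(path, encoding=None):
    """Hypothetical helper (not part of xkcdpass): read one word per line."""
    with open(path, encoding=encoding) as wlf:
        return [line.strip() for line in wlf]


# Write a tiny UTF-8 word list to a temporary file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("fördern\nPlädoyer\n")

# Assumption: cp1252 simulates the default on a Western European Windows install.
garbled = read_words(path, encoding="cp1252")
# The proposed fix: state the file's real encoding explicitly.
correct = read_words(path, encoding="utf-8")
os.remove(path)

print(garbled)
print(correct)
```

If the guess is right, the first list shows the same kind of garbling as in the report, and the second shows the words as stored in the file.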

redacted commented 6 years ago

A quick test on Windows 10 suggests that @florianjacob's fix above works. I've pushed the change; can you check if it works for you?

anlx-sw commented 6 years ago

Yes, I can confirm that this fix works for me. Thanks.

redacted commented 6 years ago

This fix is in the 1.16.1 release - thanks again
