microsoft / knack

Knack - A Python command line interface framework
https://pypi.python.org/pypi/knack
MIT License

Always use UTF-8 for log file encoding #247

Closed: jiasli closed this pull request 3 years ago

jiasli commented 3 years ago

Context

Reported in https://github.com/Azure/azure-cli/issues/17994

On Windows with English as the system language (without UTF-8 enabled), the system encoding defaults to cp1252 (Western European), and Python uses cp1252 as the default file encoding.

Writing Unicode characters such as 汉字 to the log file results in this error:

Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\logging\__init__.py", line 1084, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 87-88: character maps to <undefined>

The error can also be reproduced with a minimal script:

with open("test.txt", "w") as f:
    print(f.encoding)
    f.write("汉字")

Output:

cp1252
Traceback (most recent call last):
  File "D:/cli/testproj/test1.py", line 2, in <module>
    f.write("汉字")
  File "C:\Users\jiasli\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>
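For comparison, a minimal sketch of the same script with the encoding passed explicitly to the built-in open(). Specifying encoding="utf-8" overrides the locale default, so the write succeeds regardless of the system code page. The temporary directory is an assumption of this sketch, used so it does not overwrite a real test.txt.

```python
import os
import tempfile

# Use a temporary directory so the sketch does not clobber an existing test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# An explicit encoding="utf-8" overrides the locale default (cp1252 on the
# Windows setup described above), so writing CJK characters no longer fails.
with open(path, "w", encoding="utf-8") as f:
    print(f.encoding)  # prints utf-8
    f.write("汉字")

# Reading back with the same encoding returns the original characters.
with open(path, encoding="utf-8") as f:
    content = f.read()

print(content == "汉字")  # prints True
```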

Change

This PR forces log files to use UTF-8 so that file logging works on non-UTF-8 systems as well.
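In the standard library, logging.FileHandler accepts an encoding argument, so pinning the log file to UTF-8 comes down to passing encoding="utf-8" when the handler is created. The sketch below shows that idea in isolation. It is not knack's actual code; the logger name "utf8_log_demo" and the temporary log path are hypothetical.

```python
import logging
import os
import tempfile

# Hypothetical log location for this sketch; knack chooses its own path.
log_path = os.path.join(tempfile.mkdtemp(), "cli.log")

# Passing encoding="utf-8" makes the log file encoding independent of the
# system locale (e.g. cp1252 on English Windows).
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("utf8_log_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep the demo message out of the root logger

logger.warning("汉字")

# Close the handler so the message is flushed to disk before reading it back.
logger.removeHandler(handler)
handler.close()

with open(log_path, encoding="utf-8") as f:
    logged = f.read()
```

After this runs, logged holds the single formatted line WARNING: 汉字 followed by a newline. Without the encoding argument, the same call would raise the UnicodeEncodeError shown above on a cp1252 system.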

Alternative solution

One may also follow https://github.com/microsoft/knack/pull/178 to change the default encoding of the system to UTF-8.

jiasli commented 3 years ago

To test this change, enable file logging with az config set logging.enable_log_file=true and write some Chinese characters to the log:

logger.warning("汉字")