noahmorrison / chevron

A Python implementation of mustache
MIT License
486 stars, 52 forks

Force utf-8 encoding when opening files #44

Closed: akosthekiss closed this 5 years ago

akosthekiss commented 5 years ago

The default `open(..., 'r')` may fail when UTF-8 encoded files are read on a platform whose default locale (preferred encoding) is not UTF-8. The `io` module allows specifying the encoding for `open` in a way that works on both Python 2 and 3.
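A minimal sketch of the pattern described above. `read_text` is a hypothetical helper for illustration, not part of chevron; it only assumes the standard `io` module, whose `open` accepts an `encoding` argument on Python 2.6+ and is the same object as the builtin `open` on Python 3.

```python
import io
import os
import tempfile


def read_text(path, encoding='utf-8'):
    # Hypothetical helper (not chevron's API): io.open takes an explicit
    # encoding on both Python 2 and 3, so decoding no longer depends on
    # the locale's preferred encoding.
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


if __name__ == '__main__':
    # A string with non-ASCII characters, stored on disk as UTF-8 bytes.
    original = u'(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n'
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(original.encode('utf-8'))
    try:
        # Compare instead of printing, so the demo works on any terminal.
        print(read_text(path) == original)
    finally:
        os.remove(path)
```

Because the encoding is passed explicitly, the same call returns the same text regardless of the `LANG`/`LC_*` settings of the process that runs it.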

coveralls commented 5 years ago

Coverage remained the same at ?% when pulling 430915f81701dee7394e1e24609764711baf88ce on akosthekiss:fix-open-unicode into 8abcc371f507239c3374300e6d5cd1365aca7893 on noahmorrison:master.

akosthekiss commented 5 years ago

The issue was triggered when running tox on the following system setup:

Interestingly, tox runs fine on an Ubuntu 18.04.1 LTS system. Some digging revealed that on macOS (and only on macOS), `locale.getpreferredencoding()` returns US-ASCII instead of the otherwise default UTF-8 when running inside tox. This tox behaviour is quite strange, but it nonetheless signals a real issue: the code relies on the platform's default encoding when opening files.
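The failure mode can be reproduced without tox or macOS. The sketch below is an illustration, not code from chevron: it decodes UTF-8 bytes through `io.TextIOWrapper` with an explicitly chosen encoding, standing in for what `open()` does when it falls back to the locale's preferred encoding. With `US-ASCII` it raises the same kind of `UnicodeDecodeError` seen in the tox log (byte `0xc2` is the first byte of the UTF-8 encoded degree sign), while `utf-8` succeeds.

```python
import io
import locale

# U+00B0 (degree sign) encodes to the bytes 0xc2 0xb0 in UTF-8.
TEXT = u'temperature: 20\xb0C'
DATA = TEXT.encode('utf-8')


def decode_with(encoding):
    # Wrap the raw bytes in a text stream, as open() does internally,
    # but with the encoding chosen explicitly.
    stream = io.TextIOWrapper(io.BytesIO(DATA), encoding=encoding)
    return stream.read()


if __name__ == '__main__':
    # The encoding Python 3's open() falls back to when none is passed;
    # reportedly 'US-ASCII' inside tox on macOS.
    print(locale.getpreferredencoding(False))
    try:
        decode_with('US-ASCII')
    except UnicodeDecodeError as exc:
        print(exc)  # 'ascii' codec can't decode byte 0xc2 ...
    print(decode_with('utf-8') == TEXT)
```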

Notes:

Tox error snippets:

```
Traceback (most recent call last):
  File "/Users/akiss/devel/gpufuzz/chevron/test_spec.py", line 54, in <module>
    globals()[spec] = _test_case_from_path(os.path.join(SPECS_PATH, spec))
  File "/Users/akiss/devel/gpufuzz/chevron/test_spec.py", line 22, in _test_case_from_path
    class MustacheTestCase(unittest.TestCase):
  File "/Users/akiss/devel/gpufuzz/chevron/test_spec.py", line 39, in MustacheTestCase
    yaml = json.load(f)
  File "/Users/akiss/.pyenv/versions/3.4.6/lib/python3.4/json/__init__.py", line 265, in load
    return loads(fp.read(),
  File "/Users/akiss/devel/gpufuzz/chevron/.tox/py34/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 466: ordinal not in range(128)
```

```
..........E/Users/akiss/.pyenv/versions/3.4.6/lib/python3.4/unittest/case.py:608: ResourceWarning: unclosed file <_io.TextIOWrapper name='tests/data.json' mode='r' encoding='US-ASCII'>
  outcome.errors.clear()
........
======================================================================
ERROR: test_main (__main__.ExpandedCoverage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "chevron/test_spec.py", line 125, in test_main
    partials_path='tests')
  File "chevron/chevron/main.py", line 22, in main
    data = json.load(data_file)
  File "/Users/akiss/.pyenv/versions/3.4.6/lib/python3.4/json/__init__.py", line 265, in load
    return loads(fp.read(),
  File "chevron/.tox/py34/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 245: ordinal not in range(128)
```

```
..........E........
======================================================================
ERROR: test_main (__main__.ExpandedCoverage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "chevron/test_spec.py", line 125, in test_main
    partials_path='tests')
  File "chevron/chevron/main.py", line 32, in main
    return render(**args)
  File "chevron/chevron/renderer.py", line 178, in render
    for tag, key in tokens:
  File "chevron/chevron/tokenizer.py", line 168, in tokenize
    template = template.read()
  File "chevron/.tox/py34/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1067: ordinal not in range(128)
```

```
..........E........
======================================================================
ERROR: test_main (__main__.ExpandedCoverage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "chevron/chevron/renderer.py", line 88, in _get_partial
    return partials_dict[name]
KeyError: 'unicode'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "chevron/test_spec.py", line 125, in test_main
    partials_path='tests')
  File "chevron/chevron/main.py", line 32, in main
    return render(**args)
  File "chevron/chevron/renderer.py", line 318, in render
    partials_path, partials_ext)
  File "chevron/chevron/renderer.py", line 95, in _get_partial
    return partial.read()
  File "chevron/.tox/py34/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
```

```
........../Users/akiss/.pyenv/versions/2.7.13/lib/python2.7/unittest/case.py:503: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if not first == second:
F........................................................................................................................
======================================================================
FAIL: test_main (__main__.ExpandedCoverage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "chevron/test_spec.py", line 131, in test_main
    self.assertEqual(result, expected)
AssertionError: '\nvariable test\n===\ntest\n===\ntest\n===\n\ncomment test\n===\n===\n===\n\nhtml escape test (triple brackets)\n===\n< > & "\n===\n< > & "\n===\n\nhtml escape test (ampersand)\n===\n< > & "\n===\n< > & "\n===\n\nhtml escape test (normal)\n===\n< > & "\n===\n< > & "\n===\n\nsection test (truthy)\n===\ntrue\n===\ntrue\n===\n\nsection test (falsy)\n===\n===\n===\n\nsection test (list)\n===\nnumber: 1\nname: one\n---\nnumber: 2\nname: two\n---\nnumber: 3\nname: three\n---\n===\nnumber: 1\nname: one\n---\nnumber: 2\nname: two\n---\nnumber: 3\nname: three\n---\n===\n\nsection test (scope)\n===\ntest\nnew test\n===\ntest\nnew test\n===\n\ninverted section test (truthy)\n===\n===\n===\n\ninverted section test (falsy)\n===\nfalse\n===\nfalse\n===\n\npartial test\n===\nthis is a partial\n===\nthis is a partial\n===\n\ndelimiter test\n===\ntest\ntest\n===\ntest\ntest\n===\n\nunicode test (basic)\n===\n(\xe2\x95\xaf\xc2\xb0\xe2\x96\xa1\xc2\xb0\xef\xbc\x89\xe2\x95\xaf\xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb\n===\n(\xe2\x95\xaf\xc2\xb0\xe2\x96\xa1\xc2\xb0\xef\xbc\x89\xe2\x95\xaf\xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb\n===\n\nunicode test (variable)\n===\n(\xe2\x95\xaf\xc2\xb0\xe2\x96\xa1\xc2\xb0\xef\xbc\x89\xe2\x95\xaf\xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb\n===\n(\xe2\x95\xaf\xc2\xb0\xe2\x96\xa1\xc2\xb0\xef\xbc\x89\xe2\x95\xaf\xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb\n===\n\nunicode test (partial)\n===\n(\xe2\x95\xaf\xc2\xb0\xe2\x96\xa1\xc2\xb0\xef\xbc\x89\xe2\x95\xaf\xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb\n===\n(\xe2\x95\xaf\xc2\xb0\xe2\x96\xa1\xc2\xb0\xef\xbc\x89\xe2\x95\xaf\xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb\n===\n\nunicode test (no-escape)\n===\n(\xe2\x95\xaf\xc2\xb0\xe2\x96\xa1\xc2\xb0\xef\xbc\x89\xe2\x95\xaf\xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb\n===\n(\xe2\x95\xaf\xc2\xb0\xe2\x96\xa1\xc2\xb0\xef\xbc\x89\xe2\x95\xaf\xef\xb8\xb5 \xe2\x94\xbb\xe2\x94\x81\xe2\x94\xbb\n===\n' != u'\nvariable test\n===\ntest\n===\ntest\n===\n\ncomment test\n===\n===\n===\n\nhtml escape test (triple brackets)\n===\n< > & "\n===\n< > & "\n===\n\nhtml escape test (ampersand)\n===\n< > & "\n===\n< > & "\n===\n\nhtml escape test (normal)\n===\n< > & "\n===\n< > & "\n===\n\nsection test (truthy)\n===\ntrue\n===\ntrue\n===\n\nsection test (falsy)\n===\n===\n===\n\nsection test (list)\n===\nnumber: 1\nname: one\n---\nnumber: 2\nname: two\n---\nnumber: 3\nname: three\n---\n===\nnumber: 1\nname: one\n---\nnumber: 2\nname: two\n---\nnumber: 3\nname: three\n---\n===\n\nsection test (scope)\n===\ntest\nnew test\n===\ntest\nnew test\n===\n\ninverted section test (truthy)\n===\n===\n===\n\ninverted section test (falsy)\n===\nfalse\n===\nfalse\n===\n\npartial test\n===\nthis is a partial\n===\nthis is a partial\n===\n\ndelimiter test\n===\ntest\ntest\n===\ntest\ntest\n===\n\nunicode test (basic)\n===\n(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n===\n(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n===\n\nunicode test (variable)\n===\n(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n===\n(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n===\n\nunicode test (partial)\n===\n(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n===\n(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n===\n\nunicode test (no-escape)\n===\n(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n===\n(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 \u253b\u2501\u253b\n===\n'
```
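The tracebacks point at three places that read files with the default encoding: `json.load(data_file)` in `main.py`, `template.read()` in `tokenizer.py`, and `partial.read()` in `renderer.py`. The sketch below shows what such reads could look like with `io.open`; `load_data` and `load_template` are hypothetical names, not chevron's actual functions. On Python 2 the builtin `open` returns byte strings, which appears consistent with the `'...' != u'...'` mismatch in the last log; `io.open` with an encoding returns text on both versions.

```python
import io
import json
import os
import shutil
import tempfile

TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def load_data(path):
    # Hypothetical stand-in for main.py's json.load(data_file).
    with io.open(path, 'r', encoding='utf-8') as data_file:
        return json.load(data_file)


def load_template(path):
    # Hypothetical stand-in for the tokenizer's template.read() and
    # the renderer's partial.read().
    with io.open(path, 'r', encoding='utf-8') as template:
        return template.read()


if __name__ == '__main__':
    tmp = tempfile.mkdtemp()
    try:
        data_path = os.path.join(tmp, 'data.json')
        tmpl_path = os.path.join(tmp, 'unicode.mustache')
        # Write both files as raw UTF-8 bytes containing non-ASCII text.
        with io.open(data_path, 'wb') as f:
            f.write(u'{"flip": "\u253b\u2501\u253b"}'.encode('utf-8'))
        with io.open(tmpl_path, 'wb') as f:
            f.write(u'(\u256f\xb0\u25a1\xb0\uff09\u256f\ufe35 {{flip}}\n'.encode('utf-8'))
        data = load_data(data_path)
        template = load_template(tmpl_path)
        # Both values are text (not bytes) on Python 2 and 3 alike.
        print(isinstance(data['flip'], TEXT_TYPE))
        print(isinstance(template, TEXT_TYPE))
    finally:
        shutil.rmtree(tmp)
```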

akosthekiss commented 5 years ago

PR rebased onto the latest master to make Travis CI go green.