google / yapf

A formatter for Python files
Apache License 2.0

Invalid unicode will crash yapf with an exception rather than an error message #51

Closed: DRMacIver closed this 9 years ago

DRMacIver commented 9 years ago

In the continuing strategy of "here's a Python string literal for a file exhibiting this problem, to avoid GitHub being clever", the following string passes ast.parse (in Python 3.4.2) but causes yapf to crash when a file with precisely these contents is passed to it under the same Python version. I think this is because yapf has an assumption baked in that all source is valid UTF-8.

String:

"# а\x91а\x96б\x9fб\x80б\x81б\x82б\x83б\x84б\x85б\x86б\x87б\x88б\x89б\x8aб\x8bб\x8cб\x8dб\x8eб\x8f <- Cyrillic characters\n'а\x8eб\x82т\x84\x96аЄ'\n"

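To reproduce, here is a minimal sketch that writes the literal above to a file and confirms that ast.parse accepts it. The file name repro.py and the temporary directory are hypothetical. The sketch assumes the file is saved as UTF-8, which is Python 3's default source encoding.

```python
import ast
import os
import tempfile

# File contents copied verbatim from the string literal in the report.
SOURCE = "# а\x91а\x96б\x9fб\x80б\x81б\x82б\x83б\x84б\x85б\x86б\x87б\x88б\x89б\x8aб\x8bб\x8cб\x8dб\x8eб\x8f <- Cyrillic characters\n'а\x8eб\x82т\x84\x96аЄ'\n"

# Hypothetical file name. Assumes UTF-8 (Python 3's default source encoding);
# newline="" keeps "\n" from being translated on write.
path = os.path.join(tempfile.mkdtemp(), "repro.py")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(SOURCE)

# Read it back in text mode and check that the parser accepts it.
with open(path, encoding="utf-8") as f:
    text = f.read()
tree = ast.parse(text, filename=path)
print("ast.parse accepted", path)
```

On an unfixed checkout, passing that file to yapf should then produce the crash. The traceback below was produced through the `python -m yapf` entry point.
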
Error:

INTERNAL ERROR: # а‘а–бŸб€бб‚бƒб„б
б†б‡бˆб‰бŠб‹бŒббŽб <- Cyrillic characters
Traceback (most recent call last):
  File "/home/david/yapf/yapf/yapflib/verifier.py", line 38, in VerifyCode
    compile(textwrap.dedent(code).encode('UTF-8'), '<string>', 'exec')
  File "<string>", line 2
    б†б‡бˆб‰бŠб‹бŒббŽб <- Cyrillic characters
                       ^
SyntaxError: invalid character in identifier

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/david/yapf/yapf/yapflib/verifier.py", line 41, in VerifyCode
    ast.parse(textwrap.dedent(code.lstrip('\n')).lstrip(), '<string>', 'exec')
  File "/home/david/.pyenv/versions/3.4.2/lib/python3.4/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<string>", line 2
    б†б‡бˆб‰бŠб‹бŒббŽб <- Cyrillic characters
                       ^
SyntaxError: invalid character in identifier

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/david/.pyenv/versions/3.4.2/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/david/.pyenv/versions/3.4.2/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/david/yapf/yapf/__main__.py", line 18, in <module>
    sys.exit(yapf.main(sys.argv))
  File "/home/david/yapf/yapf/__init__.py", line 102, in main
    print_diff=args.diff)
  File "/home/david/yapf/yapf/__init__.py", line 124, in FormatFiles
    filename, style_config=style_config, lines=lines, print_diff=print_diff)
  File "/home/david/yapf/yapf/yapflib/yapf_api.py", line 67, in FormatFile
    print_diff=print_diff)
  File "/home/david/yapf/yapf/yapflib/yapf_api.py", line 110, in FormatCode
    reformatted_source = reformatter.Reformat(uwlines)
  File "/home/david/yapf/yapf/yapflib/reformatter.py", line 73, in Reformat
    verifier.VerifyCode(formatted_code[-1])
  File "/home/david/yapf/yapf/yapflib/verifier.py", line 45, in VerifyCode
    compile(normalized_code.encode('UTF-8'), '<string>', 'exec')
  File "<string>", line 1
    б†б‡бˆб‰бŠб‹бŒббŽб <- Cyrillic characters
                       ^
SyntaxError: invalid character in identifier
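
The INTERNAL ERROR message breaks the comment after "б„б", which is exactly where the literal has \x85 (U+0085, NEXT LINE). The sketch below shows one way that can happen. It assumes, without confirmation in this thread, that the code is split with str.splitlines() somewhere before verification. Python's tokenizer ends lines only at \n, \r and \r\n, but str.splitlines() also breaks on U+0085.

```python
import ast

# Same file contents as in the report.
SOURCE = "# а\x91а\x96б\x9fб\x80б\x81б\x82б\x83б\x84б\x85б\x86б\x87б\x88б\x89б\x8aб\x8bб\x8cб\x8dб\x8eб\x8f <- Cyrillic characters\n'а\x8eб\x82т\x84\x96аЄ'\n"

# The tokenizer keeps U+0085 inside the comment, so the file parses.
ast.parse(SOURCE)

# str.splitlines() treats U+0085 as a line boundary, so it finds
# three lines where the tokenizer sees two.
lines = SOURCE.splitlines()
print(SOURCE.count("\n"), len(lines))  # 2 3

# Rejoining with "\n" puts the tail of the comment on its own line,
# outside the "#", matching "line 2" in the traceback above.
rejoined = "\n".join(lines) + "\n"
rejoined_error = None
try:
    ast.parse(rejoined)
except SyntaxError as exc:
    rejoined_error = exc
    print("line", exc.lineno, "->", exc.msg)
```

The exact SyntaxError message differs between Python versions. For example, newer releases report the non-printable U+0086 character instead of "invalid character in identifier".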

bwendling commented 9 years ago

I think this was also fixed by 1d42a266d23394ff917d89aeab6f75a4d80e09c5.