lmmx / impscan

Command line tool to identify minimal imports list and repository sources by parsing package dependency trees
MIT License

Encoding must be specified to decode #5

Closed lmmx closed 3 years ago

lmmx commented 3 years ago

If a file that begins with a BOM (the U+FEFF Unicode character) is not decoded with encoding="utf-8-sig", the BOM stays in the resulting string and the ast.parse call on it will fail.

>>> from pathlib import Path
>>> import ast
>>> p = Path("conf.py")
>>> ast.parse(p.read_text())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/louis/miniconda3/envs/impscan/lib/python3.9/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    # -*- coding: utf-8 -*-
    ^
SyntaxError: invalid non-printable character U+FEFF
>>> ast.parse(p.read_text(encoding="utf-8-sig"))
<ast.Module object at 0x7fb82fb03f10>
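
A minimal sketch of how the fix could be wrapped up, assuming every file is read through one helper. The name `parse_source_file` is hypothetical and not part of impscan; the temporary `conf.py` only recreates the failing case from the session above. The "utf-8-sig" codec strips a leading BOM when one is present and otherwise behaves like plain "utf-8", so it should be safe to use for files without a BOM too.

```python
import ast
import tempfile
from pathlib import Path


def parse_source_file(path: Path) -> ast.Module:
    """Parse a Python source file, tolerating an optional leading UTF-8 BOM.

    Hypothetical helper for illustration; not impscan's actual API.
    """
    # "utf-8-sig" drops a leading U+FEFF if present, else acts like "utf-8"
    source = path.read_text(encoding="utf-8-sig")
    return ast.parse(source, filename=str(path))


with tempfile.TemporaryDirectory() as tmp:
    # Recreate a file that starts with the UTF-8 BOM bytes (EF BB BF)
    bom_file = Path(tmp) / "conf.py"
    bom_file.write_bytes(b"\xef\xbb\xbf# -*- coding: utf-8 -*-\nimport os\n")

    tree = parse_source_file(bom_file)

    # Collect the names from plain `import x` statements in the parsed tree
    imported = [
        alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.Import)
        for alias in node.names
    ]
```

Passing `filename` to `ast.parse` is optional, but it replaces `<unknown>` in any later SyntaxError with the real path, which makes failures easier to trace.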