pdoc3 / pdoc

Auto-generate API documentation for Python projects
https://pdoc3.github.io/pdoc/
GNU Affero General Public License v3.0

"Soft" handling for UnicodeDecodeError #396

Closed frank101010 closed 8 months ago

frank101010 commented 2 years ago

During a run of pdoc3, I hit a Python file with a comment containing non-UTF-8 characters. While this shouldn't have happened in the first place, pdoc3's current behavior is to raise an exception that gives no clue about which file is the culprit:

Traceback (most recent call last):
  File "...\pdoc3\pdoc\__main__.py", line 6, in <module>
    main()
  File "...\pdoc3\pdoc\cli.py", line 534, in main
    modules = [pdoc.Module(module, docfilter=docfilter,
  File "...\pdoc3\pdoc\cli.py", line 534, in <listcomp>
    modules = [pdoc.Module(module, docfilter=docfilter,
  File "...\pdoc3\pdoc\__init__.py", line 754, in __init__
    m = Module(import_module(fullname),
  File "...\pdoc3\pdoc\__init__.py", line 675, in __init__
    var_docstrings, _ = _pep224_docstrings(self)
  File "...\pdoc3\pdoc\__init__.py", line 269, in _pep224_docstrings
    _ = inspect.findsource(doc_obj.obj)
  File "C:\Python39\lib\inspect.py", line 831, in findsource
    lines = linecache.getlines(file, module.__dict__)
  File "C:\Python39\lib\linecache.py", line 46, in getlines
    return updatecache(filename, module_globals)
  File "C:\Python39\lib\linecache.py", line 137, in updatecache
    lines = fp.readlines()
  File "C:\Python39\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 2403: invalid continuation byte
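Until the message names the file, one way to track down the culprit is to scan the source tree directly. The following is a minimal sketch, not part of pdoc3: the helper name `find_undecodable_sources` is hypothetical, and it assumes the problem files are `.py` files under a single root directory. It uses `tokenize.open`, which detects the encoding the same way `linecache` does in the traceback above.

```python
import tokenize
from pathlib import Path


def find_undecodable_sources(root):
    """Return (path, error) pairs for .py files under root that cannot be decoded.

    Hypothetical helper for illustration; not part of pdoc3.
    """
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            # tokenize.open honors a PEP 263 coding declaration and
            # otherwise falls back to UTF-8.
            with tokenize.open(path) as fp:
                fp.read()
        except (UnicodeDecodeError, SyntaxError) as exc:
            # SyntaxError covers an invalid or undecodable encoding declaration.
            failures.append((path, exc))
    return failures


if __name__ == "__main__":
    for path, exc in find_undecodable_sources("."):
        print(f"{path}: {exc}")
```

Run from the project root, this prints one line per file that fails to decode, along with the decoder's message.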

Adding UnicodeDecodeError to the list of exceptions caught in _pep224_docstrings() changes the output to the following, which includes the name of the module causing the problem:

...\pdoc3\pdoc\__init__.py:754: UserWarning: Couldn't read PEP-224 variable docstrings from <Module 'unifiedmodel.enum'>: 'utf-8' codec can't decode byte 0xe4 in position 2403: invalid continuation byte
  m = Module(import_module(fullname),
...\pdoc3\pdoc\__init__.py:754: UserWarning: Couldn't read PEP-224 variable docstrings from <Class 'unifiedmodel.enum.Entry'>: 'utf-8' codec can't decode byte 0xe4 in position 2403: invalid continuation byte
  m = Module(import_module(fullname),
...\pdoc3\pdoc\__init__.py:754: UserWarning: Couldn't read PEP-224 variable docstrings from <Class 'unifiedmodel.enum.Enum'>: 'utf-8' codec can't decode byte 0xe4 in position 2403: invalid continuation byte
  m = Module(import_module(fullname),
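The proposed change can be sketched roughly as follows. This is an illustration of the idea, not pdoc3's actual code: the function name `read_source_or_warn` and its `label` parameter are hypothetical, and the other exception types in the tuple are assumed rather than copied from `_pep224_docstrings()`. Only the `inspect.findsource` call and the warning text are taken from the output above.

```python
import inspect
import warnings


def read_source_or_warn(obj, label):
    """Return the source lines for obj, or None after emitting a warning.

    Hypothetical stand-in for the try/except inside _pep224_docstrings().
    """
    try:
        lines, _ = inspect.findsource(obj)
    except (OSError, TypeError, SyntaxError, UnicodeDecodeError) as exc:
        # Including UnicodeDecodeError turns the crash into a warning
        # that names the object whose source could not be read.
        warnings.warn(
            f"Couldn't read PEP-224 variable docstrings from {label}: {exc}"
        )
        return None
    return lines
```

With this shape, a source file that cannot be decoded produces a UserWarning naming the module or class, and documentation generation continues for the rest of the project.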