Describe the bug
ocrmypdf crashes with `TypeError: 'NoneType' object is not subscriptable`
To Reproduce
ocrmypdf 14.0.3.dev5+g9d5fa05a.d20230215
Running: ['tesseract', '--version']
Found tesseract 5.3.0-31-g9d71
Running: ['tesseract', '--version']
Running: ['gs', '--version']
Found gs 9.55.0
Running: ['gs', '--version']
Running: ['tesseract', '--list-langs']
stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
chi_sim
deu
eng
fra
osd
por
spa
reading file from standard input
os.symlink(/tmp/ocrmypdf.io.yddbmk4e/stdin, /tmp/ocrmypdf.io.yddbmk4e/origin.pdf)
An exception occurred while executing the pipeline
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_sync.py", line 378, in run_pipeline
    pdfinfo = get_pdfinfo(
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipeline.py", line 165, in get_pdfinfo
    return PdfInfo(
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/pdfinfo/info.py", line 932, in __init__
    self._pages = _pdf_pageinfo_concurrent(
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/pdfinfo/info.py", line 709, in _pdf_pageinfo_concurrent
    executor(
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_concurrent.py", line 87, in __call__
    self._execute(
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 141, in _execute
    result = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/pdfinfo/info.py", line 666, in _pdf_pageinfo_sync
    page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/pdfinfo/info.py", line 746, in __init__
    self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/pdfinfo/info.py", line 792, in _gather_pageinfo
    for info in _process_content_streams(
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/pdfinfo/info.py", line 594, in _process_content_streams
    yield from _find_form_xobject_images(pdf, container, contentsinfo)
  File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _find_form_xobject_images
    if candidate['/Subtype'] != '/Form':
TypeError: 'NoneType' object is not subscriptable
Example file file.zip
Expected behavior
ocrmypdf does not crash.
System
ocrmypdf 14.0.3.dev5+g9d5fa05a.d20230215
pip, or a Docker image? docker
Fixed by https://github.com/ocrmypdf/OCRmyPDF/pull/1066
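
For readers following the traceback: the last frame subscripts `candidate` without first checking whether it is `None`. The snippet below is only a minimal sketch of that failure mode and one defensive guard, using plain Python dicts in place of pikepdf objects. The function name `find_form_xobjects` and the sample resource dictionary are hypothetical, the idea that a null XObject entry is what triggers the crash is an assumption, and this is not necessarily the change made in the linked pull request.

```python
from typing import Iterator, Mapping, Optional


def find_form_xobjects(
    xobjs: Mapping[str, Optional[Mapping[str, str]]],
) -> Iterator[str]:
    """Yield the names of Form XObjects, skipping null or non-Form entries."""
    for name, candidate in xobjs.items():
        # Assumption: an entry can be None (for example, a reference that
        # resolves to a null object). Without this check,
        # candidate['/Subtype'] raises
        # TypeError: 'NoneType' object is not subscriptable.
        if candidate is None:
            continue
        # .get() also tolerates entries that lack a /Subtype key.
        if candidate.get('/Subtype') != '/Form':
            continue
        yield name


# Hypothetical XObject resource dictionary; '/Fm1' is a null entry.
xobjs = {
    '/Im0': {'/Subtype': '/Image'},
    '/Fm0': {'/Subtype': '/Form'},
    '/Fm1': None,
}
print(list(find_form_xobjects(xobjs)))  # prints ['/Fm0']
```

Skipping the null entry lets the remaining XObjects on the page still be inspected, which seems preferable to aborting the whole pipeline on one malformed resource.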