I was trying to parse a PDF paper. When I extracted text from it, I got this error:
UnboundLocalError: cannot access local variable 'v' where it is not associated with a value
Traceback (most recent call last):
  File "/Users/somen/.pyenv/versions/3.12.2/lib/python3.12/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somen/.pyenv/versions/3.12.2/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/somen/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/Users/somen/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/Users/somen/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/somen/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somen/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/Users/somen/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "/Users/somen/Zavodi/unik/llm4codesec_literature_review/test.py", line 6, in <module>
    print(page.extract_text())
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/somen/Zavodi/unik/llm4codesec_literature_review/.venv/lib/python3.12/site-packages/pypdf/_page.py", line 2393, in extract_text
    return self._extract_text(
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/somen/Zavodi/unik/llm4codesec_literature_review/.venv/lib/python3.12/site-packages/pypdf/_page.py", line 1868, in _extract_text
    cmaps[f] = build_char_map(f, space_width, obj)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somen/Zavodi/unik/llm4codesec_literature_review/.venv/lib/python3.12/site-packages/pypdf/_cmap.py", line 34, in build_char_map
    font_subtype, font_halfspace, font_encoding, font_map = build_char_map_from_dict(
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somen/Zavodi/unik/llm4codesec_literature_review/.venv/lib/python3.12/site-packages/pypdf/_cmap.py", line 57, in build_char_map_from_dict
    encoding, map_dict = get_encoding(ft)
                         ^^^^^^^^^^^^^^^^
  File "/Users/somen/Zavodi/unik/llm4codesec_literature_review/.venv/lib/python3.12/site-packages/pypdf/_cmap.py", line 130, in get_encoding
    map_dict, int_entry = _parse_to_unicode(ft)
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somen/Zavodi/unik/llm4codesec_literature_review/.venv/lib/python3.12/site-packages/pypdf/_cmap.py", line 213, in _parse_to_unicode
    return _type1_alternative(ft, map_dict, int_entry)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/somen/Zavodi/unik/llm4codesec_literature_review/.venv/lib/python3.12/site-packages/pypdf/_cmap.py", line 531, in _type1_alternative
    map_dict[chr(i)] = v
                       ^
UnboundLocalError: cannot access local variable 'v' where it is not associated with a value
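For context, Python raises UnboundLocalError when a local name is read on a code path where no assignment to it has run. The snippet below is only a generic illustration of that pattern; it is not pypdf's actual code, and `build_map`, `entries`, and `glyphs` are hypothetical names chosen for the example.

```python
def build_map(entries, glyphs):
    """Hypothetical helper: map code points to glyph strings."""
    result = {}
    for code, name in entries:
        if name in glyphs:
            v = glyphs[name]  # 'v' is assigned only on this branch
        result[chr(code)] = v  # read even when the branch above was skipped
    return result


try:
    # "missing" is not a key in the glyph table, so 'v' is never assigned.
    build_map([(65, "missing")], {"A": "A"})
except UnboundLocalError as exc:
    message = str(exc)
    print(message)
```

If the failure in `_type1_alternative` follows a similar shape, it would suggest that the fonts in this PDF reach a branch where `v` is never set, but that is an assumption and not something confirmed from the pypdf source.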
Environment
ARM macOS 15.1, Python 3.12.2, pypdf == 5.1.0
Code + PDF
This is a minimal, complete example that shows the issue:
2305.09315.pdf
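The script itself is not reproduced in this report. The following is a minimal sketch that is consistent with the failing call in the traceback (`print(page.extract_text())` in test.py); it is not the original file. `PdfReader`, `reader.pages`, and `page.extract_text()` are pypdf calls, while the helper names `extract_pages` and `main` are hypothetical. Catching the error per page is an added convenience so that the failing pages can be identified; it is not a fix.

```python
import sys


def extract_pages(pages):
    """Return (texts, failures) for an iterable of page-like objects.

    Each page only needs an extract_text() method, as pypdf pages have.
    """
    texts, failures = [], []
    for index, page in enumerate(pages):
        try:
            texts.append(page.extract_text())
        except UnboundLocalError as exc:
            # The error from this report: record the page and keep going.
            texts.append("")
            failures.append((index, str(exc)))
    return texts, failures


def main(path):
    """Hypothetical entry point: report which pages of the PDF fail."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(path)
    texts, failures = extract_pages(reader.pages)
    for index, message in failures:
        print(f"page {index}: {message}", file=sys.stderr)
    return texts, failures
```

Calling `main("2305.09315.pdf")` with the attached file in the working directory should list the zero-based indices of the pages that raise the error.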
Traceback
This is the complete traceback I see: