py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/
Other

local variable 'cm' referenced before assignment #2702

Closed thelazydogsback closed 2 weeks ago

thelazydogsback commented 1 month ago

Trying to extract text from a page. Tested on Windows 11 and in a Linux container. pypdf==4.2.0, crypt_provider=('cryptography', '42.0.5'), PIL=none

Traceback

  File "/usr/local/lib/python3.10/site-packages/pypdf/_page.py", line 2052, in extract_text
    return self._layout_mode_text(
  File "/usr/local/lib/python3.10/site-packages/pypdf/_page.py", line 1950, in _layout_mode_text
    fonts = self._layout_mode_fonts()
  File "/usr/local/lib/python3.10/site-packages/pypdf/_page.py", line 1902, in _layout_mode_fonts
    *cmap, font_dict_obj = build_char_map(font_name, 200.0, self)
  File "/usr/local/lib/python3.10/site-packages/pypdf/_cmap.py", line 33, in build_char_map
    font_subtype, font_halfspace, font_encoding, font_map = build_char_map_from_dict(
  File "/usr/local/lib/python3.10/site-packages/pypdf/_cmap.py", line 58, in build_char_map_from_dict
    map_dict, space_code, int_entry = parse_to_unicode(ft, space_code)
  File "/usr/local/lib/python3.10/site-packages/pypdf/_cmap.py", line 235, in parse_to_unicode
    cm = prepare_cm(ft)
  File "/usr/local/lib/python3.10/site-packages/pypdf/_cmap.py", line 260, in prepare_cm
    if isinstance(cm, str):
UnboundLocalError: local variable 'cm' referenced before assignment
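An `UnboundLocalError` like this typically means a local variable is assigned only inside some branches of a function and then read unconditionally. The sketch below reproduces that pattern with hypothetical functions (`prepare_cm_buggy`, `prepare_cm_guarded`); it is not pypdf's actual `prepare_cm`. It assumes, for illustration only, that the helper assigns `cm` for some kinds of `/ToUnicode` value and leaves it unassigned for others.

```python
# Hypothetical, simplified stand-ins; NOT pypdf's real prepare_cm.
# Assumption: `cm` is only assigned for some kinds of /ToUnicode values.


def prepare_cm_buggy(to_unicode):
    if isinstance(to_unicode, bytes):
        cm = to_unicode  # e.g. the decoded bytes of a CMap stream
    elif isinstance(to_unicode, str) and to_unicode.startswith("/Identity"):
        cm = "beginbfrange\n<0000> <0001> <0000>\nendbfrange"
    # No else branch: any other value leaves `cm` unassigned, so the
    # next line raises UnboundLocalError instead of a meaningful error.
    if isinstance(cm, str):
        cm = cm.encode()
    return cm


def prepare_cm_guarded(to_unicode):
    cm = None  # always bound, so the checks below are safe
    if isinstance(to_unicode, bytes):
        cm = to_unicode
    elif isinstance(to_unicode, str) and to_unicode.startswith("/Identity"):
        cm = "beginbfrange\n<0000> <0001> <0000>\nendbfrange"
    if cm is None:
        # Report the unexpected input explicitly instead of failing later.
        raise ValueError(f"unsupported /ToUnicode value: {to_unicode!r}")
    if isinstance(cm, str):
        cm = cm.encode()
    return cm


if __name__ == "__main__":
    for value in (b"<cmap data>", "/Identity-H", 42):
        try:
            print(repr(prepare_cm_buggy(value)))
        except UnboundLocalError as exc:
            # The exact message wording differs between Python versions.
            print(f"{value!r}: UnboundLocalError: {exc}")
```

Initializing the variable and raising a descriptive error turns an opaque `UnboundLocalError` into a message that names the unexpected input, which is the kind of detail that makes such reports easier to triage.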
pubpub-zz commented 1 month ago

Please provide the code and the input file.
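The report contains only the traceback. Because `_layout_mode_text` appears in it, the call was presumably `extract_text(extraction_mode="layout")`, which pypdf 4.x accepts. A minimal reproduction harness might look like the sketch below; `find_failing_pages` is a hypothetical helper, not part of pypdf, and it only relies on each page object having an `extract_text(extraction_mode=...)` method.

```python
import traceback
from typing import Iterable, List, Tuple


def find_failing_pages(pages: Iterable, mode: str = "layout") -> List[Tuple[int, str, str]]:
    """Call extract_text on every page and collect the ones that raise.

    Returns a list of (page index, exception type name, full traceback text).
    """
    failures = []
    for index, page in enumerate(pages):
        try:
            page.extract_text(extraction_mode=mode)
        except Exception as exc:
            # Keep the full traceback so it can be pasted into a bug report.
            failures.append((index, type(exc).__name__, traceback.format_exc()))
    return failures
```

With pypdf installed, passing `PdfReader("input.pdf").pages` (where `input.pdf` stands for the affected file) would list the failing pages; attaching such a script together with the PDF is what the maintainer is asking for.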

pubpub-zz commented 3 weeks ago

@thelazydogsback Please update the issue with the code and the input file; otherwise we will have to close the issue as "can't reproduce".

pubpub-zz commented 2 weeks ago

@thelazydogsback Please update the issue with the code and the input file. Otherwise, the issue will be closed as "can't reproduce".

pubpub-zz commented 2 weeks ago

Closing this dead issue.