jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Error #636

Closed Puneet0353 closed 2 years ago

Puneet0353 commented 2 years ago

Describe the bug

ValueError: not enough values to unpack (expected 2, got 1)

The complete traceback is:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14976/3193579313.py in <module>
      5 #Get Basic Data and convert them to Dictionary
      6 page = pdf.pages[1]
----> 7 Page1_Tables = page.extract_tables()
      8 input(Page1_Tables)
      9 B1 = pd.DataFrame(Page1_Tables[0])

~\anaconda3\lib\site-packages\pdfplumber\page.py in extract_tables(self, table_settings)
    223     def extract_tables(self, table_settings={}):
    224         table_settings = TableFinder.resolve_table_settings(table_settings)
--> 225         tables = self.find_tables(table_settings)
    226
    227         extract_kwargs = dict(

~\anaconda3\lib\site-packages\pdfplumber\page.py in find_tables(self, table_settings)
    219
    220     def find_tables(self, table_settings={}):
--> 221         return TableFinder(self, table_settings).tables
    222
    223     def extract_tables(self, table_settings={}):

~\anaconda3\lib\site-packages\pdfplumber\table.py in __init__(self, page, settings)
    472         self.page = page
    473         self.settings = self.resolve_table_settings(settings)
--> 474         self.edges = self.get_edges()
    475         self.intersections = edges_to_intersections(
    476             self.edges,

~\anaconda3\lib\site-packages\pdfplumber\table.py in get_edges(self)
    568
    569         if v_strat == "lines":
--> 570             v_base = utils.filter_edges(self.page.edges, "v")
    571         elif v_strat == "lines_strict":
    572             v_base = utils.filter_edges(self.page.edges, "v", edge_type="line")

~\anaconda3\lib\site-packages\pdfplumber\container.py in edges(self)
     77         if hasattr(self, "_edges"):
     78             return self._edges
---> 79         line_edges = list(map(utils.line_to_edge, self.lines))
     80         self._edges = self.rect_edges + line_edges
     81         return self._edges

~\anaconda3\lib\site-packages\pdfplumber\container.py in lines(self)
     35     @property
     36     def lines(self):
---> 37         return self.objects.get("line", [])
     38
     39     @property

~\anaconda3\lib\site-packages\pdfplumber\page.py in objects(self)
    150         if hasattr(self, "_objects"):
    151             return self._objects
--> 152         self._objects = self.parse_objects()
    153         return self._objects
    154

~\anaconda3\lib\site-packages\pdfplumber\page.py in parse_objects(self)
    206     def parse_objects(self):
    207         objects = {}
--> 208         for obj in self.iter_layout_objects(self.layout._objs):
    209             kind = obj["object_type"]
    210             if kind in ["anno"]:

~\anaconda3\lib\site-packages\pdfplumber\page.py in layout(self)
     96         )
     97         interpreter = PDFPageInterpreter(self.pdf.rsrcmgr, device)
---> 98         interpreter.process_page(self.page_obj)
     99         self._layout = device.get_result()
    100         return self._layout

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in process_page(self, page)
   1003             ctm = (1, 0, 0, 1, -x0, -y0)
   1004         self.device.begin_page(page, ctm)
-> 1005         self.render_contents(page.resources, page.contents, ctm=ctm)
   1006         self.device.end_page(page)
   1007         return

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in render_contents(self, resources, streams, ctm)
   1021         self.init_resources(resources)
   1022         self.init_state(ctm)
-> 1023         self.execute(list_value(streams))
   1024         return
   1025

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in execute(self, streams)
   1049                     else:
   1050                         log.debug('exec: %s', name)
-> 1051                         func()
   1052                 else:
   1053                     if settings.STRICT:

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in do_s(self)
    584         """Close and stroke path"""
    585         self.do_h()
--> 586         self.do_S()
    587         return
    588

~\anaconda3\lib\site-packages\pdfminer\pdfinterp.py in do_S(self)
    576     def do_S(self) -> None:
    577         """Stroke path"""
--> 578         self.device.paint_path(self.graphicstate, True, False, False,
    579                                self.curpath)
    580         self.curpath = []

~\anaconda3\lib\site-packages\pdfminer\converter.py in paint_path(self, gstate, stroke, fill, evenodd, path)
    119         raw_pts = [cast(Point, p[-2:] if p[0] != 'h' else path[0][-2:])
    120                    for p in path]
--> 121         pts = [apply_matrix_pt(self.ctm, pt) for pt in raw_pts]
    122
    123         if shape in {'mlh', 'ml'}:

~\anaconda3\lib\site-packages\pdfminer\converter.py in <listcomp>(.0)
    119         raw_pts = [cast(Point, p[-2:] if p[0] != 'h' else path[0][-2:])
    120                    for p in path]
--> 121         pts = [apply_matrix_pt(self.ctm, pt) for pt in raw_pts]
    122
    123         if shape in {'mlh', 'ml'}:

~\anaconda3\lib\site-packages\pdfminer\utils.py in apply_matrix_pt(m, v)
    251 def apply_matrix_pt(m: Matrix, v: Point) -> Point:
    252     (a, b, c, d, e, f) = m
--> 253     (x, y) = v
    254     """Applies a matrix to a point."""
    255     return a * x + c * y + e, b * x + d * y + f

ValueError: not enough values to unpack (expected 2, got 1)
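
For readers following the traceback, here is a minimal sketch of one way the last frames could produce this error. It copies the two expressions visible in the converter.py and utils.py frames and feeds them a path that holds only a close-path marker. That input is an assumption about what the `s` (close and stroke) operator leaves behind when no path segments precede it; the thread does not confirm that this is what the PDF contains.

```python
# Body taken from the pdfminer/utils.py frame in the traceback (lines 251-255).
def apply_matrix_pt(m, v):
    (a, b, c, d, e, f) = m
    (x, y) = v
    return a * x + c * y + e, b * x + d * y + f


def raw_points(path):
    # Mirrors the raw_pts comprehension in the converter.py frame (lines 119-120).
    return [p[-2:] if p[0] != "h" else path[0][-2:] for p in path]


identity = (1, 0, 0, 1, 0, 0)

# An ordinary path: move, line, close. The close marker reuses the first point.
normal_path = [("m", 0.0, 0.0), ("l", 10.0, 0.0), ("h",)]
print([apply_matrix_pt(identity, pt) for pt in raw_points(normal_path)])

# ASSUMPTION: a path that contains nothing but the close marker.
# path[0][-2:] is then ("h",), a 1-tuple, so unpacking into (x, y) fails.
only_close_marker = [("h",)]
error_message = None
try:
    [apply_matrix_pt(identity, pt) for pt in raw_points(only_close_marker)]
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```

If this is indeed the trigger, it would fit the `do_s` and `do_S` frames shown above, but other malformed path data could fail at the same line.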

Code to reproduce the problem

import pdfplumber

# Note: Name is not defined in the snippet as posted.
FILE = "D:\\Astro\\Charts\\" + Name + ".pdf"
pdf = pdfplumber.open(FILE)

# Get Basic Data and convert them to Dictionary
page = pdf.pages[1]
Page1_Tables = page.extract_tables()
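
When a file fails like this, it can help to find out which pages trigger the error and which still extract. The sketch below is one way to do that. `extract_tables_per_page` and `_StubPage` are hypothetical names introduced here for illustration; the stub only stands in for a pdfplumber page so the helper can be run without the PDF.

```python
def extract_tables_per_page(pages):
    """Collect tables page by page, recording pages that raise ValueError."""
    tables, failures = {}, {}
    for index, page in enumerate(pages):
        try:
            tables[index] = page.extract_tables()
        except ValueError as exc:
            failures[index] = str(exc)
    return tables, failures


class _StubPage:
    """Hypothetical stand-in for a pdfplumber page, used only for this demo."""

    def __init__(self, result=None, error=None):
        self._result = result
        self._error = error

    def extract_tables(self):
        if self._error is not None:
            raise ValueError(self._error)
        return self._result


pages = [
    _StubPage(result=[[["a", "b"]]]),
    _StubPage(error="not enough values to unpack (expected 2, got 1)"),
]
tables, failures = extract_tables_per_page(pages)
print("extracted:", tables)
print("failed:", failures)
```

With a real file, `pdf.pages` from `pdfplumber.open(FILE)` would be passed in place of the stub list.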

Expected behavior

It should have extracted the tables. It was working fine before. However, after I reinstalled Anaconda with Python 3.9, this problem started appearing.

Additional context

Ajay VR Detailed.pdf

jsvine commented 2 years ago

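Hi @Puneet0353, and thanks for sharing this interesting example. I have examined the file and the error, and have come to the following conclusions:

- Per the traceback you've pasted above (and which I've confirmed), the error is raised by pdfminer.six (https://github.com/pdfminer/pdfminer.six), the library we use to extract the raw object information from PDFs. So this isn't an issue that can be resolved directly through pdfplumber.

- pdfminer.six appears to raise the error due to an unusual graphics command in the PDF. I'm not entirely sure whether the PDF is malformed or whether it's just unusual. In either case, the PDF appears to parse cleanly if you first repair it with Ghostscript (https://superuser.com/questions/278562/how-can-i-fix-repair-a-corrupted-pdf-file):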

 gs \
  -o "Ajay VR Detailed-repaired.pdf" \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
  "Ajay VR Detailed.pdf"

I hope that helps. In the meantime, I plan to investigate whether there's a way to improve pdfminer.six's handling of the graphics command in your PDF, and will submit a PR on that repository if I find a solution.
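
For anyone who wants to run that repair step from Python rather than a shell, here is a minimal sketch that wraps the same Ghostscript flags with `subprocess`. The function names are hypothetical, and the `gs_binary` default assumes the executable is called `gs` on your PATH; on Windows the console executable is typically named `gswin64c` instead.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(src, dst, gs_binary="gs"):
    """Return the Ghostscript argument list using the flags from the comment above."""
    return [
        gs_binary,
        "-o", str(dst),
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/prepress",
        str(src),
    ]


def repair_pdf(src, dst, gs_binary="gs"):
    """Rewrite src through Ghostscript; raises CalledProcessError if gs fails."""
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"{gs_binary!r} not found on PATH")
    subprocess.run(build_gs_command(src, dst, gs_binary), check=True)
    return Path(dst)


# Only builds and prints the command; call repair_pdf(...) to actually run it.
cmd = build_gs_command("Ajay VR Detailed.pdf", "Ajay VR Detailed-repaired.pdf")
print(cmd)
```

The repaired file can then be opened with `pdfplumber.open` as usual.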

Puneet0353 commented 2 years ago

Thanks so much. Installing the previous version of pdfplumber resolved the issue, though.

On Mon, 11 Apr 2022, 18:49 Jeremy Singer-Vine wrote:

Hi @Puneet0353 https://github.com/Puneet0353, and thanks for sharing this interesting example. I have examined the file and the error, and have come to the following conclusions:

- Per the traceback you've pasted above (and which I've confirmed), the error is raised by pdfminer.six https://github.com/pdfminer/pdfminer.six, the library we use to extract the raw object information from PDFs. So this isn't an issue that can be resolved directly through pdfplumber.

- pdfminer.six appears to raise the error due to an unusual graphics command in the PDF. I'm not entirely sure whether the PDF is malformed or whether it's just unusual. In either case, the PDF appears to parse cleanly if you first repair it with Ghostscript https://superuser.com/questions/278562/how-can-i-fix-repair-a-corrupted-pdf-file :

 gs \
  -o "Ajay VR Detailed-repaired.pdf" \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
  "Ajay VR Detailed.pdf"

I hope that helps. In the meantime, I plan to investigate whether there's a way to improve pdfminer.six's handling of the graphics command in your PDF, and will submit a PR on that repository if I find a solution.
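
Since the error is raised inside pdfminer.six yet went away after installing an earlier pdfplumber, it may be worth recording which versions of both packages are installed before and after the change. Whether the downgrade also changed the pdfminer.six version is an assumption, not something the thread confirms. A minimal sketch using the standard library, with `installed_version` as a hypothetical helper name:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


versions = {name: installed_version(name) for name in ("pdfplumber", "pdfminer.six")}
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

Including this output in a bug report makes it easier to tell which release introduced the behavior.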
