Closed: bpugnaire closed this issue 2 years ago
Note that I only encountered the bug when calling page.search with regex=True.
Hi @bpugnaire, I appreciate your interest in the library, and thanks for raising this issue. Could you please provide the sample code you used to reproduce it? That would help us investigate and fix the problem. If possible, please also share the PDF that triggered this behaviour.
match_group = '(Figure|Fig.|fig.|Tab.|tab.|Tabl.)'
search_result = page.search("(?<!(\(|\[))" + match_group, regex=True)
The regex is meant to match "xxx Figure 1xxx" but not "xxx [Figure1 xxx", i.e. to skip labels that directly follow an opening bracket.
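For anyone reading along, here is a minimal sketch of the intended lookbehind behaviour using only Python's standard `re` module, outside pdfplumber. The pattern is an assumed reconstruction of the snippet above: the brackets are written as a character class, and the dots are escaped so that `Fig.` only matches a literal period, which is a change from the original. The helper name `find_labels` is hypothetical.

```python
import re

# Assumed reconstruction of the pattern from the snippet above.
# Dots are escaped here (a deliberate change) so "Fig." needs a literal period.
match_group = r"(Figure|Fig\.|fig\.|Tab\.|tab\.|Tabl\.)"

# Negative lookbehind: reject a label directly preceded by "(" or "[".
pattern = re.compile(r"(?<![(\[])" + match_group)


def find_labels(text):
    """Return every label matched in `text` (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]


# find_labels("xxx Figure 1xxx")  -> ['Figure']
# find_labels("xxx [Figure1 xxx") -> []
```

This only checks the pattern against plain strings; it says nothing about how pdfplumber maps a match back to characters on the page.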
Thanks @bpugnaire. Are you able to share the PDF? That would make it easier to diagnose the situation.
This is fixed in the above commit, and available as of v0.7.1.
Describe the bug
When using the page.search method with a regex, the program may raise a ValueError: min() arg is an empty sequence.
Code to reproduce the problem
The issue is not easily reproducible. On some pages, the same regex pattern gives the expected behavior (a list of matches, or an empty list if there is no match), and on others I get the error.
Expected behavior
Get a list of matches, or an empty list if there are no matches.
Actual behavior
Got the ValueError: min() arg is an empty sequence
Environment
Additional context
Traceback (most recent call last):
  File "c:\Users\xxx.virtualenvs\project\lib\site-packages\pdfplumber\page.py", line 316, in search
    return text_layout.search(pattern, regex=regex, case=case)
  File "c:\Users\xxx.virtualenvs\project\lib\site-packages\pdfplumber\utils.py", line 528, in search
    return list(map(match_to_dict, gen))
  File "c:\Users\xxx.virtualenvs\project\lib\site-packages\pdfplumber\utils.py", line 499, in match_to_dict
    x0, top, x1, bottom = objects_to_bbox(chars)
  File "c:\Users\xx.virtualenvs\project\lib\site-packages\pdfplumber\utils.py", line 207, in objects_to_bbox
    min(map(itemgetter("x0"), objects)),
ValueError: min() arg is an empty sequence
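The last frame of the traceback suggests that the bounding-box helper was called with an empty list of characters. Below is a minimal sketch of that failure mode in plain Python. `bbox_of` is a simplified, hypothetical stand-in and not pdfplumber's actual code; the use of `max` for `x1` and `bottom` is an assumption. `safe_bbox_of` shows one possible guard and is not necessarily how the library fixed it.

```python
from operator import itemgetter


def bbox_of(objects):
    """Simplified stand-in for the helper named in the traceback (hypothetical)."""
    return (
        min(map(itemgetter("x0"), objects)),
        min(map(itemgetter("top"), objects)),
        max(map(itemgetter("x1"), objects)),      # assumption: max for the right edge
        max(map(itemgetter("bottom"), objects)),  # assumption: max for the bottom edge
    )


def safe_bbox_of(objects):
    """Return None instead of raising when there is nothing to measure."""
    if not objects:
        return None
    return bbox_of(objects)


chars = [
    {"x0": 1, "top": 2, "x1": 3, "bottom": 4},
    {"x0": 0, "top": 5, "x1": 6, "bottom": 7},
]
bbox = bbox_of(chars)  # (0, 2, 6, 7)

try:
    bbox_of([])  # min() over an empty iterable raises ValueError
    raised = False
except ValueError:
    raised = True
```

If this reading is right, any match that maps to zero characters would trigger the error, which would explain why it only shows up on some pages.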