Too many edges detected

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

MIT License


Describe the bug

Hi, thanks for your amazing project.

I tried to extract tables from PDF files, but for some files far too many edges are detected on a single page (over 100,000).

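I convert the page's curves and edges into explicit lines like this:

    import pdfplumber

    def curves_to_edges(cs):
        """Convert each object to the edges of its bounding box."""
        edges = []
        for c in cs:
            edges += pdfplumber.utils.rect_to_edges(c)
        return edges

    # `page` is a pdfplumber Page, e.g. pdfplumber.open(path).pages[0]
    lines = curves_to_edges(page.curves + page.edges)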
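One plausible contributor to the edge count: `rect_to_edges` turns every object's bounding box into four edges (top, bottom, left, right), so N curves already give 4N edges. Items in `page.edges` are edges already, so passing them through `rect_to_edges` again yields an extra duplicate plus two zero-length edges for each one. The sketch below is a simplified, pure-Python stand-in that illustrates this. `bbox_to_edges` and `dedupe_edges` are hypothetical helpers, not pdfplumber API, and the dict keys are reduced to the few that matter here. pdfplumber's `edge_min_length` table setting may already filter very short edges in a similar way.

```python
# Hypothetical, simplified stand-in for pdfplumber.utils.rect_to_edges:
# it turns an object's bounding box into four axis-aligned edges.
def bbox_to_edges(obj):
    x0, x1, top, bottom = obj["x0"], obj["x1"], obj["top"], obj["bottom"]
    return [
        {"orientation": "h", "x0": x0, "x1": x1, "top": top, "bottom": top},        # top side
        {"orientation": "h", "x0": x0, "x1": x1, "top": bottom, "bottom": bottom},  # bottom side
        {"orientation": "v", "x0": x0, "x1": x0, "top": top, "bottom": bottom},     # left side
        {"orientation": "v", "x0": x1, "x1": x1, "top": top, "bottom": bottom},     # right side
    ]


def dedupe_edges(edges, min_length=1.0):
    """Hypothetical helper: drop exact duplicates and edges shorter than min_length (points)."""
    seen = set()
    kept = []
    for e in edges:
        if e["orientation"] == "h":
            length = e["x1"] - e["x0"]
        else:
            length = e["bottom"] - e["top"]
        key = (e["orientation"], e["x0"], e["x1"], e["top"], e["bottom"])
        if length < min_length or key in seen:
            continue
        seen.add(key)
        kept.append(e)
    return kept


# A horizontal rule that is already a zero-height edge, like the items in page.edges.
flat_edge = {"x0": 50.0, "x1": 300.0, "top": 100.0, "bottom": 100.0}
raw = bbox_to_edges(flat_edge)
print(len(raw))                # two identical horizontals plus two zero-length verticals
print(len(dedupe_edges(raw)))  # only the single real horizontal line survives
```

Running `curves_to_edges(page.curves)` on its own, without `page.edges`, may therefore already cut the count noticeably before any merging is needed.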

Expected: one thick line is detected as a single line.

Actual: one line is detected as many lines.
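To collapse the many detected lines back into one, near-collinear segments can be merged. The following is a minimal sketch, assuming horizontal edges are dicts with `orientation`, `x0`, `x1` and `top` keys. `merge_horizontal` and its tolerances are hypothetical. As far as I understand, pdfplumber's own `snap_tolerance` and `join_tolerance` table settings apply a similar snap-then-join idea, so raising those may be worth trying first.

```python
def merge_horizontal(edges, snap_tol=3.0, join_tol=3.0):
    """Hypothetical helper: collapse near-collinear horizontal edges into single lines.

    Edges whose `top` values chain within snap_tol of each other form one row,
    snapped to the row's mean y. Within a row, segments whose x-ranges overlap
    or are separated by at most join_tol are joined end to end.
    """
    horiz = sorted((e for e in edges if e["orientation"] == "h"), key=lambda e: e["top"])

    # 1. Snap: group edges into rows by vertical position.
    rows = []
    for e in horiz:
        if rows and e["top"] - rows[-1][-1]["top"] <= snap_tol:
            rows[-1].append(e)
        else:
            rows.append([e])

    merged = []
    for row in rows:
        y = sum(e["top"] for e in row) / len(row)
        # 2. Join: sweep segments left to right, extending the current one while gaps are small.
        segs = sorted((e["x0"], e["x1"]) for e in row)
        cur_x0, cur_x1 = segs[0]
        for x0, x1 in segs[1:]:
            if x0 - cur_x1 <= join_tol:
                cur_x1 = max(cur_x1, x1)
            else:
                merged.append({"orientation": "h", "x0": cur_x0, "x1": cur_x1, "top": y, "bottom": y})
                cur_x0, cur_x1 = x0, x1
        merged.append({"orientation": "h", "x0": cur_x0, "x1": cur_x1, "top": y, "bottom": y})
    return merged


# One thick rule drawn as five thin strokes 0.5 pt apart ...
strokes = [
    {"orientation": "h", "x0": 50.0, "x1": 300.0, "top": 100.0 + 0.5 * i, "bottom": 100.0 + 0.5 * i}
    for i in range(5)
]
# ... and a second rule broken into two pieces with a 2 pt gap.
pieces = [
    {"orientation": "h", "x0": 50.0, "x1": 170.0, "top": 200.0, "bottom": 200.0},
    {"orientation": "h", "x0": 172.0, "x1": 300.0, "top": 200.0, "bottom": 200.0},
]
result = merge_horizontal(strokes + pieces)
print(len(result))  # 2
```

Vertical edges can be handled the same way by swapping the roles of the x and y coordinates.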

