jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
6.57k stars 659 forks

Difficulty extracting 'cells' from PDF without edges #493

Closed (youpengbo2018 closed this 3 years ago)

youpengbo2018 commented 3 years ago

Discussed in https://github.com/jsvine/pdfplumber/discussions/379

Originally posted by **alexreg** March 18, 2021

[This](https://mathscinet.ams.org/msnhtml/serials.pdf) is the PDF I'm working with. It's proving rather troublesome to even get any table cells extracted from it. I've tried different values for `horizontal_strategy` and `vertical_strategy` to no avail.
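For readers unfamiliar with the settings mentioned above, here is a minimal sketch of what trying the text-based strategies might look like. It is not the original poster's code and it is not confirmed to work on this particular PDF. The helper names `text_table_settings` and `extract_rows`, the local file name `serials.pdf`, and the tolerance values are assumptions chosen for illustration. The `"text"` strategies infer cell edges from word alignment instead of drawn lines, which is the usual starting point for a table without ruled edges.

```python
def text_table_settings(min_words_vertical=3, min_words_horizontal=1, snap_tolerance=3):
    """Build a table_settings dict that infers edges from text alignment.

    The numeric values are illustrative starting points, not tuned
    for any specific document.
    """
    return {
        "vertical_strategy": "text",      # column edges from aligned words
        "horizontal_strategy": "text",    # row edges from aligned words
        "min_words_vertical": min_words_vertical,
        "min_words_horizontal": min_words_horizontal,
        "snap_tolerance": snap_tolerance,
    }


def extract_rows(pdf_path, page_number=0, settings=None):
    """Return the rows of the largest table found on one page, or [] if none."""
    import pdfplumber  # imported here so the settings helper works without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        rows = page.extract_table(table_settings=settings or text_table_settings())
    return rows or []


if __name__ == "__main__":
    # "serials.pdf" is a hypothetical local copy of the linked document.
    for row in extract_rows("serials.pdf")[:5]:
        print(row)
```

If the inferred columns come out wrong, `page.to_image().debug_tablefinder(settings)` can help show which edges pdfplumber detected, and the `"explicit"` strategy with `explicit_vertical_lines` is another option when the column positions are known in advance.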