jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

upgrade the version of pdfminer.six #315

Closed bimmlerd closed 2 years ago

bimmlerd commented 3 years ago

In the spirit of #224 - pdfminer.six has released version '20201018'.

I'm trying to package paper2remarkable for Arch, which depends on this (which I'll thus also package :smiley:). Since Arch packages typically track the latest release, it would make my life a bit easier if this didn't depend explicitly on an older version.

Let me know if I can help with the upgrade and thanks for the work!

jsvine commented 3 years ago

Yes, definitely aiming to upgrade the pinned version of pdfminer.six, but the current version breaks some tests due to changes in how curves are processed. Working on sorting that out.

Spiritus44 commented 3 years ago

Hello,

I am in the same situation as @bimmlerd. It would be great to update pdfminer.six to the latest version (currently 20201018). In my case, it's the pdftitle program that depends on a more recent version of pdfminer.six.

@jsvine Have you made any progress on this?

Thank you for your work!

jsvine commented 3 years ago

Hi @Spiritus44. I'm still waiting for a pdfminer.six PR to be merged. In the meantime, however, you can manually upgrade the library to the latest version via pip install pdfminer.six==20201018.
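
After a manual upgrade like the one above, it can help to confirm which versions actually ended up installed. The following is a minimal sketch using only the Python standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of pdfplumber or pdfminer.six.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not a pdfplumber API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report what is installed in the current environment.
    for name in ("pdfplumber", "pdfminer.six"):
        print(name, installed_version(name) or "not installed")
```

Note that pip may warn that the upgraded pdfminer.six no longer matches pdfplumber's pinned requirement; the check above only reports what is installed and does not verify compatibility.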

emccords commented 3 years ago

Hi @jsvine -- I'm in a similar boat to some others here: I'm trying to use pdfplumber and the latest version of camelot-py together, and camelot wants pdfminer.six>=20200720. Can any further progress be made on this issue? (I'm not sure whether the pdfminer.six PR you mentioned above has been merged.) I'm also not sure exactly how to do the manual workaround you suggest above: when I try to install the latest version of pdfminer.six, I get a SolverProblemError. (For context, I'm using poetry for dependency management, and this error is from running the poetry add pdfminer.six=20201018 command.)

SolverProblemError

  Because no versions of pdfplumber match >0.5.27,<0.5.28 || >0.5.28,<0.6.0
   and pdfplumber (0.5.27) depends on pdfminer.six (20200517), pdfplumber (>=0.5.27,<0.5.28 || >0.5.28,<0.6.0) requires pdfminer.six (20200517).
  And because pdfplumber (0.5.28) depends on pdfminer.six (20200517), pdfplumber (>=0.5.27,<0.6.0) requires pdfminer.six (20200517).
  So, because table-extraction depends on both pdfplumber (^0.5.27) and pdfminer.six (20201018), version solving failed.

jsvine commented 3 years ago

Hi @emccords and thanks for your interest in this library. Unfortunately for your use-case, this will likely be a perpetual issue. Even if pdfminer.six does ultimately accept the necessary PR, it's important for the reliability of pdfplumber to pin a specific version of pdfminer.six — and, from time to time, that version will be different than what camelot-py requires. Although I see the desirability of using both pdfplumber and camelot-py in the same project, I don't think either project can guarantee to remain in pdfminer.six-version lockstep.

That said, pdfplumber will work as intended with most versions of pdfminer.six, in most situations. So feel free to manually upgrade pdfminer.six to whatever version you desire. I don't know, however, the best way to do that for your particular poetry setup.
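
To illustrate why a resolver rejects this combination, here is a simplified sketch of the constraint check. It is not how poetry's solver works internally. It assumes that pdfminer.six's date-style versions (YYYYMMDD) can be compared as plain integers, and the function names `satisfies` and `compatible_versions` are hypothetical. The version numbers are the ones mentioned in this thread.

```python
def satisfies(candidate, constraint):
    """Check one integer version against an (operator, version) constraint."""
    op, required = constraint
    if op == "==":
        return candidate == required
    if op == ">=":
        return candidate >= required
    raise ValueError(f"unsupported operator: {op}")


def compatible_versions(candidates, constraints):
    """Return the candidate versions that satisfy every constraint."""
    return [v for v in candidates if all(satisfies(v, c) for c in constraints)]


# pdfminer.six versions mentioned in this thread.
candidates = [20200517, 20200720, 20201018]

pdfplumber_pin = ("==", 20200517)  # exact pin reported by the solver for pdfplumber 0.5.27/0.5.28
camelot_req = (">=", 20200720)     # lower bound reported for camelot-py

print(compatible_versions(candidates, [pdfplumber_pin]))               # [20200517]
print(compatible_versions(candidates, [pdfplumber_pin, camelot_req]))  # []
```

The empty result for the combined constraints is the situation the SolverProblemError describes: an exact pin and a higher lower bound leave no version that satisfies both.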

emccords commented 3 years ago

@jsvine - Thanks for the response, understood. I appreciate the guidance.

Rachnas commented 3 years ago

I faced the same issue and resolved it by using camelot-py==0.8.0 and pdfplumber==0.5.24, with pip version 21.2.4.

jsvine commented 2 years ago

The latest release of pdfplumber (0.6.0) now ships with the currently-latest version of pdfminer.six (20211012).