jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Consider making `wand` optional to avoid AGPL #832

Closed: mjbommar closed this issue 6 months ago

mjbommar commented 1 year ago

Issue

Due to the use of wand, this package effectively requires that projects accept AGPL via transitive dependencies. However, only some pdfplumber functionality actually uses wand today.

Proposal

  1. Over future releases, move wand to an extra/optional group.
  2. In methods that require it, dynamically import wand and warn the user at runtime if not installed.
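Step 2 of the proposal could look roughly like the minimal sketch below. It assumes a hypothetical extra named `visual` (the name suggested later in this thread) and a hypothetical `render_pdf_hypothetical` function; neither is part of pdfplumber's actual API. Rather than printing a warning, it raises an `ImportError` that carries the install hint at call time, so merely importing the package never touches wand.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency only when a feature actually needs it.

    Raises ImportError with an install hint if the module is unavailable.
    The extra name passed in is hypothetical; the proposal does not fix one.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"{module_name!r} is needed for this feature but could not be imported. "
            f"Install it with: pip install 'pdfplumber[{extra}]'"
        ) from exc


def render_pdf_hypothetical(pdf_path: str):
    """Hypothetical visual-debugging entry point: the only place wand is imported."""
    wand_image = import_optional("wand.image", extra="visual")
    return wand_image.Image(filename=pdf_path, resolution=72)
```

Users who never call the visual entry point never pay for, or license-scan, the optional dependency; those who do get an actionable message instead of a bare `ModuleNotFoundError`.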

Testing

I can confirm that much of the package functionality works without issue if wand is manually removed and imagemagick/ghostscript are not available.

jsvine commented 1 year ago

Thank you for your note, @mjbommar. I'm not terribly familiar with the particularities of licensing, so I could use your help understanding the situation. Is your issue that:

a. `pip install pdfplumber` installs wand on your computer?

b. `import pdfplumber` also imports wand behind the scenes?

c. Both?

d. Something else?
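Question (b) can be checked empirically. The minimal sketch below (the helper name `import_pulls_in` is hypothetical, not from either project) imports a package in a fresh interpreter and reports whether a given module ended up in `sys.modules`; using a separate process avoids false positives from modules the current session has already loaded.

```python
import subprocess
import sys


def import_pulls_in(package: str, dependency: str) -> bool:
    """Return True if `import package` loads `dependency` as a side effect.

    Runs the import in a fresh interpreter so that modules already imported
    by the current process do not skew the result.
    """
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"print({dependency!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if `package` itself fails to import
    )
    return result.stdout.strip() == "True"
```

For example, `import_pulls_in("pdfplumber", "wand")` answers (b) for whichever pdfplumber version is installed. It says nothing about (a), since a package can be installed without ever being imported.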

Also, would you be able to point me to other Python libraries that handle a similar situation to your liking?

And any reference material I could read to make sure I understand the concerns?

mjbommar commented 1 year ago

Hi @jsvine, sorry for the slow response!

The root issue is that ghostscript is AGPL.

ghostscript (AGPL) --> ImageMagick (Apache-2) --> wand (MIT) --> pdfplumber (MIT)

Most distributions of ImageMagick also require or automatically install ghostscript.

It's been a few weeks, but if I recall correctly, the functionality in wand that you're using also relies on ghostscript.

The easiest solution is to move the visual debugging functionality that relies on gs (ghostscript) to an extras block. Users who want this functionality and are OK with gs/AGPL could install it with `pip install pdfplumber[visual]` (or `[wand]`); everyone else would not get it. You could catch `ImportError` on `import wand` and print a notice to the user about ghostscript and the extra pip installation (`[visual]`).
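The suggestion of catching `ImportError` around `import wand` could be sketched as a module-level guard plus a check at the top of each visual method. The `visual` extra and the names `HAS_WAND`, `VISUAL_EXTRA_NOTICE`, and `require_wand` are hypothetical, chosen for illustration. The sketch assumes, as I understand wand's behavior, that wand raises `ImportError` not only when the package is missing but also when the ImageMagick shared library cannot be loaded. Instead of printing, it puts the notice in the error raised when a visual feature is used.

```python
try:
    # Assumption: wand reports a missing MagickWand shared library as an
    # ImportError too, so one except clause covers both failure modes.
    import wand.image  # noqa: F401
    HAS_WAND = True
except ImportError:
    HAS_WAND = False

# Hypothetical wording; "visual" is the extra name proposed in the thread.
VISUAL_EXTRA_NOTICE = (
    "Visual debugging needs the optional 'wand' package plus ImageMagick, which "
    "typically relies on Ghostscript (AGPL-licensed) to read PDFs. "
    "Install with: pip install 'pdfplumber[visual]'"
)


def require_wand() -> None:
    """Call at the top of any method that renders pages through wand."""
    if not HAS_WAND:
        raise ImportError(VISUAL_EXTRA_NOTICE)
```

This variant still attempts `import wand.image` whenever the module loads. Where the goal is that the top-level package import never touches wand at all, deferring the import into the method body is the tighter choice.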

jsvine commented 1 year ago

Thanks for the follow-up, @mjbommar! But, if I'm understanding the concerns correctly, I think the issue may be moot, as installing wand via pip does not actually install ImageMagick or ghostscript — users have to install those separately: https://docs.wand-py.org/en/0.6.7/guide/install.html


Does that change your opinion on including wand among pdfplumber's automatic dependencies?

mjbommar commented 1 year ago

Right. Some automated compliance tools, however, flag ImageMagick, as the package typically installs ghostscript automatically.

See, for example, Ubuntu 22.04, which will automatically install ghostscript if the user installs imagemagick without the `--no-install-recommends` flag.

It's easy enough for a sophisticated user to remove wand and clean up their compliance scan, and doing so doesn't affect normal use, so it's definitely not an issue that requires a change on your side.

Package: imagemagick-6.q16
Version: 8:6.9.11.60+dfsg-1.3ubuntu0.22.04.2
Priority: optional
Section: universe/graphics
Source: imagemagick
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: ImageMagick Packaging Team <pkg-gmagick-im-team@lists.alioth.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 568 kB
Provides: imagemagick, imagemagick-6.defaultquantum
Depends: libc6 (>= 2.34), libmagickcore-6.q16-6 (>= 8:6.9.10.2), libmagickwand-6.q16-6 (>= 8:6.9.10.2), hicolor-icon-theme
Recommends: libmagickcore-6.q16-6-extra, ghostscript, netpbm
...
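The distinction that matters in the package record above is between `Depends` (always installed) and `Recommends` (installed by apt by default, skipped with `--no-install-recommends`). A minimal sketch that extracts the package names from those two fields, assuming the record is in the plain `Field: value` form shown (continuation lines and `:arch` qualifiers are not handled); `STANZA` is a trimmed copy of the lines above.

```python
# Trimmed copy of the imagemagick-6.q16 record shown above.
STANZA = """\
Package: imagemagick-6.q16
Depends: libc6 (>= 2.34), libmagickcore-6.q16-6 (>= 8:6.9.10.2), libmagickwand-6.q16-6 (>= 8:6.9.10.2), hicolor-icon-theme
Recommends: libmagickcore-6.q16-6-extra, ghostscript, netpbm
"""


def parse_relations(stanza: str, field: str) -> list[str]:
    """Return the bare package names listed in one relationship field."""
    for line in stanza.splitlines():
        # Split on the first colon only; version strings like 8:6.9 contain more.
        key, sep, value = line.partition(":")
        if sep and key.strip() == field:
            names = []
            # Entries are comma-separated; "a | b" alternatives are split too.
            for entry in value.split(","):
                for alternative in entry.split("|"):
                    name = alternative.strip().split(" ")[0]  # drop "(>= ...)"
                    if name:
                        names.append(name)
            return names
    return []


if __name__ == "__main__":
    print("Depends:   ", parse_relations(STANZA, "Depends"))
    print("Recommends:", parse_relations(STANZA, "Recommends"))
```

Run against this record, ghostscript shows up only under `Recommends`, which is why it is pulled in by a default `apt install imagemagick` but not when recommends are disabled.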
jsvine commented 1 year ago

Hmm, I'm still a bit confused. If wand doesn't actually install imagemagick or ghostscript (which is my current understanding), why would a user need/want to remove it?

filips123 commented 1 year ago

Although licensing is probably already fine, it might still be useful to move Pillow and Wand into an extras block, so users who don't need those features wouldn't have to install unnecessary dependencies.

jsvine commented 6 months ago

Belatedly noting that, as of version 0.10.0, pdfplumber no longer uses wand, but instead pypdfium2: https://github.com/jsvine/pdfplumber/blob/stable/CHANGELOG.md#0100---2023-07-16
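To check what an installed pdfplumber actually pulls in (question (a) from earlier in the thread), the distribution's declared requirements can be read from its metadata. A minimal sketch using the standard library's `importlib.metadata`; the helper names `requirement_name` and `depends_on` are hypothetical, and name normalization follows PEP 503.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string.

    E.g. "Wand>=0.6.10" -> "wand", "pdfminer.six==20221105" -> "pdfminer-six".
    """
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"not a requirement string: {requirement!r}")
    # PEP 503: lowercase, and collapse runs of "-", "_", "." into "-".
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def depends_on(distribution: str, dependency: str, include_extras: bool = False) -> bool:
    """True if the installed `distribution` declares `dependency` in its metadata."""
    try:
        requirements = metadata.requires(distribution) or []
    except metadata.PackageNotFoundError:
        return False
    target = requirement_name(dependency)
    for req in requirements:
        # Requirements gated on an extra carry a marker like: extra == "visual"
        if not include_extras and re.search(r"\bextra\s*==", req):
            continue
        if requirement_name(req) == target:
            return True
    return False
```

For example, `depends_on("pdfplumber", "wand")` and `depends_on("pdfplumber", "pypdfium2")` report what the locally installed release declares; per the changelog entry above, a 0.10.0 or later install would be expected to list pypdfium2 rather than wand.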