py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

PyPDF2 does not import #1480

Closed: mmr-crexi closed this issue 1 year ago

mmr-crexi commented 1 year ago

I am trying to import PyPDF2 into a python notebook.

Environment

Which environment were you using when you encountered the problem?

Python 3.10, with this requirements file:

snowflake-connector-python[pandas,secure-local-storage]
python-dotenv
jupyter
pandas-profiling
openpyxl
pytesseract
transformers
PyPDF2[full]

using a venv created by this setup script:

#!/bin/bash

# this script assumes that you have pip3 installed
# https://linuxconfig.org/install-pip-on-linux
# https://phoenixnap.com/kb/install-pip-mac
# https://www.geeksforgeeks.org/how-to-install-pip-on-windows/
# there's too much variability there to be nice for a hackathon project
# also assumes you have virtualenv installed
# like sudo apt install python3-virtualenv
# or from brew, or whatever Windows uses

virtualenv venv
source venv/bin/activate
pip3 install -r requirements.txt --timeout 30000

# once you've run this, then you should type 
# source venv/bin/activate
# to get into the venv
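Running `source venv/bin/activate` inside a script only affects the shell that executes the script, so a notebook kernel started later is not guaranteed to be running from the venv. A minimal standard-library sketch to check this from a notebook cell, assuming the `sys.prefix` vs `sys.base_prefix` rule described in the `venv` documentation (legacy virtualenv < 20 set `sys.real_prefix` instead); the helper name `running_in_virtualenv` is hypothetical:

```python
import sys


def running_in_virtualenv() -> bool:
    """Return True if this interpreter runs from a virtual environment.

    Modern venv/virtualenv point sys.prefix at the environment directory
    while sys.base_prefix keeps pointing at the base installation; legacy
    virtualenv (< 20) set sys.real_prefix instead.
    """
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("executable:", sys.executable)
    print("prefix:    ", sys.prefix)
    print("in venv:   ", running_in_virtualenv())
```

If `in venv` prints `False`, or `executable` is not under `venv/bin`, the kernel is using a different Python than the one the setup script installed into.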

Code + PDF

This is a minimal, complete example that shows the issue:


!pip freeze | grep PyP

from PyPDF2 import PdfReader

yields

PyPDF2==2.11.2

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In [14], line 5
      1 import pandas as pd
      3 get_ipython().system('pip freeze | grep PyP')
----> 5 from PyPDF2 import PdfReader
      6 import os

ModuleNotFoundError: No module named 'PyPDF2'

My Python version:


Python 3.10.6

So pip freeze shows that the library is installed and available, but it isn't being imported, for whatever reason.
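`!pip freeze` reports what the `pip` found first on the shell's `PATH` sees, while `import` is resolved by the interpreter running the kernel; these are not necessarily the same Python. A small standard-library sketch to print which interpreter the kernel uses and where, if anywhere, it would load `PyPDF2` from (the helper name `locate_module` is hypothetical):

```python
import importlib.util
import sys


def locate_module(name: str):
    """Return the file the *current* interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # `!pip freeze` runs whichever pip is first on the shell's PATH, which may
    # belong to a different Python than the one running this kernel.
    print("kernel interpreter:", sys.executable)
    print("PyPDF2 found at:   ", locate_module("PyPDF2"))
```

A `None` for `PyPDF2` alongside a successful `pip freeze` would point to the kernel and `pip` belonging to different environments.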

My initial thinking is that the naming of the module is not compliant with PEP8:

https://peps.python.org/pep-0008/#package-and-module-names

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

MartinThoma commented 1 year ago

I doubt that this is a PyPDF2 issue, but people seem to stumble over this a lot: https://stackoverflow.com/q/69322531/562769

MartinThoma commented 1 year ago

Could you please try executing

%pip install PyPDF2

in a cell of the same notebook?
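IPython's `%pip` magic runs pip with the interpreter of the running kernel (roughly `sys.executable -m pip ...`), whereas `!pip` runs whatever `pip` the shell finds on `PATH`. Outside IPython the same effect can be had by invoking pip through `sys.executable`. A minimal sketch; the helper name `pip_install` and its `dry_run` flag are hypothetical:

```python
import subprocess
import sys


def pip_install(*packages: str, dry_run: bool = False) -> list[str]:
    """Install packages into the environment of the interpreter running this code.

    Calling pip via sys.executable guarantees the packages land where later
    `import` statements in this same process will look for them.
    """
    cmd = [sys.executable, "-m", "pip", "install", *packages]
    if not dry_run:
        subprocess.check_call(cmd)  # raises CalledProcessError if pip fails
    return cmd
```

For example, `pip_install("PyPDF2[full]")` in a notebook cell installs into the kernel's own environment, and `pip_install("PyPDF2", dry_run=True)` only returns the command it would run.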

MartinThoma commented 1 year ago

See https://discuss.python.org/t/difficulty-in-installing-pypdf2-and-pcmupdf/17013/7

mmr-crexi commented 1 year ago

Well, given that I've not encountered this with any other library, I'm pretty tempted to say it's a PyPDF2 issue. Why would I need to install via apt if this is a pure-Python library? Shouldn't pip be sufficient?

mmr-crexi commented 1 year ago

When I try to install in the same notebook:

!pip install PyPDF2[full]

Requirement already satisfied: PyPDF2[full] in ./venv/lib/python3.10/site-packages (2.11.2)
Requirement already satisfied: PyCryptodome in ./venv/lib/python3.10/site-packages (from PyPDF2[full]) (3.16.0)
Requirement already satisfied: Pillow in ./venv/lib/python3.10/site-packages (from PyPDF2[full]) (9.3.0)

Which is to be expected, since pip freeze indicated that PyPDF2 was already installed.
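pip reports the package in `./venv/lib/python3.10/site-packages`, but that only helps if the kernel actually searches that directory. A standard-library sketch to test whether a given directory is on the kernel's `sys.path`; the helper name `on_sys_path` is hypothetical, and the path is copied from the output above, so it is relative to the notebook's working directory:

```python
import os
import sys


def on_sys_path(directory: str) -> bool:
    """Return True if `directory` is one of this interpreter's import search paths."""
    target = os.path.realpath(directory)
    # An empty sys.path entry means "the current directory".
    return any(os.path.realpath(entry or os.curdir) == target for entry in sys.path)


if __name__ == "__main__":
    venv_site = "venv/lib/python3.10/site-packages"
    print(venv_site, "on kernel sys.path:", on_sys_path(venv_site))
```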

MartinThoma commented 1 year ago

Very weird. The capitalization should not be important, but when we move PyPDF2 back to pypdf, I'll make it all lowercase.
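For the distribution name, capitalization indeed doesn't matter to pip: PEP 503 normalizes project names by lowercasing them and collapsing runs of `-`, `_`, and `.` into a single `-`. The import name is separate and is matched case-sensitively by default, so `import pypdf2` would not find the `PyPDF2` package. A sketch of the PEP 503 rule (the function name is hypothetical):

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project (distribution) name as specified in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


if __name__ == "__main__":
    # All of these refer to the same project as far as pip is concerned.
    for spelling in ("PyPDF2", "pypdf2", "PYPDF2"):
        print(spelling, "->", normalize_project_name(spelling))  # each -> pypdf2
```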

mmr-crexi commented 1 year ago

Could there be some expectation that it be installed on the system directly? Perhaps a hardcoded path or something like that? (If that's the case, I would definitely want to know, because it would give a few extra steps to deploying to production).

MartinThoma commented 1 year ago

Perhaps a hardcoded path or something like that?

No. The package is pretty standard from a packaging perspective.

MartinThoma commented 1 year ago

Works on Colab: https://colab.research.google.com/drive/11SrYNl-4lcY_aAFu0GM0O0FxdS2thq-S#scrollTo=av_pOz4f3L1G

mmr-crexi commented 1 year ago

I don't have access to the Colab notebook, but I'll take your word for it :)

I'm curious as to why installing via apt would affect the solution, if I'm using a virtualenv that should also have everything in it, as per the above diagnostics.

MartinThoma commented 1 year ago

why installing via apt would affect the solution

Why do you think it would?

However, I'd recommend never using apt for PyPDF2; the packaged version is badly out of date.

mmr-crexi commented 1 year ago


I only looked at it because of this Stack Overflow link: https://stackoverflow.com/q/69322531/562769, which leads to https://zoomadmin.com/HowToInstall/UbuntuPackage/python-pypdf2, a page that talks about installing it via apt.

I think this was a problem in my python environment.

I switched to pyenv and Poetry instead of pip with my system Python installation, and I'm using Python 3.10.9 instead of the system's 3.10.6. With that setup I can import PyPDF2 without issues.