pymupdf/RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0

'pymupdf4llm' has no attribute 'to_markdown' #17

Closed: zzzcccxx closed this issue 4 months ago

zzzcccxx commented 4 months ago

When I ran the example code, I got an error. The code I used is as follows:

import pymupdf4llm

md_text = pymupdf4llm.to_markdown("/mnt/workspace/DLA/pdf/test.pdf")

import pathlib
pathlib.Path("/mnt/workspace/DLA/output/output.md").write_bytes(md_text.encode())

The error message is as follows:

Traceback (most recent call last):
  File "/mnt/workspace/DLA/scripts/pymupdf.py", line 1, in <module>
    import pymupdf4llm
  File "/opt/conda/envs/pdf4llm/lib/python3.10/site-packages/pymupdf4llm/__init__.py", line 1, in <module>
    from .helpers.pymupdf_rag import to_markdown, IdentifyHeaders
  File "/opt/conda/envs/pdf4llm/lib/python3.10/site-packages/pymupdf4llm/helpers/pymupdf_rag.py", line 33, in <module>
    import pymupdf as fitz  # available with v1.24.3
  File "/mnt/workspace/DLA/scripts/pymupdf.py", line 3, in <module>
    md_text = pymupdf4llm.to_markdown("/mnt/workspace/DLA/pdf/test.pdf")
AttributeError: partially initialized module 'pymupdf4llm' has no attribute 'to_markdown' (most likely due to a circular import)

My pip list is as follows:

Package      Version
------------ -------
pdf4llm      0.0.9
pip          24.0
PyMuPDF      1.24.4
pymupdf4llm  0.0.3
PyMuPDFb     1.24.3
setuptools   69.5.1
wheel        0.43.0
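
The traceback also shows that the `import pymupdf as fitz` line inside pymupdf4llm ended up executing /mnt/workspace/DLA/scripts/pymupdf.py, i.e. the script itself: its file name matches the `pymupdf` package, and the directory of the script being run comes first on `sys.path`. A minimal sketch for checking which file each module name would resolve to, assuming Python 3.8+ and only the standard library. `module_origin` and `looks_installed` are hypothetical helper names, and the "site-packages" test is a rough heuristic, not an official check. Run it from the same directory as the failing script.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# Module names taken from this issue.
NAMES = ("pymupdf", "pymupdf4llm", "pdf4llm")


def module_origin(name: str) -> Optional[str]:
    """Return the file Python would load for top-level module `name`, or None.

    importlib.util.find_spec only searches sys.path; it does not execute the module.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "(namespace package)"


def looks_installed(origin: str) -> bool:
    """Heuristic: True if `origin` lies inside a site-packages or dist-packages folder."""
    parts = Path(origin).parts
    return "site-packages" in parts or "dist-packages" in parts


if __name__ == "__main__":
    for name in NAMES:
        origin = module_origin(name)
        if origin is None:
            print(f"{name:12} not found")
        elif looks_installed(origin):
            print(f"{name:12} {origin}")
        else:
            # For example, a local script named pymupdf.py next to the one being run.
            print(f"{name:12} {origin}  <-- not from site-packages, possible shadowing")
```

If `pymupdf` resolves to a file in the scripts directory, renaming that script (and removing any cached `__pycache__/pymupdf.*.pyc` next to it) is the usual way to stop it shadowing the installed package.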

JorjMcKie commented 4 months ago

Something does look wrong in your log:

  File "/opt/conda/envs/pdf4llm/lib/python3.10/site-packages/pymupdf4llm/__init__.py", line 1, in <module>

You seem to have a prior installation of pdf4llm?

It is probably best to uninstall both pdf4llm and pymupdf4llm, then reinstall only pymupdf4llm without using the pip cache. Here is a sample session:

python3 -m pip install -U pip
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pip in /usr/lib/python3/dist-packages (22.0.2)
Collecting pip
  Downloading pip-24.0-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 1.4 MB/s eta 0:00:00
Installing collected packages: pip
Successfully installed pip-24.0

python3 -m pip install -U --no-cache-dir pymupdf4llm
Defaulting to user installation because normal site-packages is not writeable
Collecting pymupdf4llm
  Downloading pymupdf4llm-0.0.3-py3-none-any.whl.metadata (3.9 kB)
Requirement already satisfied: pymupdf>=1.24.2 in /home/harald/.local/lib/python3.10/site-packages (from pymupdf4llm) (1.24.4)
Requirement already satisfied: PyMuPDFb==1.24.3 in /home/harald/.local/lib/python3.10/site-packages (from pymupdf>=1.24.2->pymupdf4llm) (1.24.3)
Downloading pymupdf4llm-0.0.3-py3-none-any.whl (17 kB)
Installing collected packages: pymupdf4llm
Successfully installed pymupdf4llm-0.0.3

python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib, pymupdf4llm
>>> data = pymupdf4llm.to_markdown("v110-changes.pdf")
>>> len(data)
8012
>>>
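
To confirm whether an older pdf4llm distribution is still installed alongside pymupdf4llm, a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) can report the installed version of each distribution named in the pip list above. `installed_versions` is a hypothetical helper name, not part of pymupdf4llm.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names taken from the pip list in this issue.
DISTS = ("pdf4llm", "pymupdf4llm", "PyMuPDF", "PyMuPDFb")


def installed_versions(names: Iterable[str] = DISTS) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    versions = installed_versions()
    for name, ver in versions.items():
        print(f"{name:12} {ver or 'not installed'}")
    if versions["pdf4llm"] is not None:
        print("pdf4llm is still installed; 'python3 -m pip uninstall pdf4llm' would remove it")
```

Run it with the same interpreter that runs the failing script (here, the one in /opt/conda/envs/pdf4llm), since each environment has its own set of installed distributions.
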
zzzcccxx commented 4 months ago

Thank you, it works.