opendatalab / MinerU

A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.
https://opendatalab.com/OpenSourceTools?tool=extract
GNU Affero General Public License v3.0
19.71k stars 1.4k forks

ModuleNotFoundError: No module named 'mupdf' #1145

Open vincent507cpu opened 2 days ago

vincent507cpu commented 2 days ago

Description of the bug

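Running the official example raises ModuleNotFoundError: No module named 'mupdf'. Trying to install pymupdf reports that it is already installed; trying to install mupdf fails with an error.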

How to reproduce the bug

import os

from magic_pdf.data.data_reader_writer import FileBasedDataWriter, FileBasedDataReader
from magic_pdf.config.make_content_config import DropMode, MakeMode
from magic_pdf.pipe.OCRPipe import OCRPipe

## args
model_list = []
pdf_file_name = "/Users/wenjiazhai/Documents/GitHub/paper_analyze/data/LightRAG Simple and Fast Retrieval-Augmented Generation.pdf"  # replace with the real pdf path

## prepare env
local_image_dir, local_md_dir = "data/images", "data"
os.makedirs(local_image_dir, exist_ok=True)

image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(
    local_md_dir
)
image_dir = str(os.path.basename(local_image_dir))

reader1 = FileBasedDataReader("")
pdf_bytes = reader1.read(pdf_file_name)   # read the pdf content

pipe = OCRPipe(pdf_bytes, model_list, image_writer)

pipe.pipe_classify()
pipe.pipe_analyze()
pipe.pipe_parse()

pdf_info = pipe.pdf_mid_data["pdf_info"]

md_content = pipe.pipe_mk_markdown(
    image_dir, drop_mode=DropMode.NONE, md_make_mode=MakeMode.MM_MD
)

if isinstance(md_content, list):
    md_writer.write_string(f"{pdf_file_name}.md", "\n".join(md_content))
else:
    md_writer.write_string(f"{pdf_file_name}.md", md_content)
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
~/.local/lib/python3.10/site-packages/pymupdf/__init__.py in ?()
    360         from . import mupdf
    361     except Exception:
--> 362         import mupdf
    363     mupdf.reinit_singlethreaded()

~/.local/lib/python3.10/site-packages/pymupdf/mupdf.py in ?()
      8 # Import the low-level C/C++ module
      9 if __package__ or "." in __name__:
---> 10     from . import _mupdf
     11 else:

ImportError: dlopen(/Users/wenjiazhai/.local/lib/python3.10/site-packages/pymupdf/_mupdf.so, 0x0002): Symbol not found: __ZN5mupdf10FzDocumentcvbEv
  Referenced from: <8754673F-F744-3CC8-B209-9088FA8B0E50> /Users/wenjiazhai/.local/lib/python3.10/site-packages/pymupdf/_mupdf.so
  Expected in:     <0A4F4541-8671-3332-BDCA-3946ABA2976B> /Users/wenjiazhai/.local/lib/python3.10/site-packages/pymupdf/libmupdfcpp.so

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
/var/folders/77/tngzlz3n44s1cm7spj20fy8m0000gn/T/ipykernel_65425/12275301.py in ?()
      1 import os
      2 
      3 from magic_pdf.data.data_reader_writer import FileBasedDataWriter, FileBasedDataReader
      4 from magic_pdf.config.make_content_config import DropMode, MakeMode
----> 5 from magic_pdf.pipe.OCRPipe import OCRPipe
      6 
      7 
      8 ## args

~/miniconda3/envs/langchain2/lib/python3.10/site-packages/magic_pdf/pipe/OCRPipe.py in ?()
      1 from loguru import logger
      2 
      3 from magic_pdf.config.make_content_config import DropMode, MakeMode
      4 from magic_pdf.data.data_reader_writer import DataWriter
----> 5 from magic_pdf.model.doc_analyze_by_custom_model import doc_analyze
      6 from magic_pdf.pipe.AbsPipe import AbsPipe
      7 from magic_pdf.user_api import parse_ocr_pdf
      8 

~/miniconda3/envs/langchain2/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py in ?()
      1 import time
      2 
----> 3 import fitz
      4 import numpy as np
      5 from loguru import logger
      6 

~/.local/lib/python3.10/site-packages/fitz/__init__.py in ?()
      1 # pylint: disable=wildcard-import,unused-import,unused-wildcard-import
----> 2 from pymupdf import *
      3 from pymupdf import _as_fz_document
      4 from pymupdf import _as_fz_page
      5 from pymupdf import _as_pdf_document

~/.local/lib/python3.10/site-packages/pymupdf/__init__.py in ?()
    358     #
    359     try:
    360         from . import mupdf
    361     except Exception:
--> 362         import mupdf
    363     mupdf.reinit_singlethreaded()
    364
    365 def _int_rc(text):

ModuleNotFoundError: No module named 'mupdf'

>>> pip install pymupdf
Looking in indexes: https://mirrors.cernet.edu.cn/pypi/web/simple
Requirement already satisfied: pymupdf in /Users/./.local/lib/python3.10/site-packages (1.24.14)
>>> pip install mupdf
...
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Operating system

macOS

Python version

3.10

Software version (magic-pdf --version)

0.10.x

Device mode

cpu

myhloli commented 2 days ago

Could you try uninstalling pymupdf and reinstalling it?
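Before reinstalling, it may be worth checking which copy of each package the interpreter actually picks up. In the traceback above, pymupdf and fitz load from ~/.local/lib/python3.10/site-packages while magic_pdf loads from the conda environment, so two install locations are in play. Whether that mix is the cause here is an assumption, not something confirmed in this thread. The helper names `locate` and `in_user_site` below are hypothetical; the sketch uses only the standard library and does not import the packages it inspects.

```python
import importlib.util
import site
from pathlib import Path


def locate(module_name):
    """Return the file a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        return None
    return spec.origin


def in_user_site(path):
    """Return True if `path` lies under the per-user site-packages directory."""
    user_site = Path(site.getusersitepackages()).resolve()
    return user_site in Path(path).resolve().parents


if __name__ == "__main__":
    # Module names taken from the traceback in this issue.
    for name in ("pymupdf", "fitz", "magic_pdf"):
        origin = locate(name)
        if origin is None:
            print(f"{name}: not found")
        else:
            where = "user site" if in_user_site(origin) else "environment"
            print(f"{name}: {origin} ({where})")
```

If pymupdf shows up under the user site while the other packages come from the environment, uninstalling and reinstalling with the environment's own interpreter (python -m pip ...) would be one way to rule the mix out.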

vincent507cpu commented 2 days ago

I tried that and it did not help. I then rebuilt the environment from scratch, and now I get this error:

OSError: dlopen(/Users/./miniconda3/envs/langchain/lib/python3.10/site-packages/torchtext/lib/libtorchtext.so, 0x0006): Symbol not found: __ZN3c105ErrorC1ENSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES7_PKv
  Referenced from: <8349B302-A1C9-3870-AB5A-21A14A352BC2> /Users/./miniconda3/envs/langchain/lib/python3.10/site-packages/torchtext/lib/libtorchtext.so
  Expected in:     <BA9C42A5-EA1D-3784-80E1-73FBFDE05847> /Users/./miniconda3/envs/langchain/lib/python3.10/site-packages/torch/lib/libc10.dylib
myhloli commented 2 days ago

Run pip list and check the version numbers of the packages whose names start with torch.
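The same check can be sketched from inside the interpreter that raises the error, which avoids any doubt about which pip is on the PATH. This uses the standard-library importlib.metadata; the function name `installed_with_prefix` is hypothetical.

```python
from importlib import metadata


def _installed():
    """Yield (name, version) for every distribution visible to this interpreter."""
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:
            yield name, dist.version


def installed_with_prefix(prefix, dists=None):
    """Return sorted (name, version) pairs whose name starts with `prefix`."""
    if dists is None:
        dists = _installed()
    found = {
        (name, version)
        for name, version in dists
        if name.lower().startswith(prefix.lower())
    }
    return sorted(found)


if __name__ == "__main__":
    for name, version in installed_with_prefix("torch"):
        print(name, version)
```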

vincent507cpu commented 2 days ago

torch 2.3.1 torchtext 0.18.0 torchvision 0.18.1

myhloli commented 2 days ago

Those version numbers are correct, but your error still looks like a version mismatch. Try uninstalling these three packages and reinstalling them, pinning the versions you have now.
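A minimal sketch of that uninstall-and-pin step, assuming the versions reported in this thread. It only prints the two pip commands, built around the current interpreter so they target the same environment that raises the error. The function name `reinstall_commands` is hypothetical, and adding --no-cache-dir (to avoid reusing a cached wheel) is an assumption rather than part of the maintainer's advice.

```python
import sys


def reinstall_commands(pins, python=sys.executable):
    """Build (uninstall, install) pip commands for exact-version pins."""
    names = list(pins)
    specs = [f"{name}=={version}" for name, version in pins.items()]
    uninstall = [python, "-m", "pip", "uninstall", "-y", *names]
    install = [python, "-m", "pip", "install", "--no-cache-dir", *specs]
    return uninstall, install


# Versions reported earlier in this thread.
PINS = {"torch": "2.3.1", "torchtext": "0.18.0", "torchvision": "0.18.1"}

if __name__ == "__main__":
    for cmd in reinstall_commands(PINS):
        # Printed for review; run each one with subprocess.run(cmd, check=True).
        print(" ".join(cmd))
```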

vincent507cpu commented 1 day ago

Still not resolved...

myhloli commented 1 day ago

Which Mac model are you using? I'll see whether I can find a machine with a similar configuration to test on next week.

vincent507cpu commented 1 day ago

M1 Max, macOS 15.1. Thank you so much for going to this trouble!
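For a report like this, the interpreter's own view of the machine can be captured alongside the hardware model. The sketch below uses only the standard library; the function name `environment_report` is hypothetical. On Apple silicon, platform.machine() would be expected to report arm64 for a native interpreter and x86_64 for one running under Rosetta, which may matter when compiled libraries fail to load.

```python
import platform
import sys


def environment_report():
    """Collect basic platform details as seen by the running interpreter."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "macos": platform.mac_ver()[0],  # empty string on non-macOS systems
        "python": platform.python_version(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```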