opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and e-book extraction.
https://opendatalab.com/OpenSourceTools
GNU Affero General Public License v3.0
10.79k stars 795 forks

detectron2 was installed by compiling it from source, but a missing-dependency error is still reported at runtime #174

Closed yezhoujie closed 1 month ago

yezhoujie commented 1 month ago

Description of the bug

The detectron2 dependency was installed by compiling it from source, but a missing-dependency error is still reported at runtime. image image

How to reproduce the bug

pip install magic-pdf
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
pip install magic-pdf detectron2
./bin/magic-pdf pdf-command --pdf "/tmp/test.pdf" --inside_model true

Operating system

macOS

Python version

3.12

Software version (magic-pdf --version)

0.6.x

Device mode

mps

myhloli commented 1 month ago

The dependencies are not fully installed. Please use

pip install magic-pdf[full-cpu]==0.6.1

to install the extra dependencies.

yezhoujie commented 1 month ago

The dependencies are not fully installed. Please use

pip install magic-pdf[full-cpu]==0.6.1

to install the extra dependencies.

image

yezhoujie commented 1 month ago

The dependencies are not fully installed. Please use

pip install magic-pdf[full-cpu]==0.6.1

to install the extra dependencies.

What parameter should [full-cpu] be replaced with?

myhloli commented 1 month ago

https://github.com/opendatalab/MinerU/blob/master/docs/FAQ_zh_cn.md#2在较新版本的mac上使用命令安装pip-install-magic-pdffull-cpu-zsh-no-matches-found-magic-pdffull-cpu
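
The FAQ entry linked above covers zsh reporting "no matches found: magic-pdf[full-cpu]". zsh treats unquoted square brackets as a glob pattern, so the requirement string has to be quoted, or passed to pip without going through a shell at all. The following is a minimal sketch of both options; the helper names `requirement_spec` and `pip_install_args` are hypothetical and are not part of magic-pdf.

```python
import shlex
import sys


def requirement_spec(package: str, extra: str, version: str) -> str:
    """Build a pip requirement string such as 'magic-pdf[full-cpu]==0.6.1'."""
    return f"{package}[{extra}]=={version}"


def pip_install_args(spec: str) -> list:
    """Argument list for subprocess.run; no shell is involved, so no globbing."""
    return [sys.executable, "-m", "pip", "install", spec]


spec = requirement_spec("magic-pdf", "full-cpu", "0.6.1")

# shlex.quote adds single quotes because '[' is special to the shell.
quoted = shlex.quote(spec)
print(f"pip install {quoted}")

# To install from Python instead, pass this list to subprocess.run(args, check=True).
args = pip_install_args(spec)
```

The printed command can be pasted into zsh as-is. The extra name itself stays literal; only the quoting changes.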

yezhoujie commented 1 month ago

https://github.com/opendatalab/MinerU/blob/master/docs/FAQ_zh_cn.md#2在较新版本的mac上使用命令安装pip-install-magic-pdffull-cpu-zsh-no-matches-found-magic-pdffull-cpu

image For Mac MPS, should I install [cpu] or [gpu]?

myhloli commented 1 month ago

The logic should not reach this branch. Try

./bin/magic-pdf pdf-command --pdf "/tmp/test.pdf" --inside_model true --model_mode full

and see what happens.

yezhoujie commented 1 month ago

Installing version 0.6.1 with pip install magic-pdf[full-cpu]==0.6.1 reports an error: image

myhloli commented 1 month ago

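https://pypi.org/project/eva-decord/#files On arm macOS this dependency only supports Python up to 3.11, so you may need to download the source and compile it manually to resolve the dependency. The recommended approach is to create a Python 3.10 or 3.11 virtual environment with conda or venv.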
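
A quick way to see whether an interpreter falls outside the range described above is to check it before installing. This is only a sketch: the 3.11 ceiling on arm macOS is taken from the comment above and was not verified against PyPI, and the function name `eva_decord_wheel_expected` is hypothetical.

```python
import platform
import sys


def eva_decord_wheel_expected(version, system: str, machine: str) -> bool:
    """Return False when, per the comment above, no prebuilt wheel is expected.

    Assumption from the thread: on arm macOS the package supports Python
    up to 3.11. Other platforms are not checked here.
    """
    on_arm_mac = system == "Darwin" and machine == "arm64"
    if on_arm_mac and tuple(version) > (3, 11):
        return False
    return True


current = (sys.version_info.major, sys.version_info.minor)
if not eva_decord_wheel_expected(current, platform.system(), platform.machine()):
    print("Python", ".".join(map(str, current)), "is newer than 3.11 on arm macOS;")
    print("consider a 3.10 or 3.11 virtual environment.")
```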

yezhoujie commented 1 month ago

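After installing in a Python 3.11 virtual environment, the dependency problem is resolved. But running ./bin/magic-pdf pdf-command --pdf "/tmp/test.pdf" --inside_model true with the built-in model reports an error: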

Traceback (most recent call last):
  File "/Users/yzj/python-venv/magic-pdf/./bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 325, in pdf_command
    do_parse(
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 111, in do_parse
    pipe.pipe_analyze()
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/pipe/UNIPipe.py", line 29, in pipe_analyze
    self.model_list = doc_analyze(self.pdf_bytes, ocr=False)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 69, in doc_analyze
    custom_model = CustomPEKModel(ocr=ocr, show_log=show_log, models_dir=local_models_dir, device=device)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pdf_extract_kit.py", line 106, in __init__
    self.mfd_model = mfd_model_init(str(os.path.join(models_dir, self.configs["weights"]["mfd"])))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pdf_extract_kit.py", line 29, in mfd_model_init
    mfd_model = YOLO(weight)
                ^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/ultralytics/models/yolo/model.py", line 23, in __init__
    super().__init__(model=model, task=task, verbose=verbose)
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/ultralytics/engine/model.py", line 149, in __init__
    self._load(model, task=task)
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/ultralytics/engine/model.py", line 230, in _load
    self.model, self.ckpt = attempt_load_one_weight(weights)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/ultralytics/nn/tasks.py", line 855, in attempt_load_one_weight
    ckpt, weight = torch_safe_load(weight)  # load ckpt
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/ultralytics/nn/tasks.py", line 781, in torch_safe_load
    ckpt = torch.load(file, map_location="cpu")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/serialization.py", line 997, in load
    with _open_file_like(f, 'rb') as opened_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/serialization.py", line 444, in _open_file_like
    return _open_file(name_or_buffer, mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/serialization.py", line 425, in __init__
    super().__init__(open(name, mode))
                     ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/models/MFD/weights.pt'

The ~/magic-pdf.json configuration file is as follows:

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "temp-output-dir":"/tmp",
    "models-dir":"/tmp/models",
    "device-mode":"mps"
}

Is something wrong in the configuration file, or misconfigured somewhere else?

myhloli commented 1 month ago

"models-dir":"/tmp/models" must be set to the absolute path of the directory where you store the model files.
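
Before rerunning, the "models-dir" entry can be checked against the files on disk. The sketch below assumes the config lives at ~/magic-pdf.json as in this thread, and checks only the two weight paths that appear in the logs here (MFD/weights.pt and Layout/model_final.pth); the real list used by magic-pdf may be longer. The function names are hypothetical.

```python
import json
from pathlib import Path

# Relative weight paths seen in the logs of this thread (a sample, not the
# complete set that magic-pdf loads).
EXPECTED_WEIGHTS = ["MFD/weights.pt", "Layout/model_final.pth"]


def missing_weights(models_dir: str, expected=EXPECTED_WEIGHTS) -> list:
    """Return the expected weight files that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


def check_config(config_path: Path) -> list:
    """Read the JSON config and report problems with its "models-dir" entry."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    models_dir = config.get("models-dir", "")
    problems = []
    if not Path(models_dir).is_absolute():
        problems.append(f"models-dir is not an absolute path: {models_dir!r}")
    problems += [f"missing: {rel}" for rel in missing_weights(models_dir)]
    return problems


if __name__ == "__main__":
    path = Path.home() / "magic-pdf.json"
    if path.exists():
        for problem in check_config(path) or ["models-dir looks fine"]:
            print(problem)
```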

yezhoujie commented 1 month ago

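"models-dir":"/tmp/models" must be set to the absolute path of the directory where you store the model files.

I downloaded the model files from Hugging Face: git lfs clone https://huggingface.co/wanderkid/PDF-Extract-Kit

image

and configured the absolute path: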

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "models-dir": "/Users/yzj/python-venv/magic-pdf/PDF-Extract-Kit/models"
    "temp-output-dir":"/tmp",
    "device-mode":"mps"
}

Running it reports an error:

Traceback (most recent call last):
  File "/Users/yzj/python-venv/magic-pdf/bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 325, in pdf_command
    do_parse(
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 90, in do_parse
    local_image_dir, local_md_dir = prepare_env(pdf_file_name, parse_method)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 56, in prepare_env
    local_parent_dir = os.path.join(get_local_dir(), "magic-pdf", pdf_file_name, method)
                                    ^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/libs/config_reader.py", line 58, in get_local_dir
    config = read_config()
             ^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/libs/config_reader.py", line 23, in read_config
    config = json.load(f)
             ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 7 column 5 (char 208)

It looks like a JSON parsing error.

myhloli commented 1 month ago

A "," is missing after "models-dir": "/Users/yzj/python-venv/magic-pdf/PDF-Extract-Kit/models".
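
Note that the position in the traceback (line 7 column 5) is the start of the next key, not the line that lacks the comma: the parser only notices the problem when it meets the following entry. A small sketch with Python's standard json module shows the same behaviour on a shortened stand-in config; the path "/path/to/models" is a placeholder and `find_json_error` is a hypothetical helper.

```python
import json


def find_json_error(text: str):
    """Return (line, column, message) for the first syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


# Shortened stand-in for the config above: no comma after the first entry.
broken = '{\n    "models-dir": "/path/to/models"\n    "device-mode": "mps"\n}'
fixed = '{\n    "models-dir": "/path/to/models",\n    "device-mode": "mps"\n}'

# The reported line is the one holding "device-mode", one line below the
# entry that is missing its trailing comma.
print(find_json_error(broken))
print(find_json_error(fixed))
```

Running a check like this on ~/magic-pdf.json before starting magic-pdf gives a shorter error than the full traceback.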

yezhoujie commented 1 month ago

Using MPS acceleration reports an error:

[07/22 08:59:06 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /Users/yzj/python-venv/magic-pdf/PDF-Extract-Kit/models/Layout/model_final.pth ...
[07/22 08:59:06 fvcore.common.checkpoint]: [Checkpointer] Loading from /Users/yzj/python-venv/magic-pdf/PDF-Extract-Kit/models/Layout/model_final.pth ...
2024-07-22 08:59:06.445 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:124 - DocAnalysis init done!
2024-07-22 08:59:06.445 | INFO     | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:74 - model init cost: 18.112290859222412
Traceback (most recent call last):
  File "/Users/yzj/python-venv/magic-pdf/bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 325, in pdf_command
    do_parse(
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/cli/magicpdf.py", line 111, in do_parse
    pipe.pipe_analyze()
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/pipe/UNIPipe.py", line 29, in pipe_analyze
    self.model_list = doc_analyze(self.pdf_bytes, ocr=False)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 87, in doc_analyze
    result = custom_model(img)
             ^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pdf_extract_kit.py", line 133, in __call__
    layout_res = self.layout_model(image, ignore_catids=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pek_sub_modules/layoutlmv3/model_init.py", line 133, in __call__
    outputs = self.predictor(image)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/detectron2/detectron2/engine/defaults.py", line 319, in __call__
    predictions = self.model([inputs])[0]
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pek_sub_modules/layoutlmv3/rcnn_vl.py", line 55, in forward
    return self.inference(batched_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pek_sub_modules/layoutlmv3/rcnn_vl.py", line 113, in inference
    features = self.backbone(input)
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/detectron2/detectron2/modeling/backbone/fpn.py", line 139, in forward
    bottom_up_features = self.bottom_up(x)
                         ^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pek_sub_modules/layoutlmv3/backbone.py", line 106, in forward
    return self.backbone.forward(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pek_sub_modules/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 906, in forward
    visual_emb = self.forward_image(images)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pek_sub_modules/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 785, in forward_image
    x = self.patch_embed(x, self.pos_embed[:, 1:, :] if self.pos_embed is not None else None)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/magic_pdf/model/pek_sub_modules/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 71, in forward
    position_embedding = F.interpolate(position_embedding, size=(Hp, Wp), mode='bicubic')
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yzj/python-venv/magic-pdf/lib/python3.11/site-packages/torch/nn/functional.py", line 4073, in interpolate
    return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

myhloli commented 1 month ago

As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op.
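
For a command-line tool, the variable has to be present in the environment of the process that imports torch. One way to do that from Python is to launch the CLI with a modified environment. This is a sketch only: `env_with_mps_fallback` is a hypothetical helper, and the commented-out command reuses the invocation from earlier in this thread.

```python
import os


def env_with_mps_fallback(base=None) -> dict:
    """Copy an environment and enable PyTorch's CPU fallback for MPS."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return env


env = env_with_mps_fallback()
print(env["PYTORCH_ENABLE_MPS_FALLBACK"])

# Example launch (not executed here):
# import subprocess
# subprocess.run(
#     ["magic-pdf", "pdf-command", "--pdf", "/tmp/test.pdf", "--inside_model", "true"],
#     env=env,
#     check=True,
# )
```

In a shell, prefixing the command with PYTORCH_ENABLE_MPS_FALLBACK=1 has the same effect. As the error message warns, operators that fall back to the CPU run slower than they would natively on MPS.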

yezhoujie commented 1 month ago

2024-07-22 09:36:19.771 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:92 - DocAnalysis init, this may take some times. apply_layout: True, apply_formula: True, apply_ocr: False

The command magic-pdf pdf-command --pdf "test.pdf" --inside_model true does not enable OCR by default. How do I enable OCR?

myhloli commented 1 month ago

--method ocr
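
Putting the flags from this thread together, an argument list for the 0.6.x CLI might be assembled as below. This is a sketch under the assumption that --method and --model_mode can be appended to the same pdf-command invocation, which is how the replies above use them; `magic_pdf_args` is a hypothetical helper, not part of magic-pdf.

```python
from typing import List, Optional


def magic_pdf_args(
    pdf_path: str,
    method: Optional[str] = None,
    model_mode: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `magic-pdf pdf-command` as used in this thread."""
    args = ["magic-pdf", "pdf-command", "--pdf", pdf_path, "--inside_model", "true"]
    if method is not None:
        args += ["--method", method]  # e.g. "ocr", per the reply above
    if model_mode is not None:
        args += ["--model_mode", model_mode]  # e.g. "full", per an earlier reply
    return args


print(" ".join(magic_pdf_args("test.pdf", method="ocr")))
```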