opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, webpages, and multi-format e-books.
https://opendatalab.com/OpenSourceTools
GNU Affero General Public License v3.0
11.19k stars 835 forks

With a Flask application running, magic-pdf fails partway through parsing. #459

Closed josh1707 closed 3 weeks ago

josh1707 commented 3 weeks ago

Description of the bug

On the server, whenever a Flask application is running at the same time, magic-pdf fails, whether it is invoked from the command line or through the API.

How to reproduce the bug

  1. Start the Flask service: python app.py

  2. Run magic-pdf: magic-pdf -p /data/aitmp/10001676_2.pdf -o /data/aitmp/pdf/ -m auto

    It fails every time, right after "DocAnalysis init done". Output:

    [08/19 20:59:05 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /data/wanderkid/PDF-Extract-Kit/models/Layout/model_final.pth ...
    [08/19 20:59:05 fvcore.common.checkpoint]: [Checkpointer] Loading from /data/wanderkid/PDF-Extract-Kit/models/Layout/model_final.pth ...
    2024-08-19 20:59:07.559 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:148 - DocAnalysis init done!
    2024-08-19 20:59:07.560 | INFO     | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:98 - model init cost: 44.169516801834106
    Killed
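
A bare `Killed` line is what a shell usually prints when a process is terminated by SIGKILL. On Linux a common cause is the kernel's out-of-memory killer, although the log above does not confirm that by itself. The following is a minimal sketch, not part of magic-pdf, that reruns the command from step 2 and reports how it ended. The helper names `describe_exit` and `run_and_report` are hypothetical and exist only for this illustration.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable summary."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"terminated by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd: list[str]) -> str:
    """Run cmd, wait for it to finish, and describe how it ended."""
    completed = subprocess.run(cmd, check=False)
    return describe_exit(completed.returncode)


if __name__ == "__main__":
    # Command copied from the reproduction steps in this issue.
    cmd = ["magic-pdf", "-p", "/data/aitmp/10001676_2.pdf",
           "-o", "/data/aitmp/pdf/", "-m", "auto"]
    print(run_and_report(cmd))
```

If this reports SIGKILL, the process was killed from outside rather than crashing on its own, which is consistent with (but does not prove) memory exhaustion.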

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.7.x

Device mode

cpu

myhloli commented 3 weeks ago

Run top and check whether the machine is running out of memory.
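
As a scripted alternative to watching top, the sketch below reads `/proc/meminfo` on Linux and compares the `MemAvailable` field against a threshold before magic-pdf is launched. It rests on assumptions: the function names are hypothetical, and `REQUIRED_GIB` is a placeholder to be replaced with whatever peak usage top actually shows, not a documented requirement of magic-pdf.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_gib(info: dict[str, int]) -> float:
    """Convert the MemAvailable field (kB) to GiB."""
    return info["MemAvailable"] / (1024 * 1024)


def has_headroom(info: dict[str, int], required_gib: float) -> bool:
    """Return True if at least required_gib of memory is currently available."""
    return available_gib(info) >= required_gib


if __name__ == "__main__":
    REQUIRED_GIB = 8.0  # placeholder assumption, not a measured figure
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(f"available: {available_gib(info):.1f} GiB")
        if not has_headroom(info, REQUIRED_GIB):
            print("Low memory: stop the Flask app or add RAM/swap before retrying.")
```

Running this once with the Flask application stopped and once with it running shows how much memory the Flask process leaves for magic-pdf.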