opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
https://opendatalab.com/OpenSourceTools
GNU Affero General Public License v3.0
13.43k stars 1.01k forks

Error after enabling OCR acceleration #474

Open srliuhao opened 2 months ago

srliuhao commented 2 months ago

Description of the bug

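Before OCR acceleration was enabled, processing this PDF worked normally. Following the instructions, I installed paddlepaddle-gpu; OCR acceleration is enabled automatically once the installation completes:

python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

Running the same PDF again, it fails shortly after loading starts, with the following error: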

C++ Traceback (most recent call last):

0 at::_ops::conv2d::call(at::Tensor const&, at::Tensor const&, std::optional const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, c10::SymInt)
1 at::native::conv2d_symint(at::Tensor const&, at::Tensor const&, std::optional const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, c10::SymInt)
2 at::_ops::convolution::call(at::Tensor const&, at::Tensor const&, std::optional const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, bool, c10::ArrayRef, c10::SymInt)
3 at::_ops::convolution::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, std::optional const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, bool, c10::ArrayRef, c10::SymInt)
4 at::native::convolution(at::Tensor const&, at::Tensor const&, std::optional const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, bool, c10::ArrayRef, long)
5 at::_ops::_convolution::call(at::Tensor const&, at::Tensor const&, std::optional const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, bool, c10::ArrayRef, c10::SymInt, bool, bool, bool, bool)
6 at::native::_convolution(at::Tensor const&, at::Tensor const&, std::optional const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, bool, c10::ArrayRef, long, bool, bool, bool, bool)
7 at::_ops::cudnn_convolution::call(at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, c10::SymInt, bool, bool, bool)
8 at::native::cudnn_convolution(at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, bool)


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1724315059 (unix time) try "date -d @1724315059" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x20000002ee0) received by PID 755228 (TID 0x7f98b2361200) from PID 12000 ]
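The TimeInfo line suggests decoding the abort time with GNU date. On systems without GNU date, the same conversion can be sketched in Python using only the standard library. The names `ABORT_TS` and `abort_time_utc` are hypothetical and chosen for illustration; the timestamp value is copied from the error summary.

```python
from datetime import datetime, timezone

# Unix timestamp copied from the [TimeInfo: ...] line of the error summary.
ABORT_TS = 1724315059


def abort_time_utc(ts: int) -> str:
    """Return a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


print(abort_time_utc(ABORT_TS))  # 2024-08-22T08:24:19+00:00
```

The result is in UTC, which makes it easier to line up with other logs from the same machine regardless of its local time zone.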

How to reproduce the bug

python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
magic-pdf -p test1.pdf

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.7.x

Device mode

cuda
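The environment details above give only approximate versions (0.7.x, Python 3.10). Since the traceback frames are in the `at::` namespace used by PyTorch while the crash appeared only after paddlepaddle-gpu was installed, exact versions of both GPU frameworks may help with triage. The following is a minimal sketch for collecting them; the list of distribution names in `PACKAGES` is an assumption about what is installed in this environment, and `installed_versions` is a hypothetical helper, not part of magic-pdf.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumptions about this environment; adjust as needed.
PACKAGES = ["magic-pdf", "paddlepaddle-gpu", "paddleocr", "torch", "torchvision"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

This only reads package metadata and does not import the frameworks themselves, so it should run safely even in an environment where loading them triggers the segmentation fault.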