opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
https://opendatalab.com/OpenSourceTools
GNU Affero General Public License v3.0
11.34k stars 850 forks

Running on a 5900X CPU, the Python process uses only about 15% CPU and 7 GB of memory #609

Open dayfan0810 opened 6 days ago

dayfan0810 commented 6 days ago

Description of the bug

My configuration is set up like this:

{
    "bucket_info": {
        "bucket-name-1": ["ak", "sk", "endpoint"],
        "bucket-name-2": ["ak", "sk", "endpoint"]
    },
    "models-dir": "D:/tools/pdf/PDF-Extract-Kit/models",
    "device-mode": "cpu",
    "table-config": {
        "model": "TableMaster",
        "is_table_recog_enable": true,
        "max_time": 400
    }
}

How to reproduce the bug

magic-pdf -p "E:\power\xxx.pdf" -o E:\power\111 -m auto

Operating system

Windows

Python version

3.12

Software version (magic-pdf --version)

0.8.x

Device mode

cpu

myhloli commented 6 days ago

AMD CPUs may not be compatible with the MKL library. On an Intel chip, CPU utilization can be pushed to 100%.

myhloli commented 6 days ago

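You could try following this tutorial: https://gist.github.com/cty-yyds/41bcfa6a71670527c93049aa9a5d249f. Install an older version of the MKL library, then change an environment variable, and see whether that forces it to run properly on an AMD CPU.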
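The gist's exact steps are not reproduced here. As a rough sketch of the kind of workaround described, a commonly cited approach is to set the `MKL_DEBUG_CPU_TYPE` environment variable to `5` before any MKL-backed library is imported. This is reported to work only with MKL 2020.0 and earlier, which would explain the suggestion to install an older MKL. That the gist uses this exact variable is an assumption, and the helper names `is_amd_cpu` and `apply_mkl_workaround` are hypothetical.

```python
import os
import platform


def is_amd_cpu(processor: str) -> bool:
    """Return True if a processor description string reports an AMD vendor ID.

    On Windows, platform.processor() typically ends with "AuthenticAMD"
    on AMD chips and "GenuineIntel" on Intel chips.
    """
    return "AuthenticAMD" in processor


def apply_mkl_workaround(env, processor: str) -> bool:
    """Set MKL_DEBUG_CPU_TYPE=5 in `env` when the CPU looks like AMD.

    Assumption: an older MKL build (2020.0 or earlier) is installed;
    newer builds are reported to ignore this variable.
    An existing value in `env` is left untouched.
    """
    if not is_amd_cpu(processor):
        return False
    env.setdefault("MKL_DEBUG_CPU_TYPE", "5")
    return True


if __name__ == "__main__":
    applied = apply_mkl_workaround(os.environ, platform.processor())
    print("MKL workaround applied:", applied)
    # Import numpy / torch / magic_pdf only after this point, so that
    # MKL reads the variable when it is first loaded.
```

The variable has to be in place before MKL is loaded, so setting it system-wide (or in the shell that launches `magic-pdf`) is an alternative to setting it from Python.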

dayfan0810 commented 5 days ago

Thanks, knowing the cause is enough. I'll try it with CUDA now.