Models are downloaded and the config is set up, but parsing a PDF fails with an error

fuxuelinwudi commented 2 weeks ago

Description of the bug

[10/14 10:33:42 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /data/deploy/pre_release/fxl/gomate/all_server/comp_server/mineru_root/root_models/PDF-Extract-Kit/models/Layout/model_final.pth ...
[10/14 10:33:42 fvcore.common.checkpoint]: [Checkpointer] Loading from /data/deploy/pre_release/fxl/gomate/all_server/comp_server/mineru_root/root_models/PDF-Extract-Kit/models/Layout/model_final.pth ...
2024-10-14 10:33:45.280 | INFO | magic_pdf.model.pdf_extract_kit:init:248 - DocAnalysis init done!
2024-10-14 10:33:45.280 | INFO | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:98 - model init cost: 25.69013738632202
2024-10-14 10:33:50.120 | INFO | magic_pdf.model.pdf_extract_kit:call:259 - layout detection cost: 3.53

0: 1888x1312 (no detections), 78.5ms
Speed: 16.2ms preprocess, 78.5ms inference, 0.6ms postprocess per image at shape (1, 3, 1888, 1312)
2024-10-14 10:33:50.704 | INFO | magic_pdf.model.pdf_extract_kit:call:289 - formula nums: 0, mfr time: 0.0
2024-10-14 10:33:50.905 | ERROR | magic_pdf.tools.cli:parse_doc:96 - (External) CUBLAS error(7). [Hint: 'CUBLAS_STATUS_INVALID_VALUE'. An unsupported value or parameter was passed to the function (a negative vector size, for example). To correct: ensure that all the parameters being passed have valid values. ] (at ../paddle/phi/kernels/funcs/blas/blas_impl.cu.h:40) [operator < fc > error]

How to reproduce the bug

My config:

{
  "bucket_info": {
    "bucket-name-1": ["ak", "sk", "endpoint"],
    "bucket-name-2": ["ak", "sk", "endpoint"]
  },
  "models-dir": "/data/deploy/pre_release/fxl/gomate/all_server/comp_server/mineru_root/root_models/PDF-Extract-Kit/models",
  "layoutreader-model-dir": "/data/deploy/pre_release/fxl/gomate/all_server/comp_server/mineru_root/root_models/layoutreader",
  "device-mode": "cuda",
  "table-config": {
    "model": "TableMaster",
    "is_table_recog_enable": false,
    "max_time": 400
  }
}
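
Before chasing the CUDA error, it can help to rule out a simple path problem in the config. The following is a minimal sketch, not part of MinerU: `check_config` is a hypothetical helper, the key names are the ones shown in the config above, and the `~/magic-pdf.json` location is an assumption that may not match your setup.

```python
import json
import os


def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in the config dict."""
    problems = []
    # Both keys are taken from the config posted in this issue.
    for key in ("models-dir", "layoutreader-model-dir"):
        path = cfg.get(key)
        if not path:
            problems.append(f"{key} is missing")
        elif not os.path.isdir(path):
            problems.append(f"{key} does not exist: {path}")
    if "device-mode" not in cfg:
        problems.append("device-mode is missing")
    return problems


if __name__ == "__main__":
    # Assumption: the config file lives in the home directory.
    config_path = os.path.expanduser("~/magic-pdf.json")
    if os.path.isfile(config_path):
        with open(config_path, encoding="utf-8") as fh:
            found = check_config(json.load(fh))
        for line in found or ["no problems found"]:
            print(line)
    else:
        print(f"config not found at {config_path}")
```

An empty result only means the directories exist; it says nothing about whether the model files inside them are complete.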

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.8.x

Device mode

cuda

fuxuelinwudi commented 2 weeks ago

My CUDA version is 12.1, and I ran:

python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

Could that be the cause? When I run the cu121 variant instead, paddle fails to install:

python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu121/

myhloli commented 2 weeks ago

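Judging from the error, paddle and CUDA are incompatible. On Linux, paddle ships with its own CUDA runtime, and you do need the cu118 build to avoid a conflict with torch's CUDA 12.1. Neither of them uses the CUDA installed on your system.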

fuxuelinwudi commented 2 weeks ago

What should I do?

myhloli commented 2 weeks ago

You can uninstall paddlepaddle-gpu and paddlepaddle, then reinstall paddlepaddle and run with the CPU build of paddle.
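
The uninstall and reinstall steps can be sketched as a small script. This is an illustration under assumptions rather than an official procedure: `cpu_fallback_commands` and `run` are hypothetical helpers, no paddle version is pinned because the thread does not name one, and the script only prints the commands unless `dry_run` is set to `False`.

```python
import subprocess
import sys


def cpu_fallback_commands(python: str = sys.executable) -> list[list[str]]:
    """Build the pip commands that swap the GPU build of paddle for the CPU build."""
    pip = [python, "-m", "pip"]
    return [
        # Remove both distributions so no GPU files are left behind.
        pip + ["uninstall", "-y", "paddlepaddle-gpu", "paddlepaddle"],
        # Assumption: unpinned install; pin the version your tutorial asks for.
        pip + ["install", "paddlepaddle"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(cpu_fallback_commands(), dry_run=True)
```

Using `sys.executable` keeps the commands tied to the interpreter of the active virtual environment, which matters when several Python installations are present.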

fuxuelinwudi commented 2 weeks ago

That works, thanks. If I want to use the CUDA build, does the CUDA installed on the Linux system need to be 11.8?

myhloli commented 2 weeks ago

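You don't need to change the CUDA installed on the Linux system. On Linux, the CUDA runtimes for torch and paddle are installed as pip dependencies inside the conda virtual environment, so the system only needs the driver. If you followed the tutorial and the result is still incompatible, it is usually hard to debug; I suggest falling back to CPU or switching to a different deployment environment.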
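
Since the CUDA runtimes come from the pip packages rather than from the system, one way to see what is actually installed is to ask the packages themselves. The sketch below is an assumption-laden diagnostic, not a MinerU tool: `installed_version` and `cuda_report` are hypothetical helpers, and it relies on `torch.version.cuda` and `paddle.version.cuda()` being available in the installed builds.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def cuda_report() -> dict:
    """Collect package versions and the CUDA version each framework was built with."""
    report = {
        name: installed_version(name)
        for name in ("torch", "paddlepaddle-gpu", "paddlepaddle")
    }
    try:
        import torch
        # None on CPU-only builds of torch.
        report["torch CUDA build"] = torch.version.cuda
    except Exception as exc:  # not installed, or broken shared libraries
        report["torch CUDA build"] = f"unavailable ({type(exc).__name__})"
    try:
        import paddle
        report["paddle CUDA build"] = paddle.version.cuda()
    except Exception as exc:
        report["paddle CUDA build"] = f"unavailable ({type(exc).__name__})"
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If both `paddlepaddle-gpu` and `paddlepaddle` show a version, the two distributions are installed side by side, which is worth cleaning up before further debugging.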

fuxuelinwudi commented 2 weeks ago

OK.

fuxuelinwudi commented 2 weeks ago

Which .py file does this shell command execute?

magic-pdf -p small_ocr.pdf

I'd like to write my own Python script.

fuxuelinwudi commented 2 weeks ago

I installed CUDA 11.8 and then installed paddle-gpu, and now I get another error:

2024-10-14 11:36:23.335 | ERROR | magic_pdf.tools.cli:parse_doc:96 - Unable to avoid copy while creating an array as requested. If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x). For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword

myhloli commented 2 weeks ago

> Which .py file does this shell command execute: magic-pdf -p small_ocr.pdf
>
> I'd like to write my own Python script.

https://github.com/opendatalab/MinerU/blob/master/magic_pdf/tools/cli.py
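
For a first script of your own, one low-risk option is to wrap the same command line from Python instead of calling the internals of cli.py directly. This is only a sketch: `build_command` and `parse_pdf` are hypothetical helpers, and the only flag used is the `-p` shown in the question; your installed version may require further options.

```python
import shutil
import subprocess


def build_command(pdf_path: str) -> list[str]:
    """Build the same invocation as in the question: magic-pdf -p <file>."""
    return ["magic-pdf", "-p", pdf_path]


def parse_pdf(pdf_path: str) -> int:
    """Run the CLI on one PDF and return its exit code."""
    if shutil.which("magic-pdf") is None:
        raise RuntimeError("magic-pdf is not on PATH; activate the right environment")
    return subprocess.run(build_command(pdf_path)).returncode


if __name__ == "__main__":
    # Only prints the command; call parse_pdf("small_ocr.pdf") to actually run it.
    print(" ".join(build_command("small_ocr.pdf")))
```

Wrapping the CLI keeps the script independent of internal function names, which may change between releases; reading cli.py is still the way to go if you need finer control.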

myhloli commented 2 weeks ago

> I installed CUDA 11.8 and then installed paddle-gpu, and now I get another error:
>
> 2024-10-14 11:36:23.335 | ERROR | magic_pdf.tools.cli:parse_doc:96 - Unable to avoid copy while creating an array as requested. If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x). For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword

It is not compatible with NumPy 2.0 or later; you need to downgrade to 1.x.
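
A quick way to confirm whether the installed NumPy is the cause is to check its major version. The snippet below is a minimal sketch: `needs_numpy_downgrade` is a hypothetical helper, and the `numpy<2` constraint in the printed hint is one common way to request a 1.x release, not a version pin taken from this thread.

```python
def needs_numpy_downgrade(version: str) -> bool:
    """Return True when the version string belongs to NumPy 2.0 or later."""
    major = int(version.split(".")[0])
    return major >= 2


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed")
    else:
        if needs_numpy_downgrade(numpy.__version__):
            # Assumption: any 1.x release is acceptable for your setup.
            print(f'NumPy {numpy.__version__} found; try: python -m pip install "numpy<2"')
        else:
            print(f"NumPy {numpy.__version__} is below 2.0")
```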

fuxuelinwudi commented 2 weeks ago

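Looking at the generated Markdown output, it seems multi-level headings are not recognized? Every heading is detected as a level-1 heading.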

myhloli commented 2 weeks ago

Multi-level heading recognition is not supported at the moment.
