opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
https://opendatalab.com/OpenSourceTools?tool=extract
GNU Affero General Public License v3.0
20.09k stars 1.43k forks

Is this a multi-GPU problem or something else? #324

Closed Leosgp closed 4 months ago

Leosgp commented 4 months ago

Description of the bug

Error: model/PDF-Extract-Kit/models
CustomVisionEncoderDecoderModel init
CustomMBartForCausalLM init
CustomMBartDecoder init
2024-08-05 14:51:57.803 | ERROR | magic_pdf.cli.magicpdf:parse_doc:338 - Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: gpu

Traceback (most recent call last):
  File "/home/algo/.conda/envs/MinerU/bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
      └ ctx.params: {'pdf': '/home/algo/zhengkaiyuan/PaperEx/data/明日DMP操作手册-V3-2-2.pdf', 'model': None, 'method': 'auto', 'inside_model': True, '...
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 352, in pdf_command
    parse_doc(pdf)
      └ pdf: '/home/algo/zhengkaiyuan/PaperEx/data/明日DMP操作手册-V3-2-2.pdf'
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 330, in parse_doc
    do_parse(
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/cli/magicpdf.py", line 111, in do_parse
    pipe.pipe_analyze()
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/pipe/UNIPipe.py", line 29, in pipe_analyze
    self.model_list = doc_analyze(self.pdf_bytes, ocr=False)
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 103, in doc_analyze
    custom_model = model_manager.get_model(ocr, show_log)
      └ ocr: False, show_log: False
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 63, in get_model
    self._models[key] = custom_model_init(ocr=ocr, show_log=show_log)
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 87, in custom_model_init
    custom_model = CustomPEKModel(ocr=ocr, show_log=show_log, models_dir=local_models_dir, device=device)
      └ local_models_dir: '/home/algo/pretrain_model/PDF-Extract-Kit/models'
      └ device: 'gpu'
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 119, in __init__
    self.mfr_model, mfr_vis_processors = mfr_model_init(mfr_weight_dir, mfr_cfg_path, device=self.device)
      └ mfr_weight_dir: '/home/algo/pretrain_model/PDF-Extract-Kit/models/MFR/UniMERNet'
      └ mfr_cfg_path: '/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/resources/model_config/UniMERNet/demo.yaml'
      └ self.device: 'gpu'
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 48, in mfr_model_init
    model = model.to(device)
      └ device: 'gpu'
  File "/home/algo/.conda/envs/MinerU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1137, in to
    device, dtype, non_blocking, convert_to_format = torch._C._nn._parse_to(*args, **kwargs)
      └ args: ('gpu',), kwargs: {}

RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: gpu

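Recognition works on CPU but is very slow. The server has eight A100s, but changing the device setting in the config file from CPU to GPU produces this error.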
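The RuntimeError itself lists the device types PyTorch accepts, and "gpu" is not one of them; NVIDIA GPUs are addressed as "cuda", optionally with an index such as "cuda:1" on a multi-GPU machine. The sketch below is a simplified pre-flight check built only from the list in that error message. `check_device_string` is a hypothetical helper, not part of MinerU or PyTorch, and its pattern only approximates PyTorch's own parsing. Whether MinerU's device setting accepts an indexed form like "cuda:1" is not confirmed in this thread.

```python
import re
from typing import Optional, Tuple

# Device types copied verbatim from the RuntimeError message in this issue.
ACCEPTED_DEVICE_TYPES = frozenset(
    "cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, "
    "xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone".split(", ")
)

# Simplified shape of a device string: a type, optionally followed by ":<index>".
_DEVICE_RE = re.compile(r"(?P<type>[a-z]+)(?::(?P<index>\d+))?")


def check_device_string(device: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split a device string into (type, index) or raise ValueError."""
    match = _DEVICE_RE.fullmatch(device)
    if match is None or match.group("type") not in ACCEPTED_DEVICE_TYPES:
        raise ValueError(
            f"unsupported device string {device!r}; use 'cuda' for NVIDIA GPUs"
        )
    index = match.group("index")
    return match.group("type"), (int(index) if index is not None else None)


print(check_device_string("cuda"))    # ('cuda', None)
print(check_device_string("cuda:1"))  # ('cuda', 1)
try:
    check_device_string("gpu")
except ValueError as exc:
    print(exc)
```

Running a check like this before model loading would surface the bad value immediately instead of deep inside `model.to(device)`.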

How to reproduce the bug

(Same log output and traceback as in the bug description above.)

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.6.x

Device mode

cuda

myhloli commented 4 months ago

Please read the README's instructions on CUDA acceleration carefully. The device-mode value cannot be "gpu"; if you need CUDA acceleration, set device-mode to "cuda".

https://github.com/opendatalab/MinerU/blob/master/docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md
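As a rough illustration of that fix, here is a minimal sketch that rewrites the setting. It assumes the config file is JSON with a top-level "device-mode" key; the thread confirms the key name but not the file format or location, so check the linked README for the real file. `set_device_mode` is a hypothetical helper, and the "models-dir" key in the sample is an assumption, with its path borrowed from the traceback.

```python
import json


def set_device_mode(config_text: str, device_mode: str = "cuda") -> str:
    """Hypothetical helper: return the config JSON with "device-mode" replaced."""
    config = json.loads(config_text)
    config["device-mode"] = device_mode  # "gpu" is rejected by PyTorch; "cuda" is valid
    return json.dumps(config, indent=4, ensure_ascii=False)


# Sample config text; the "models-dir" key is an assumed name, not confirmed here.
broken = json.dumps(
    {
        "models-dir": "/home/algo/pretrain_model/PDF-Extract-Kit/models",
        "device-mode": "gpu",
    }
)
print(set_device_mode(broken))
```

The function works on text rather than a file path so that it makes no claim about where the config lives; read the file, pass its contents through, and write the result back.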

Leosgp commented 4 months ago

Thanks, that solved it.