Closed: RGthx closed this issue 3 months ago.
Can you run pip list and check whether the magic-pdf package is installed?
Here is the pip list output below. magic-pdf is in the list (otherwise demo.py wouldn't run either): ` C:\Users\rgthx>conda activate MinerU
(MinerU) C:\Users\rgthx>pip list
Package Version
absl-py 2.1.0 aiohappyeyeballs 2.3.4 aiohttp 3.10.1 aiosignal 1.3.1 albucore 0.0.13 albumentations 1.4.12 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 astor 0.8.1 async-timeout 4.0.3 attrdict 2.0.1 attrs 24.1.0 Babel 2.15.0 bce-python-sdk 0.9.19 beautifulsoup4 4.12.3 black 24.8.0 blinker 1.8.2 boto3 1.34.153 botocore 1.34.153 braceexpand 0.1.7 Brotli 1.1.0 cachetools 5.4.0 certifi 2024.7.4 cffi 1.16.0 charset-normalizer 3.3.2 click 8.1.7 cloudpickle 3.0.0 colorama 0.4.6 colorlog 6.8.2 contourpy 1.2.1 cryptography 43.0.0 cssselect 1.2.0 cssutils 2.11.1 cycler 0.12.1 Cython 3.0.11 datasets 2.20.0 decorator 5.1.1 detectron2 0.6 dill 0.3.8 et-xmlfile 1.1.0 eva-decord 0.6.1 eval_type_backport 0.2.0 evaluate 0.4.2 exceptiongroup 1.2.2 fairscale 0.4.13 fast-langdetect 0.2.0 fasttext-wheel 0.9.2 filelock 3.15.4 fire 0.6.0 Flask 3.0.3 flask-babel 4.0.0 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.5.0 ftfy 6.2.0 future 1.0.0 fvcore 0.1.5.post20221221 grpcio 1.65.4 h11 0.14.0 httpcore 1.0.5 httpx 0.27.0 huggingface-hub 0.24.5 hydra-core 1.3.2 idna 3.7 imageio 2.34.2 imgaug 0.4.0 intel-openmp 2021.4.0 iopath 0.1.9 itsdangerous 2.2.0 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 kiwisolver 1.4.5 langdetect 1.0.9 lazy_loader 0.4 lmdb 1.5.1 loguru 0.7.2 lxml 5.2.2 magic-pdf 0.6.2b1 Markdown 3.6 MarkupSafe 2.1.5 matplotlib 3.9.0 mkl 2021.4.0 more-itertools 10.3.0 mpmath 1.3.0 multidict 6.0.5 multiprocess 0.70.16 mypy-extensions 1.0.0 networkx 3.3 numpy 1.26.4 omegaconf 2.3.0 opencv-contrib-python 4.6.0.66 opencv-python 4.6.0.66 opencv-python-headless 4.10.0.84 openpyxl 3.1.5 opt-einsum 3.3.0 packaging 24.1 paddleocr 2.7.3 paddlepaddle 2.6.1 pandas 2.2.2 pathspec 0.12.1 pdf2docx 0.5.8 pdfminer.six 20231228 pillow 10.4.0 pip 24.0 platformdirs 4.2.2 portalocker 2.10.1 premailer 3.10.0 protobuf 3.20.2 psutil 6.0.0 py-cpuinfo 9.0.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pybind11 2.13.1 pyclipper 1.3.0.post5 pycocotools 2.0.8 pycparser 2.22 pycryptodome 3.20.0 pydantic 2.8.2 
pydantic_core 2.20.1 PyMuPDF 1.24.9 PyMuPDFb 1.24.9 pyparsing 3.1.2 python-dateutil 2.9.0.post0 python-docx 1.1.2 pytz 2024.1 pywin32 306 PyYAML 6.0.1 rapidfuzz 3.9.5 rarfile 4.2 regex 2024.7.24 requests 2.32.3 robust-downloader 0.0.2 s3transfer 0.10.2 safetensors 0.4.4 scikit-image 0.24.0 scikit-learn 1.5.1 scipy 1.14.0 seaborn 0.13.2 setuptools 72.1.0 shapely 2.0.5 six 1.16.0 sniffio 1.3.1 soupsieve 2.5 sympy 1.13.1 tabulate 0.9.0 tbb 2021.13.0 tensorboard 2.17.0 tensorboard-data-server 0.7.2 termcolor 2.4.0 threadpoolctl 3.5.0 tifffile 2024.7.24 timm 0.9.16 tokenizers 0.19.1 tomli 2.0.1 torch 2.3.1 torchtext 0.18.0 torchvision 0.18.1 tqdm 4.66.5 transformers 4.40.0 typing_extensions 4.12.2 tzdata 2024.1 ultralytics 8.2.73 ultralytics-thop 2.0.0 unimernet 0.1.6 urllib3 2.2.2 visualdl 2.5.3 Wand 0.6.13 wcwidth 0.2.13 webdataset 0.2.86 Werkzeug 3.0.3 wheel 0.43.0 win32-setctime 1.1.0 wordninja 2.0.0 xxhash 3.4.1 yacs 0.1.8 yarl 1.9.4
(MinerU) C:\Users\rgthx>magic-pdf -v
'magic-pdf' is not recognized as an internal or external command, operable program or batch file.
(MinerU) C:\Users\rgthx> `
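pip list only proves the distribution is importable; the `magic-pdf` command is a separate console-script entry point that pip generates as an executable at install time. The following is a minimal diagnostic sketch (not part of MinerU) that reads the installed metadata to show which commands a distribution declares and which site-packages directory it was installed into. It assumes nothing about magic-pdf's entry-point targets; it just prints whatever the metadata contains.

```python
"""List the console-script commands an installed distribution declares."""
from importlib.metadata import PackageNotFoundError, distribution


def console_scripts(dist_name: str) -> list[tuple[str, str]]:
    """Return (command, target) pairs from the console_scripts group, or [] if not installed."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return []
    return [
        (ep.name, ep.value)
        for ep in dist.entry_points
        if ep.group == "console_scripts"
    ]


if __name__ == "__main__":
    name = "magic-pdf"  # distribution name as shown by pip list
    try:
        dist = distribution(name)
        # locate_file("") resolves to the site-packages directory holding the package
        print(f"{name} {dist.version} installed under {dist.locate_file('')}")
    except PackageNotFoundError:
        print(f"{name} is not installed for this interpreter")
    for command, target in console_scripts(name):
        print(f"command {command!r} -> {target}")
```

If commands are listed but the shell still reports "not recognized", the metadata is fine and the likelier culprit is that the generated executable landed in a directory that is not on PATH.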
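Sorry, it's solved. The fix was to open Anaconda Prompt with administrator privileges, activate the corresponding virtual environment there, and configure it in that prompt. Previously I had activated the environment directly in a regular terminal, and pip installed the packages to the C: drive by default. Reference: https://blog.csdn.net/m0_65634471/article/details/130297467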
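The config path in the log below (`C:\Users\rgthx\AppData\Roaming\Python\Python310\site-packages\...`) suggests pip used the per-user install scheme rather than the conda environment's own site-packages. On Windows that scheme puts generated executables in a per-user `Scripts` folder that is usually not on PATH, which would explain why demo.py runs but `magic-pdf` is not found. This is a minimal diagnostic sketch under that assumption, not part of MinerU or of the linked fix: it prints where this interpreter places console scripts for the default and per-user schemes and whether each directory is on PATH.

```python
"""Check whether the directories pip writes console scripts to are on PATH."""
import os
import shutil
import sysconfig


def _norm(path: str) -> str:
    # Normalize separators and, on Windows, letter case before comparing paths.
    return os.path.normcase(os.path.normpath(path))


def script_dirs() -> dict[str, str]:
    """Scripts directories for the default and the per-user (pip --user) schemes."""
    if hasattr(sysconfig, "get_preferred_scheme"):  # available from Python 3.10
        user_scheme = sysconfig.get_preferred_scheme("user")
    else:
        user_scheme = "nt_user" if os.name == "nt" else "posix_user"
    return {
        "default": sysconfig.get_path("scripts"),
        "user": sysconfig.get_path("scripts", user_scheme),
    }


def on_path(directory: str) -> bool:
    """True if `directory` matches one of the entries in the PATH variable."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(_norm(entry) == _norm(directory) for entry in entries if entry)


if __name__ == "__main__":
    # shutil.which returns None when the shell would not find the command either.
    print("magic-pdf resolves to:", shutil.which("magic-pdf"))
    for label, directory in script_dirs().items():
        print(f"{label:<8} {directory}  on PATH: {on_path(directory)}")
```

If `magic-pdf.exe` sits in the `user` directory and that line reports `on PATH: False`, either adding that directory to PATH or reinstalling from the activated environment so pip targets the environment's own site-packages should make the command resolvable; the latter is presumably what the fix above achieved.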
Description of the bug
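I followed the tutorial to set up the virtual environment, downloaded the corresponding model files, and configured everything, but the command-line mode doesn't work: I can't even run commands such as magic-pdf --version. However, the sample demo.py runs normally and produces the expected .md file. Do I need to change environment variables or something similar?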
How to reproduce the bug
PyTorch built with:
[08/06 00:07:45 detectron2]: Command line arguments: {'config_file': 'C:\Users\rgthx\AppData\Roaming\Python\Python310\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml', 'resume': False, 'eval_only': False, 'num_gpus': 1, 'num_machines': 1, 'machine_rank': 0, 'dist_url': 'tcp://127.0.0.1:57823', 'opts': ['MODEL.WEIGHTS', 'D:/Anaconda/envs/MinerU/models\Layout/model_final.pth']}
[08/06 00:07:45 detectron2]: Contents of args.config_file=C:\Users\rgthx\AppData\Roaming\Python\Python310\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml: AUG: DETR: true CACHE_DIR: ~/cache/huggingface CUDNN_BENCHMARK: false DATALOADER: ASPECT_RATIO_GROUPING: true FILTER_EMPTY_ANNOTATIONS: false NUM_WORKERS: 4 REPEAT_THRESHOLD: 0.0 SAMPLER_TRAIN: TrainingSampler DATASETS: PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000 PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000 PROPOSAL_FILES_TEST: [] PROPOSAL_FILES_TRAIN: [] TEST:
[08/06 00:07:46 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from D:/Anaconda/envs/MinerU/models\Layout/model_final.pth ...
[08/06 00:07:46 fvcore.common.checkpoint]: [Checkpointer] Loading from d:/Anaconda/envs/MinerU/models\Layout/model_final.pth ...
2024-08-06 00:07:47.665 | INFO | magic_pdf.model.pdf_extract_kit:init:132 - DocAnalysis init done!
2024-08-06 00:07:47.666 | INFO | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:92 - model init cost: 21.418904542922974
2024-08-06 00:08:02.520 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 14.49
0: 1888x1408 7 embeddings, 5043.3ms
Speed: 27.0ms preprocess, 5043.3ms inference, 1.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:08:14.034 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 7, mfr time: 4.14
2024-08-06 00:08:35.911 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 21.88
0: 1888x1408 3 embeddings, 6740.8ms
Speed: 28.5ms preprocess, 6740.8ms inference, 1.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:08:46.955 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 3, mfr time: 4.25
2024-08-06 00:09:16.613 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 29.66
0: 1888x1408 18 embeddings, 2 isolateds, 6887.6ms
Speed: 26.2ms preprocess, 6887.6ms inference, 1.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:09:34.476 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 20, mfr time: 10.83
2024-08-06 00:09:49.655 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 15.18
0: 1888x1408 32 embeddings, 4 isolateds, 3605.5ms
Speed: 24.7ms preprocess, 3605.5ms inference, 1.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:10:25.533 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 36, mfr time: 32.08
2024-08-06 00:10:53.225 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 27.69
0: 1888x1408 7 embeddings, 1 isolated, 5715.7ms
Speed: 30.6ms preprocess, 5715.7ms inference, 1.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:11:11.822 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 8, mfr time: 12.79
2024-08-06 00:11:39.676 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 27.85
0: 1888x1408 6 embeddings, 5515.7ms
Speed: 26.1ms preprocess, 5515.7ms inference, 2.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:11:50.873 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 6, mfr time: 5.62
2024-08-06 00:12:18.559 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 27.69
0: 1888x1408 20 embeddings, 5642.8ms
Speed: 27.1ms preprocess, 5642.8ms inference, 2.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:12:41.888 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 20, mfr time: 17.55
2024-08-06 00:12:57.379 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 15.49
0: 1888x1408 7 embeddings, 4560.2ms
Speed: 15.2ms preprocess, 4560.2ms inference, 2.2ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:13:05.562 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 7, mfr time: 3.57
2024-08-06 00:13:22.253 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 16.69
0: 1888x1408 15 embeddings, 4624.4ms
Speed: 26.3ms preprocess, 4624.4ms inference, 1.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:13:34.929 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 15, mfr time: 7.94
2024-08-06 00:13:51.540 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 16.61
0: 1888x1408 1 embedding, 6940.7ms
Speed: 25.4ms preprocess, 6940.7ms inference, 2.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:14:00.052 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 1, mfr time: 1.54
2024-08-06 00:14:27.619 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 27.57
0: 1888x1408 4 embeddings, 5760.3ms
Speed: 29.9ms preprocess, 5760.3ms inference, 1.4ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:14:38.158 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 4, mfr time: 4.72
2024-08-06 00:15:04.580 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 26.42
0: 1888x1408 1 embedding, 3774.9ms
Speed: 25.6ms preprocess, 3774.9ms inference, 0.0ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:15:09.153 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 1, mfr time: 0.77
2024-08-06 00:15:24.270 | INFO | magic_pdf.model.pdf_extract_kit:call:143 - layout detection cost: 15.12
0: 1888x1408 (no detections), 5563.2ms
Speed: 26.3ms preprocess, 5563.2ms inference, 1.3ms postprocess per image at shape (1, 3, 1888, 1408)
2024-08-06 00:15:29.863 | INFO | magic_pdf.model.pdf_extract_kit:call:173 - formula nums: 0, mfr time: 0.0
2024-08-06 00:15:29.865 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:118 - doc analyze cost: 461.83641028404236
2024-08-06 00:15:33.156 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_markdown:48 - uni_pipe mk mm_markdown finished
(MinerU) C:\Users\rgthx\Downloads\MinerU-master\MinerU-master\demo>magic-pdf --help
'magic-pdf' is not recognized as an internal or external command, operable program or batch file.
Operating system
Windows
Python version
3.10
Software version (magic-pdf --version)
0.6.x
Device mode
cpu