Closed Knightlj closed 1 month ago
please upload the result of
pip list
I have the same problem, on a CentOS system. My pip list output is:
Package Version
absl-py 2.1.0 aiohttp 3.9.5 aiosignal 1.3.1 albucore 0.0.12 albumentations 1.4.12 altair 5.3.0 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 astor 0.8.1 asttokens 2.4.1 async-lru 2.0.4 async-timeout 4.0.3 attrdict 2.0.1 attrs 23.2.0 Babel 2.15.0 bce-python-sdk 0.9.17 beautifulsoup4 4.12.3 black 24.4.2 bleach 6.1.0 blinker 1.8.2 boto3 1.34.149 botocore 1.34.149 braceexpand 0.1.7 Brotli 1.1.0 cachetools 5.4.0 certifi 2024.7.4 cffi 1.16.0 charset-normalizer 3.3.2 click 8.1.7 cloudpickle 3.0.0 colorlog 6.8.2 comm 0.2.2 contourpy 1.2.1 cryptography 43.0.0 cssselect 1.2.0 cssutils 2.11.1 cycler 0.12.1 Cython 3.0.10 datasets 2.20.0 debugpy 1.8.2 decorator 5.1.1 defusedxml 0.7.1 detectron2 0.6 dill 0.3.8 et-xmlfile 1.1.0 eva-decord 0.6.1 eval_type_backport 0.2.0 evaluate 0.4.2 exceptiongroup 1.2.2 executing 2.0.1 fairscale 0.4.13 fast-langdetect 0.2.1 fastjsonschema 2.20.0 fasttext-wheel 0.9.2 filelock 3.15.4 fire 0.6.0 Flask 3.0.3 flask-babel 4.0.0 fonttools 4.53.1 fqdn 1.5.1 frozenlist 1.4.1 fsspec 2024.5.0 ftfy 6.2.0 future 1.0.0 fvcore 0.1.5.post20221221 gitdb 4.0.11 GitPython 3.1.43 grpcio 1.65.1 h11 0.14.0 httpcore 1.0.5 httpx 0.27.0 huggingface-hub 0.24.2 hydra-core 1.3.2 idna 3.7 imageio 2.34.2 imgaug 0.4.0 iopath 0.1.9 ipykernel 6.29.5 ipython 8.26.0 isoduration 20.11.0 itsdangerous 2.2.0 jedi 0.19.1 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 json5 0.9.25 jsonpointer 3.0.0 jsonschema 4.23.0 jsonschema-specifications 2023.12.1 jupyter_client 8.6.2 jupyter_core 5.7.2 jupyter-events 0.10.0 jupyter-lsp 2.2.5 jupyter_server 2.14.2 jupyter_server_terminals 0.5.3 jupyterlab 4.2.4 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.3 kiwisolver 1.4.5 lazy_loader 0.4 lmdb 1.5.1 loguru 0.7.2 lxml 5.2.2 magic-pdf 0.6.1 Markdown 3.6 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.1 matplotlib-inline 0.1.7 mdurl 0.1.2 mistune 3.0.2 more-itertools 10.3.0 mpmath 1.3.0 multidict 6.0.5 multiprocess 0.70.16 
mypy-extensions 1.0.0 nbclient 0.10.0 nbconvert 7.16.4 nbformat 5.10.4 nest-asyncio 1.6.0 networkx 3.3 nltk 3.8.1 notebook_shim 0.2.4 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 omegaconf 2.3.0 opencv-contrib-python 4.6.0.66 opencv-python 4.6.0.66 opencv-python-headless 4.10.0.84 openpyxl 3.1.5 opt-einsum 3.3.0 overrides 7.7.0 packaging 24.1 paddleocr 2.7.3 paddlepaddle 2.6.1 pandas 2.2.2 pandocfilters 1.5.1 parso 0.8.4 pathspec 0.12.1 pdf2docx 0.5.8 pdf2image 1.17.0 pdfminer.six 20240706 pexpect 4.9.0 pillow 10.4.0 pip 24.0 platformdirs 4.2.2 portalocker 2.10.1 premailer 3.10.0 prometheus_client 0.20.0 prompt_toolkit 3.0.47 protobuf 4.25.4 psutil 6.0.0 ptyprocess 0.7.0 pure_eval 0.2.3 py-cpuinfo 9.0.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pybind11 2.13.1 pyclipper 1.3.0.post5 pycocotools 2.0.8 pycparser 2.22 pycryptodome 3.20.0 pydantic 2.8.2 pydantic_core 2.20.1 pydeck 0.9.1 Pygments 2.18.0 PyMuPDF 1.24.9 PyMuPDFb 1.24.9 pyparsing 3.1.2 pypdfium2 4.30.0 python-dateutil 2.9.0.post0 python-docx 1.1.2 python-json-logger 2.0.7 pytz 2024.1 PyYAML 6.0.1 pyzmq 26.0.3 rapidfuzz 3.9.4 rarfile 4.2 referencing 0.35.1 regex 2024.7.24 requests 2.32.3 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 13.7.1 robust-downloader 0.0.2 rpds-py 0.19.1 s3transfer 0.10.2 safetensors 0.4.3 scikit-image 0.24.0 scikit-learn 1.5.1 scipy 1.14.0 seaborn 0.13.2 Send2Trash 1.8.3 setuptools 71.0.4 shapely 2.0.5 six 1.16.0 smmap 5.0.1 sniffio 1.3.1 soupsieve 2.5 stack-data 0.6.3 streamlit 1.37.0 streamlit-drawable-canvas 0.9.3 sympy 1.13.1 tabulate 0.9.0 tenacity 8.5.0 tensorboard 2.17.0 tensorboard-data-server 0.7.2 termcolor 2.4.0 terminado 0.18.1 threadpoolctl 3.5.0 tifffile 
2024.7.24 timm 0.9.16 tinycss2 1.3.0 tokenizers 0.19.1 toml 0.10.2 tomli 2.0.1 toolz 0.12.1 torch 2.3.1 torchtext 0.18.0 torchvision 0.18.1 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 transformers 4.40.0 triton 2.3.1 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 tzdata 2024.1 ultralytics 8.2.68 ultralytics-thop 2.0.0 unimernet 0.1.1 uri-template 1.3.0 urllib3 2.2.2 visualdl 2.5.3 Wand 0.6.13 watchdog 4.0.1 wcwidth 0.2.13 webcolors 24.6.0 webdataset 0.2.86 webencodings 0.5.1 websocket-client 1.8.0 Werkzeug 3.0.3 wheel 0.43.0 wordninja 2.0.0 xxhash 3.4.1 yacs 0.1.8 yarl 1.9.4
@zifengdexiatian
Some people might be missing the libgl and libegl libraries on their Linux systems. On Ubuntu, the command is
sudo apt-get update
sudo apt-get install libgl1-mesa-glx libegl1-mesa-dev
You may use the following commands to install these libraries on CentOS.
sudo yum update
sudo yum install mesa-libGL mesa-libEGL-devel
If they do not work, please continue to provide feedback.
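For anyone who wants to check whether these libraries are visible before installing anything, here is a minimal Python sketch. It assumes the libraries can be found through the system's normal library search; the helper name find_missing_libs is made up for illustration and is not part of magic-pdf.

```python
import ctypes.util


def find_missing_libs(names=("GL", "EGL")):
    """Return the library names the system loader cannot locate.

    ctypes.util.find_library takes the name without the "lib" prefix
    or file suffix, and returns None when nothing is found.
    """
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    missing = find_missing_libs()
    if missing:
        print("Not found:", ", ".join("lib" + name for name in missing))
    else:
        print("libGL and libEGL were both found.")
```

An empty result only means the loader can see the libraries; it does not prove that every import in magic-pdf will succeed.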
@myhloli Thanks, I will try that. By the way, is there a Docker image available?
@zifengdexiatian For a Dockerfile, please refer to https://github.com/opendatalab/MinerU/pull/189, but note that we have not tested it yet.
@drunkpig Thanks a lot!
Same issue on Windows.
@Knightlj Problem solved. My traceback pointed here:
File "C:\ProgramData\Anaconda3\lib\site-packages\ultralytics\utils\__init__.py", line 21, in
I had an issue with matplotlib. You can debug pdf_extract_kit.py to see which import is failing.
please upload the result of
pip list
The pip3 list output is shown in the screenshot below:
@myhloli Hi, I posted a screenshot of the "pip list" output in my reply above.
@Knightlj You can delete the logger.error call in pdf_extract_kit.py to see exactly which module fails to import. In my case, updating matplotlib fixed it.
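If you would rather not edit pdf_extract_kit.py, a rough alternative is to import the suspect packages yourself and print the full traceback for each failure. This is only a sketch: report_import_errors is a hypothetical helper, and which modules to pass in is an assumption based on the packages named in this thread.

```python
import importlib
import traceback


def report_import_errors(module_names):
    """Try to import each module and collect the traceback of every failure."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # Keep the whole traceback so the real cause stays visible,
            # instead of a generic "dependency not installed" message.
            failures[name] = traceback.format_exc()
    return failures
```

For example, report_import_errors(["matplotlib", "ultralytics", "detectron2", "unimernet", "paddleocr"]) returns a dict mapping each failing module to its traceback. A crash such as "illegal hardware instruction" kills the interpreter and cannot be caught this way.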
@Knightlj Maybe your Mac has an Intel CPU. You should install magic-pdf by
@myhloli The computer reports an Apple M1 Pro chip.
@Knightlj
It looks like you installed the base package. Please try installing the full package:
pip install magic-pdf[full-cpu]
@myhloli It is not supported.
@myhloli Following the link you just posted, I successfully ran "pip3 install magic-pdf[full-cpu]".
Now it reports a different error: 2024-07-29 16:20:56.744 | ERROR | magic_pdf.model.pp_structure_v2::8 - paddleocr not installed, please install by "pip install magic-pdf[cpu]" or "pip install magic-pdf[gpu]"
This is not the expected result. Please run:
magic-pdf --version
If your version is 0.5.x, please report back.
magic-pdf, version 0.5.13
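A stale 0.5.x install is easy to miss when several Python environments are around. The sketch below checks which version the current interpreter actually sees, without going through the magic-pdf command on PATH. The distribution name "magic-pdf" comes from the pip list output in this thread; both helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="magic-pdf"):
    """Return the version installed for this interpreter, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_legacy(ver):
    """True for the 0.5.x series that this thread asks people to report."""
    return ver is not None and ver.startswith("0.5.")


if __name__ == "__main__":
    found = installed_version()
    print("magic-pdf version seen by this interpreter:", found)
    if is_legacy(found):
        print("This is a 0.5.x install; the full package may not have been installed here.")
```

If this prints a different version than magic-pdf --version does, the command on PATH probably belongs to another environment.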
@Knightlj Maybe your Python environment is x86_64. You could switch to an arm64 Python and install magic-pdf there.
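@myhloli The screenshot below shows some information about my computer and Python environment. Could you help confirm whether anything is wrong? 🤦♂️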
Yes, the Python platform is x86_64. You should download and install the arm64 build of conda: https://www.anaconda.com/download/success
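To check this yourself instead of reading it off a screenshot, the standard library can report the architecture the interpreter is running as. A small sketch, with helper names invented for illustration:

```python
import platform
import sys


def describe_interpreter():
    """Collect the facts that matter for picking the right wheels."""
    return {
        "system": platform.system(),    # "Darwin" on macOS
        "machine": platform.machine(),  # e.g. "arm64" or "x86_64"
        "python": platform.python_version(),
        "executable": sys.executable,
    }


def is_x86_python_on_mac(info):
    """True when a macOS Python reports x86_64 rather than arm64."""
    return info["system"] == "Darwin" and info["machine"] == "x86_64"


if __name__ == "__main__":
    info = describe_interpreter()
    for key, value in info.items():
        print(f"{key}: {value}")
    if is_x86_python_on_mac(info):
        print("This Python reports x86_64; on Apple silicon, use an arm64 Python instead.")
```

On an Apple silicon Mac, an x86_64 result usually means the interpreter is an Intel build running under Rosetta.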
@myhloli I have reinstalled an arm64 python3 and rerun pip3 install magic-pdf[cpu]. Now it reports a new error: "ImportError: dlopen(/Users/testjam/my_env/lib/python3.12/site-packages/fasttext_pybind.cpython-312-darwin.so, 0x0002): tried: '/Users/testjam/my_env/lib/python3.12/site-packages/fasttext_pybind.cpython-312-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))"
Please use conda to create a new environment with Python 3.10, then install magic-pdf with:
pip install magic-pdf[full-cpu]
pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
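The "incompatible architecture" error above means a compiled extension left over from the x86_64 install is still in site-packages. As a rough way to find such leftovers, this sketch reads the first bytes of a file and reports the architecture of a thin 64-bit Mach-O binary. It is an illustration only: macho_arch is a made-up helper, and universal (fat) binaries and other formats are not handled and return None.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O file, little-endian
CPU_TYPES = {
    0x01000007: "x86_64",
    0x0100000C: "arm64",  # arm64e uses the same CPU type with a different subtype
}


def macho_arch(header):
    """Return the architecture named in a thin 64-bit Mach-O header, else None."""
    if len(header) < 8:
        return None
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        return None
    return CPU_TYPES.get(cputype, hex(cputype))


def file_arch(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as handle:
        return macho_arch(handle.read(8))
```

Calling file_arch on the .so path from the error message should return "x86_64" if the file really is an Intel leftover. Creating a fresh environment, as suggested above, avoids the problem entirely.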
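@myhloli @cyz2453057960 I gave up tinkering on the Mac. Following cyz's advice, I got it working on Ubuntu. Thank you both 🙏. I also noticed that in the generated md file the paragraphs are sometimes not clearly separated, and formulas are not output in LaTeX format, as shown in the screenshot below: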
I suggest changing [ to \[ in the commands in the instructions. Changing the shell's globbing behavior is overkill for this.
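The square brackets are a problem because zsh treats them as a glob pattern unless they are escaped or quoted. One way around the shell entirely is to build the pip command as an argument list; shlex.join then shows a safely quoted form you could paste into a terminal. This is a sketch with made-up helper names, and it only builds the command without running it.

```python
import shlex
import sys


def pip_install_command(requirement, extra_index_url=None):
    """Build a pip invocation as an argument list (no shell, so no globbing)."""
    cmd = [sys.executable, "-m", "pip", "install", requirement]
    if extra_index_url:
        cmd += ["--extra-index-url", extra_index_url]
    return cmd


def shell_safe(cmd):
    """Render an argument list as one shell-quoted command line."""
    return shlex.join(cmd)


if __name__ == "__main__":
    print(shell_safe(pip_install_command("magic-pdf[full-cpu]")))
    print(shell_safe(pip_install_command(
        "detectron2", extra_index_url="https://myhloli.github.io/wheels/")))
```

The printed command wraps magic-pdf[full-cpu] in single quotes, which has the same effect as escaping the bracket by hand.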
magic-pdf --version
If the result is not 0.6.1, something may be wrong again 😂
@myhloli It is not 0.6.1 😭, it is still version 0.5.13. I don't know what to do.
@myhloli
Hmm, arm64 + Linux: many packages do not support this platform. If you only have arm64 hardware, macOS should be your first choice of system.
I have the same problem: M2 Mac, Python 3.10.14, magic-pdf 0.6.1. Running "pip install magic-pdf[full-cpu] pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/" reports "Requirement already satisfied" for every package, but running magic-pdf still fails with "Required dependency not installed, please install by "pip install magic-pdf[full-cpu] detectron2 --extra-index-url https://myhloli.github.io/wheels/" "
pip list Package Version
absl-py 2.1.0 aiohttp 3.9.5 aiosignal 1.3.1 altair 5.3.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 appnope 0.1.4 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 astor 0.8.1 asttokens 2.4.1 async-lru 2.0.4 async-timeout 4.0.3 attrdict 2.0.1 attrs 23.2.0 Babel 2.15.0 bce-python-sdk 0.9.17 beautifulsoup4 4.12.3 black 24.4.2 bleach 6.1.0 blinker 1.8.2 boto3 1.34.149 botocore 1.34.149 Brotli 1.1.0 cachetools 5.4.0 certifi 2024.7.4 cffi 1.16.0 charset-normalizer 3.3.2 click 8.1.7 cloudpickle 3.0.0 colorlog 6.8.2 comm 0.2.2 contourpy 1.2.1 cryptography 43.0.0 cssselect 1.2.0 cssutils 2.11.1 cycler 0.12.1 Cython 3.0.10 datasets 2.20.0 debugpy 1.8.2 decorator 5.1.1 defusedxml 0.7.1 detectron2 0.6 dill 0.3.8 et-xmlfile 1.1.0 evaluate 0.4.2 exceptiongroup 1.2.2 executing 2.0.1 fast-langdetect 0.2.1 fastjsonschema 2.20.0 fasttext-wheel 0.9.2 filelock 3.15.4 fire 0.6.0 Flask 3.0.3 flask-babel 4.0.0 fonttools 4.53.1 fqdn 1.5.1 frozenlist 1.4.1 fsspec 2024.5.0 future 1.0.0 fvcore 0.1.5.post20221221 gitdb 4.0.11 GitPython 3.1.43 grpcio 1.65.1 h11 0.14.0 httpcore 1.0.5 httpx 0.27.0 huggingface-hub 0.24.3 hydra-core 1.3.2 idna 3.7 imageio 2.34.2 imgaug 0.4.0 iopath 0.1.9 ipykernel 6.29.5 ipython 8.26.0 isoduration 20.11.0 itsdangerous 2.2.0 jedi 0.19.1 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 json5 0.9.25 jsonpointer 3.0.0 jsonschema 4.23.0 jsonschema-specifications 2023.12.1 jupyter_client 8.6.2 jupyter_core 5.7.2 jupyter-events 0.10.0 jupyter-lsp 2.2.5 jupyter_server 2.14.2 jupyter_server_terminals 0.5.3 jupyterlab 4.2.4 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.3 kiwisolver 1.4.5 lazy_loader 0.4 lmdb 1.5.1 loguru 0.7.2 lxml 5.2.2 magic-pdf 0.6.1 Markdown 3.6 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.1 matplotlib-inline 0.1.7 mdurl 0.1.2 mistune 3.0.2 more-itertools 10.3.0 mpmath 1.3.0 multidict 6.0.5 multiprocess 0.70.16 mypy-extensions 1.0.0 nbclient 0.10.0 nbconvert 7.16.4 nbformat 5.10.4 nest-asyncio 1.6.0 networkx 3.3 nltk 3.8.1 notebook_shim 0.2.4 
numpy 1.26.4 omegaconf 2.3.0 opencv-contrib-python 4.6.0.66 opencv-python 4.6.0.66 opencv-python-headless 4.10.0.84 openpyxl 3.1.5 opt-einsum 3.3.0 overrides 7.7.0 packaging 24.1 paddleocr 2.7.3 paddlepaddle 2.6.1 pandas 2.2.2 pandocfilters 1.5.1 parso 0.8.4 pathspec 0.12.1 pdf2docx 0.5.8 pdf2image 1.17.0 pdfminer.six 20240706 pexpect 4.9.0 pillow 10.4.0 pip 24.0 platformdirs 4.2.2 portalocker 2.10.1 premailer 3.10.0 prometheus_client 0.20.0 prompt_toolkit 3.0.47 protobuf 4.25.4 psutil 6.0.0 ptyprocess 0.7.0 pure_eval 0.2.3 py-cpuinfo 9.0.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pybind11 2.13.1 pyclipper 1.3.0.post5 pycocotools 2.0.8 pycparser 2.22 pycryptodome 3.20.0 pydeck 0.9.1 Pygments 2.18.0 PyMuPDF 1.24.9 PyMuPDFb 1.24.9 pyparsing 3.1.2 pypdfium2 4.30.0 python-dateutil 2.9.0.post0 python-docx 1.1.2 python-json-logger 2.0.7 pytz 2024.1 PyYAML 6.0.1 pyzmq 26.0.3 rapidfuzz 3.9.4 rarfile 4.2 referencing 0.35.1 regex 2024.7.24 requests 2.32.3 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 13.7.1 robust-downloader 0.0.2 rpds-py 0.19.1 s3transfer 0.10.2 scikit-image 0.24.0 scikit-learn 1.5.1 scipy 1.14.0 seaborn 0.13.2 Send2Trash 1.8.3 setuptools 69.5.1 shapely 2.0.5 six 1.16.0 smmap 5.0.1 sniffio 1.3.1 soupsieve 2.5 stack-data 0.6.3 streamlit 1.37.0 streamlit-drawable-canvas 0.9.3 sympy 1.13.1 tabulate 0.9.0 tenacity 8.5.0 tensorboard 2.17.0 tensorboard-data-server 0.7.2 termcolor 2.4.0 terminado 0.18.1 threadpoolctl 3.5.0 tifffile 2024.7.24 tinycss2 1.3.0 toml 0.10.2 tomli 2.0.1 toolz 0.12.1 torch 2.4.0 torchvision 0.19.0 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 tzdata 2024.1 ultralytics 8.2.68 ultralytics-thop 2.0.0 unimernet 0.1.2 uri-template 1.3.0 urllib3 2.2.2 visualdl 2.5.3 wcwidth 0.2.13 webcolors 24.6.0 webencodings 0.5.1 websocket-client 1.8.0 Werkzeug 3.0.3 wheel 0.43.0 wordninja 2.0.0 xxhash 3.4.1 yacs 0.1.8 yarl 1.9.4
2024-07-29 22:32:39.654 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /Users/testjam/Desktop/test.json existed
2024-07-29 22:32:39.655 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /tmp/magic-pdf/test/auto
2024-07-29 22:32:42.735 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 79, text_len: 56274, cid_chars_radio: 0.0014167862266857962
zsh: illegal hardware instruction magic-pdf pdf-command --pdf /Users/testjam/Desktop/test.pdf --inside_model
@myhloli Well, I switched to another Mac, and after some tinkering it turned into a new error 😭
@Knightlj @qinzhenlove @zifengdexiatian We have updated to the 0.6.2b1 release, addressing and resolving the aforementioned issue.
Description of the bug
2024-07-29 09:40:30.261 | ERROR | magic_pdf.model.pdf_extract_kit::24 - Required dependency not installed, please install by
"pip install magic-pdf[full-cpu] detectron2 --extra-index-url https://myhloli.github.io/wheels/"
Even after detectron2 was installed successfully, this error keeps appearing.
How to reproduce the bug
Operating system
MacOS
Python version
3.10
Software version (magic-pdf --version)
0.6.x
Device mode
mps