tuhang closed this issue 1 month ago.
pip list
(MinerU_GPU) C:\Users\tu_ha>pip list
Package Version
------------------------- ------------------
absl-py 2.1.0
aiohttp 3.9.5
aiosignal 1.3.1
albucore 0.0.12
albumentations 1.4.12
altair 5.3.0
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.4.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
astor 0.8.1
asttokens 2.4.1
async-lru 2.0.4
async-timeout 4.0.3
attrdict 2.0.1
attrs 23.2.0
Babel 2.15.0
bce-python-sdk 0.9.17
beautifulsoup4 4.12.3
black 24.4.2
bleach 6.1.0
blinker 1.8.2
boto3 1.34.149
botocore 1.34.149
braceexpand 0.1.7
Brotli 1.1.0
cachetools 5.4.0
certifi 2024.7.4
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
colorama 0.4.6
colorlog 6.8.2
comm 0.2.2
contourpy 1.2.1
cryptography 43.0.0
cssselect 1.2.0
cssutils 2.11.1
cycler 0.12.1
Cython 3.0.10
datasets 2.20.0
debugpy 1.8.2
decorator 5.1.1
defusedxml 0.7.1
detectron2 0.6
dill 0.3.8
et-xmlfile 1.1.0
eva-decord 0.6.1
eval_type_backport 0.2.0
evaluate 0.4.2
exceptiongroup 1.2.2
executing 2.0.1
fairscale 0.4.13
fast-langdetect 0.2.1
fastjsonschema 2.20.0
fasttext-wheel 0.9.2
filelock 3.15.4
fire 0.6.0
Flask 3.0.3
flask-babel 4.0.0
fonttools 4.53.1
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2024.5.0
ftfy 6.2.0
future 1.0.0
fvcore 0.1.5.post20221221
gitdb 4.0.11
GitPython 3.1.43
grpcio 1.64.1
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.24.2
hydra-core 1.3.2
idna 3.7
imageio 2.34.2
imgaug 0.4.0
intel-openmp 2021.4.0
iopath 0.1.9
ipykernel 6.29.5
ipython 8.26.0
isoduration 20.11.0
itsdangerous 2.2.0
jedi 0.19.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
json5 0.9.25
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.2
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.2
jupyter_server_terminals 0.5.3
jupyterlab 4.2.4
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
kiwisolver 1.4.5
lazy_loader 0.4
lmdb 1.5.1
loguru 0.7.2
lxml 5.2.2
magic-pdf 0.6.1
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.1
matplotlib-inline 0.1.7
mdurl 0.1.2
mistune 3.0.2
mkl 2021.4.0
more-itertools 10.3.0
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
nltk 3.8.1
notebook_shim 0.2.4
numpy 1.26.4
omegaconf 2.3.0
opencv-contrib-python 4.6.0.66
opencv-python 4.6.0.66
opencv-python-headless 4.10.0.84
openpyxl 3.1.5
opt-einsum 3.3.0
overrides 7.7.0
packaging 24.1
paddleocr 2.7.3
paddlepaddle 2.6.1
pandas 2.2.2
pandocfilters 1.5.1
parso 0.8.4
pathspec 0.12.1
pdf2docx 0.5.8
pdf2image 1.17.0
pdfminer.six 20240706
pillow 10.4.0
pip 24.0
platformdirs 4.2.2
portalocker 2.10.1
premailer 3.10.0
prometheus_client 0.20.0
prompt_toolkit 3.0.47
protobuf 3.20.2
psutil 6.0.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
pybind11 2.13.1
pyclipper 1.3.0.post5
pycocotools 2.0.8
pycparser 2.22
pycryptodome 3.20.0
pydantic 2.8.2
pydantic_core 2.20.1
pydeck 0.9.1
Pygments 2.18.0
PyMuPDF 1.24.9
PyMuPDFb 1.24.9
pyparsing 3.1.2
pypdfium2 4.30.0
python-dateutil 2.9.0.post0
python-docx 1.1.2
python-json-logger 2.0.7
pytz 2024.1
pywin32 306
pywinpty 2.0.13
PyYAML 6.0.1
pyzmq 26.0.3
rapidfuzz 3.9.4
rarfile 4.2
referencing 0.35.1
regex 2024.7.24
requests 2.32.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
robust-downloader 0.0.2
rpds-py 0.19.1
s3transfer 0.10.2
safetensors 0.4.3
scikit-image 0.24.0
scikit-learn 1.5.1
scipy 1.14.0
seaborn 0.13.2
Send2Trash 1.8.3
setuptools 69.5.1
shapely 2.0.5
six 1.16.0
smmap 5.0.1
sniffio 1.3.1
soupsieve 2.5
stack-data 0.6.3
streamlit 1.37.0
streamlit-drawable-canvas 0.9.3
sympy 1.13.1
tabulate 0.9.0
tbb 2021.13.0
tenacity 8.5.0
tensorboard 2.17.0
tensorboard-data-server 0.7.2
termcolor 2.4.0
terminado 0.18.1
threadpoolctl 3.5.0
tifffile 2024.7.24
timm 0.9.16
tinycss2 1.3.0
tokenizers 0.19.1
toml 0.10.2
tomli 2.0.1
toolz 0.12.1
torch 2.4.0+cu124
torchtext 0.18.0
torchvision 0.19.0+cu124
tornado 6.4.1
tqdm 4.66.4
traitlets 5.14.3
transformers 4.40.0
types-python-dateutil 2.9.0.20240316
typing_extensions 4.12.2
tzdata 2024.1
ultralytics 8.2.68
ultralytics-thop 2.0.0
unimernet 0.1.1
uri-template 1.3.0
urllib3 2.2.2
visualdl 2.5.3
Wand 0.6.13
watchdog 4.0.1
wcwidth 0.2.13
webcolors 24.6.0
webdataset 0.2.86
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.3
wheel 0.43.0
win32-setctime 1.1.0
wordninja 2.0.0
xxhash 3.4.1
yacs 0.1.8
yarl 1.9.4
Importing PyTorch from Python fails with the error below. I suspect the package is the problem again.
(MinerU_GPU) C:\Users\tu_ha>python
Python 3.10.14 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:44:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\tu_ha\.conda\envs\MinerU_GPU\lib\site-packages\torch\__init__.py", line 148, in <module>
raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\tu_ha\.conda\envs\MinerU_GPU\lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies.
>>>
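When an import fails like this, it helps to capture the error without crashing and to confirm which interpreter is actually running. The following is a minimal diagnostic sketch, not part of MinerU: it catches both `ImportError` and `OSError` (the latter is what a DLL load failure such as the `fbgemm.dll` one above raises on Windows) and records `sys.executable`. The helper name `diagnose_import` is hypothetical.

```python
import importlib
import sys


def diagnose_import(module_name):
    """Try to import `module_name` and report the outcome instead of raising.

    The report includes sys.executable so you can confirm which conda
    environment's interpreter actually ran the check.
    """
    report = {"python": sys.executable, "module": module_name, "ok": False, "error": None}
    try:
        importlib.import_module(module_name)
        report["ok"] = True
    except (ImportError, OSError) as exc:
        # OSError covers Windows DLL load failures such as "[WinError 126]".
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report
```

Usage: run `print(diagnose_import("torch"))` inside the activated `MinerU_GPU` environment and check that the `python` path points into that environment.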
I saw other issues recommend pinning the torch version, but after trying that, the same error was reported.
(MinerU_GPU) C:\Users\tu_ha>pip install torch==2.3.1 torchvision==0.18.1
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting torch==2.3.1
Using cached https://mirrors.aliyun.com/pypi/packages/85/fc/ee5bb50eff313149657f173b003649677e27fa3aaae1ecc806add37f017c/torch-2.3.1-cp310-cp310-win_amd64.whl (159.8 MB)
Collecting torchvision==0.18.1
Using cached https://mirrors.aliyun.com/pypi/packages/4e/62/3816637079b77875077678bd7087285a5b5589664f94f5ceb2d080cc024c/torchvision-0.18.1-cp310-cp310-win_amd64.whl (1.2 MB)
Requirement already satisfied: filelock in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torch==2.3.1) (3.15.4)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torch==2.3.1) (4.12.2)
Requirement already satisfied: sympy in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torch==2.3.1) (1.13.1)
Requirement already satisfied: networkx in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torch==2.3.1) (3.3)
Requirement already satisfied: jinja2 in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torch==2.3.1) (3.1.4)
Requirement already satisfied: fsspec in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torch==2.3.1) (2024.5.0)
Requirement already satisfied: mkl<=2021.4.0,>=2021.1.1 in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torch==2.3.1) (2021.4.0)
Requirement already satisfied: numpy in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torchvision==0.18.1) (1.26.4)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from torchvision==0.18.1) (10.4.0)
Requirement already satisfied: intel-openmp==2021.* in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from mkl<=2021.4.0,>=2021.1.1->torch==2.3.1) (2021.4.0)
Requirement already satisfied: tbb==2021.* in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from mkl<=2021.4.0,>=2021.1.1->torch==2.3.1) (2021.13.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from jinja2->torch==2.3.1) (2.1.5)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\tu_ha\.conda\envs\mineru_gpu\lib\site-packages (from sympy->torch==2.3.1) (1.3.0)
Installing collected packages: torch, torchvision
Successfully installed torch-2.3.1 torchvision-0.18.1
pip list
torch 2.3.1
torchtext 0.18.0
torchvision 0.18.1
The same error still exists.
2024-07-30 01:34:44.452 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 291, cid_chars_radio: 0.0
2024-07-30 01:34:44.452 | WARNING | magic_pdf.filter.pdf_classify_by_type:classify:334 - pdf is not classified by area and text_len, by_image_area: False, by_text: True, by_avg_words: False, by_img_num: True, by_text_layout: True, by_img_narrow_strips: True, by_invalid_chars: True
[2024-07-30 01:34:52,748] [ ERROR] check_version.py:39 - Error fetching version info
Traceback (most recent call last):
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\urllib\request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\http\client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\http\client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\http\client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\http\client.py", line 1038, in _send_output
self.send(msg)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\http\client.py", line 976, in send
self.connect()
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\http\client.py", line 1455, in connect
self.sock = self._context.wrap_socket(self.sock,
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\ssl.py", line 513, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\ssl.py", line 1104, in _create
self.do_handshake()
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\ssl.py", line 1375, in do_handshake
self._sslobj.do_handshake()
TimeoutError: _ssl.c:990: The handshake operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\albumentations\check_version.py", line 29, in fetch_version_info
with opener.open(url, timeout=2) as response:
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\urllib\request.py", line 519, in open
response = self._open(req, data)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\urllib\request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\urllib\request.py", line 496, in _call_chain
result = func(*args)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\urllib\request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\urllib\request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error _ssl.c:990: The handshake operation timed out>
2024-07-30 01:34:54.724 | INFO | magic_pdf.model.pdf_extract_kit:__init__:92 - DocAnalysis init, this may take some times. apply_layout: True, apply_formula: True, apply_ocr: True
2024-07-30 01:34:54.724 | INFO | magic_pdf.model.pdf_extract_kit:__init__:100 - using device: cuda
CustomVisionEncoderDecoderModel init
CustomMBartForCausalLM init
CustomMBartDecoder init
Traceback (most recent call last):
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\tu_ha\.conda\envs\MinerU_GPU\Scripts\magic-pdf.exe\__main__.py", line 7, in <module>
sys.exit(cli())
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\magic_pdf\cli\magicpdf.py", line 325, in pdf_command
do_parse(
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\magic_pdf\cli\magicpdf.py", line 111, in do_parse
pipe.pipe_analyze()
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 31, in pipe_analyze
self.model_list = doc_analyze(self.pdf_bytes, ocr=True)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\magic_pdf\model\doc_analyze_by_custom_model.py", line 69, in doc_analyze
custom_model = CustomPEKModel(ocr=ocr, show_log=show_log, models_dir=local_models_dir, device=device)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\magic_pdf\model\pdf_extract_kit.py", line 111, in __init__
self.mfr_model, mfr_vis_processors = mfr_model_init(mfr_weight_dir, mfr_cfg_path, _device_=self.device)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\magic_pdf\model\pdf_extract_kit.py", line 41, in mfr_model_init
model = model.to(_device_)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\torch\nn\modules\module.py", line 1173, in to
return self._apply(convert)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\torch\nn\modules\module.py", line 779, in _apply
module._apply(fn)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\torch\nn\modules\module.py", line 779, in _apply
module._apply(fn)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\torch\nn\modules\module.py", line 779, in _apply
module._apply(fn)
[Previous line repeated 3 more times]
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\torch\nn\modules\module.py", line 804, in _apply
param_applied = fn(param)
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\torch\nn\modules\module.py", line 1159, in convert
return t.to(
File "C:\Users\tu_ha\.conda\envs\MinerU_CPU\lib\site-packages\torch\cuda\__init__.py", line 284, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
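The `pip list` output above shows `torch 2.3.1` with no local version suffix, installed from the aliyun PyPI mirror. Wheels from the download.pytorch.org indexes carry suffixes such as `+cu124`, `+cu118` or `+cpu`, whereas on Windows the default PyPI torch wheel is a CPU-only build, which is consistent with "Torch not compiled with CUDA enabled". A minimal sketch that classifies a version string by that suffix; the function name and the returned labels are hypothetical.

```python
import re


def classify_torch_build(version):
    """Classify a torch version string by its local version suffix.

    "2.4.0+cu124" -> "cuda 12.4", "2.3.1+cpu" -> "cpu".
    A missing suffix is reported as "pypi-default" because its meaning
    depends on the platform (on Windows it is a CPU-only build).
    """
    _, _, local = version.partition("+")
    if not local:
        return "pypi-default"
    if local == "cpu":
        return "cpu"
    # "cu118" -> major "11", minor "8"; "cu124" -> major "12", minor "4".
    match = re.fullmatch(r"cu(\d+)(\d)", local)
    if match:
        return f"cuda {match.group(1)}.{match.group(2)}"
    return f"other ({local})"
```

Applied to this thread: `2.4.0+cu124` (the original install) is a CUDA 12.4 build, the later `2.3.1` from the mirror is `pypi-default`, and the working `2.3.1+cu118` is a CUDA 11.8 build.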
After changing the version, torch imports successfully, but CUDA is not available:
(MinerU_GPU) C:\Users\tu_ha>python
Python 3.10.14 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:44:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
False
>>> print(torch.cuda.device_count())
0
>>> print(torch.version.cuda)
None
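These three values indicate a CPU-only build: `torch.version.cuda` is `None`, so moving a model to `"cuda"` raises "Torch not compiled with CUDA enabled". The following sketch shows a defensive check that falls back to CPU instead of crashing. It takes the torch module as a parameter so it can be exercised with a stand-in object, and `choose_device` is a hypothetical helper, not a magic-pdf function. magic-pdf picks its device from its own configuration (the log shows `using device: cuda`), so the real fix is still installing a CUDA build.

```python
from types import SimpleNamespace


def choose_device(torch_mod, requested="cuda"):
    """Return `requested` if this torch build can use it, otherwise "cpu".

    `torch_mod` only needs `version.cuda` and `cuda.is_available()`,
    which the real `torch` module provides.
    """
    if not requested.startswith("cuda"):
        return requested
    if torch_mod.version.cuda is None:
        # CPU-only wheel: moving a model to "cuda" would raise
        # "Torch not compiled with CUDA enabled".
        return "cpu"
    if not torch_mod.cuda.is_available():
        # CUDA build, but no usable GPU or driver at runtime.
        return "cpu"
    return requested


# Stand-in reproducing the values printed above (is_available False, version.cuda None).
cpu_only = SimpleNamespace(
    version=SimpleNamespace(cuda=None),
    cuda=SimpleNamespace(is_available=lambda: False),
)
print(choose_device(cpu_only))  # cpu
```

With a real install, `import torch; print(choose_device(torch))` should print `cuda` only once a CUDA build such as `+cu118` is installed and a GPU is visible.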
torch 2.3.1 is the latest version we support. Please use
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
to install torch with CUDA support.
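The command pins both packages and points pip at the PyTorch CUDA 11.8 wheel index instead of the default index (here, the aliyun PyPI mirror). A sketch that assembles the same command for a given CUDA tag and runs pip through the current interpreter, so the packages land in the active environment; `pip_install_torch_args` and `install` are hypothetical helper names.

```python
import subprocess
import sys


def pip_install_torch_args(torch_version="2.3.1", vision_version="0.18.1", cuda_tag="cu118"):
    """Build the pip argv for installing a CUDA build of torch/torchvision.

    `python -m pip` with sys.executable installs into the environment of the
    interpreter running this script, not whichever `pip` is first on PATH.
    """
    return [
        sys.executable, "-m", "pip", "install",
        f"torch=={torch_version}",
        f"torchvision=={vision_version}",
        "--index-url", f"https://download.pytorch.org/whl/{cuda_tag}",
    ]


def install(dry_run=True):
    """Print the command; run it only when dry_run is False."""
    args = pip_install_torch_args()
    print(" ".join(args))
    if not dry_run:
        subprocess.run(args, check=True)
```

`install(dry_run=False)` runs the command. If a CPU build with the same version number is already installed, pip may report `torch==2.3.1` as already satisfied (a plain `==2.3.1` pin ignores the `+cu118` local label), so uninstalling torch and torchvision first is the safer order.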
Is it necessary to downgrade CUDA to 11.8?
If you want CUDA acceleration for both PyTorch and PaddlePaddle, cu11.8 is the only choice on Windows. I tested on Ubuntu 22.04 with torch on cu12 and paddlepaddle on cu11, and they worked well together, but on Windows both must use the same CUDA version.
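The Windows rule above amounts to a simple consistency check: both frameworks must target the same CUDA major.minor release. A sketch of that comparison on version strings. For torch the value can come from `torch.version.cuda`; how to read Paddle's CUDA version is left to the caller here (an assumption, so it is passed in rather than queried). Both function names are hypothetical.

```python
def cuda_major_minor(version):
    """Normalise "11.8", "11.8.0" or a wheel tag like "cu118" to (11, 8)."""
    version = version.strip()
    if version.startswith("cu"):
        digits = version[2:]
        # The last digit is the minor version: "118" -> (11, 8), "124" -> (12, 4).
        return int(digits[:-1]), int(digits[-1])
    parts = version.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def cuda_versions_match(torch_cuda, paddle_cuda):
    """True when both frameworks target the same CUDA major.minor release.

    None means a CPU-only build, which never matches.
    """
    if torch_cuda is None or paddle_cuda is None:
        return False
    return cuda_major_minor(torch_cuda) == cuda_major_minor(paddle_cuda)
```

For example, a cu124 torch paired with a CUDA 11.8 Paddle fails the check (`cuda_versions_match("12.4", "11.8")` is `False`), while the cu118/cu11.8 pairing passes.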
After I switched to the following dependencies, the GPU became available, and it was much faster than the CPU.
torch 2.3.1+cu118
torchtext 0.18.0
torchvision 0.18.1+cu118
It took me six hours one night of conda clone, conda and pip installs, and a detour through 2.3.1+cpu
(the CPU-only torch pitfall) to get this working, and I hit every CUDA version-dependency pitfall along the way.
The conclusion: the dependencies must be installed exactly as the README specifies.
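In the spirit of that conclusion, a small sketch that compares installed package versions against the pins that worked in this thread (torch 2.3.1+cu118, torchvision 0.18.1+cu118, torchtext 0.18.0). It uses the standard-library `importlib.metadata`; `check_pins` and `WORKING_PINS` are hypothetical names, and the pins are copied from this thread rather than from the README.

```python
from importlib import metadata

# Versions reported working in this thread (Windows, CUDA 11.8).
WORKING_PINS = {
    "torch": "2.3.1+cu118",
    "torchvision": "0.18.1+cu118",
    "torchtext": "0.18.0",
}


def check_pins(pins, lookup=metadata.version):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Usage: `print(check_pins(WORKING_PINS) or "all pins match")`. A result such as `{"torch": ("2.3.1+cu118", "2.3.1")}` would flag the CPU-only wheel described above.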
Thank you for replying so late.
Description of the bug
Yesterday I finished the CPU deployment. Today I tried to deploy on the GPU but ran into some problems.
How to reproduce the bug
I cloned MinerU's conda environment and then installed the PyTorch build for CUDA 12.4:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
(Screenshots: CUDA, configuration file, error output)
Operating system
Windows
Python version
3.10
Software version (magic-pdf --version)
0.6.x
Device mode
cuda