opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction
Apache License 2.0
4.57k stars 302 forks

Cannot run on Mac M-chip #38

Open bookandlover opened 1 month ago

bookandlover commented 1 month ago

The errors are as follows:

(.venv) (base) pengxiong@PENGMacPro PDF-Extract-Kit % python pdf_extract.py --pdf demo/demo1.pdf
[2024-07-19 20:17:51,713] [ ERROR] check_version.py:39 - Error fetching version info
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 976, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1455, in connect
    self.sock = self._context.wrap_socket(self.sock,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1075, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1346, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/albumentations/check_version.py", line 29, in fetch_version_info
    with opener.open(url, timeout=2) as response:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)>
Traceback (most recent call last):
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/pdf_extract.py", line 18, in <module>
    from unimernet.common.config import Config
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/unimernet/__init__.py", line 18, in <module>
    from unimernet.tasks import *
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/unimernet/tasks/__init__.py", line 10, in <module>
    from unimernet.tasks.unimernet_train import UniMERNet_Train
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/unimernet/tasks/unimernet_train.py", line 11, in <module>
    from torchtext.data import metrics
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/__init__.py", line 18, in <module>
    from torchtext import _extension # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/_extension.py", line 64, in <module>
    _init_extension()
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/_extension.py", line 58, in _init_extension
    _load_lib("libtorchtext")
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/_extension.py", line 50, in _load_lib
    torch.ops.load_library(path)
  File "/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torch/_ops.py", line 1354, in load_library
    ctypes.CDLL(path)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/lib/libtorchtext.so, 0x0006): Symbol not found: __ZN3c105ErrorC1ENSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES7_PKv
  Referenced from: <5436ECC1-6F45-386E-B542-D5F76A22B52C> /Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/lib/libtorchtext.so
  Expected in: /Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torch/lib/libc10.dylib

vonlaughing commented 1 month ago

I encountered the same problem after performing the following steps:

image image

myhloli commented 1 month ago

Don't mind this error; you can still try to run it.

vonlaughing commented 1 month ago

When I run it I'm getting the following error:

image

Can you help me? Thank you!

myhloli commented 1 month ago

OSError: dlopen(/Users/pengxiong/LLM/PDF-Extract-Kit/.venv/lib/python3.11/site-packages/torchtext/lib/libtorchtext.so, 0x0006): Symbol not found: __ZN3c105ErrorC1ENSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES7_PKv

TorchText development is stopped and the 0.18 release (April 2024) will be the last stable release of the library.

The versions of torch, torchvision, and torchtext need to be compatible, but torchtext 0.18.0 is not compatible with torch 2.5.0. You could try the nightly build to enable MPS:

pip uninstall torch torchvision torchtext
pip install --pre torch torchvision torchtext --index-url https://download.pytorch.org/whl/nightly/cpu

or reinstall the stable releases:

pip uninstall torch torchvision torchtext
pip install torch torchvision torchtext

You can try using MPS for this; if that's not possible, the CPU will work as well.
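One way to see whether the installed packages line up is to read their versions from the package metadata. The following is a minimal sketch and not part of PDF-Extract-Kit: the helper names (`installed_versions`, `major_minor`, `mismatches`) are hypothetical, and the reference set is only the combination reported as working elsewhere in this thread (torch 2.3.1, torchvision 0.18.1, torchtext 0.18.0), compared at major.minor granularity rather than against any official compatibility matrix.

```python
from importlib import metadata

# Combination reported as working in this thread (an assumption for
# illustration, not an official compatibility matrix).
REFERENCE = {"torch": "2.3.1", "torchvision": "0.18.1", "torchtext": "0.18.0"}


def installed_versions(names=REFERENCE):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def major_minor(version):
    """Reduce a version such as '2.5.0.dev20240719+cpu' to the tuple (2, 5)."""
    release = version.split("+")[0]          # drop a local suffix like "+cpu"
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def mismatches(found, reference=REFERENCE):
    """List packages that are missing or differ from the reference at major.minor."""
    bad = []
    for name, wanted in reference.items():
        have = found.get(name)
        if have is None or major_minor(have) != major_minor(wanted):
            bad.append(name)
    return bad


if __name__ == "__main__":
    found = installed_versions()
    print(found)
    print("packages to reinstall:", mismatches(found) or "none")
```

Run inside the same virtual environment that runs pdf_extract.py; a nightly torch next to torchtext 0.18.0 would show up as a mismatch on torch.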

vonlaughing commented 1 month ago

Thank you! However this gets me here, I'm just gonna try the cpu version, thank you!

image

myhloli commented 1 month ago

As a temporary fix, you can set the environment variable 'PYTORCH_ENABLE_MPS_FALLBACK=1' to use the CPU as a fallback for this op.
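The variable can be exported in the shell or set from Python. The sketch below assumes the usual advice that it should be set before `torch` is imported; `pick_device` is a hypothetical helper for illustration, not something pdf_extract.py provides.

```python
import os

# Set the fallback flag before torch is imported (commonly given advice;
# setting it later in the process may have no effect).
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"


def pick_device():
    """Return "mps" when PyTorch reports the MPS backend as available, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch is not installed, so there is nothing to select
    mps = getattr(torch.backends, "mps", None)  # absent on old torch releases
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("device:", pick_device())
```

With the flag set, operators that lack an MPS implementation run on the CPU instead of raising, at some cost in speed.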

bookandlover commented 1 month ago

image

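Solved after many tries. The problem comes from a certifi update. For reference: this error occurs because insufficient permissions prevent the certifi package from being updated. You need to run the command with administrator privileges. Try the following steps to resolve it:

1. Use the sudo command

Run the following command in the terminal, using sudo to elevate privileges, to install certifi:

sudo /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11 -m pip install --upgrade certifi

The system will prompt for your administrator password; enter it to continue the installation.

2. Update pip

Updating pip may help resolve the problem. Use the following command to update pip:

sudo /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11 -m pip install --upgrade pip

Then try upgrading certifi again:

sudo /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11 -m pip install --upgrade certifi

3. Verify the installation

After certifi installs successfully, run the following command to verify the installation:

/Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11 -m pip show certifi

4. Re-run the certificate installation script

Run the certificate installation script again:

/Applications/Python\ 3.11/Install\ Certificates.command

If the steps above complete without errors, re-run your pdf_extract.py script:

python pdf_extract.py --pdf demo/demo1.pdf

This should resolve the certificate and permission issues. If other problems remain, please tell me the details.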
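To check the result from Python itself, it can help to look at where the `ssl` module expects its CA certificates. The sketch below is an illustration only: `describe_ca_setup` is a hypothetical helper, and it assumes (without confirmation from this thread) that the python.org installer's certificate script points OpenSSL's default CA file at certifi's bundle, so that file should exist once the script has run.

```python
import os
import ssl


def describe_ca_setup():
    """Collect the CA locations that the ssl module reports for this interpreter."""
    paths = ssl.get_default_verify_paths()
    info = {
        "cafile": paths.cafile,                  # None if the default file is missing
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,  # compiled-in default location
        "openssl_cafile_exists": os.path.exists(paths.openssl_cafile),
    }
    try:
        import certifi
        info["certifi_bundle"] = certifi.where()
    except ImportError:
        info["certifi_bundle"] = None            # certifi is not installed
    return info


if __name__ == "__main__":
    for key, value in describe_ca_setup().items():
        print(f"{key}: {value}")
```

If `openssl_cafile_exists` is False on a python.org build, that would be consistent with the "unable to get local issuer certificate" error shown at the top of the thread.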

myhloli commented 1 month ago

@bookandlover Thank you for your feedback. We will update the document so that other users can solve similar problems.

bookandlover commented 1 month ago

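During PDF extraction I cannot get MPS to start and have to force a fallback to the CPU. The likely cause is a compatibility problem between specific versions of PyTorch and detectron2; downgrading or upgrading them might fix it. Could you give the correct version numbers? I have an M2 Ultra chip. Below is an illustrative snippet. Thank you very much.

pip install torch==1.12.1 torchvision==0.13.1
pip install detectron2==0.6 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.12/index.html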

myhloli commented 1 month ago

pip install torch==2.3.1 torchvision==0.18.1 torchtext==0.18.0

pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/

This works well with Python 3.10.

dennicLiu commented 1 month ago

Error: command buffer exited with error status.
  The Metal Performance Shaders operations encoded on it may not have completed.
  Error: (null) Internal Error (0000000e:Internal Error)
  <AGXG14XFamilyCommandBuffer: 0x37ba4f2c0>
    label =
    device = <AGXG14SDevice: 0x30888a600>
      name = Apple M2 Pro
    commandQueue = <AGXG14XFamilyCommandQueue: 0x32128f200>
      label =
      device = <AGXG14SDevice: 0x30888a600>
        name = Apple M2 Pro
    retainedReferences = 1
Error: command buffer exited with error status.
  The Metal Performance Shaders operations encoded on it may not have completed.
  Error: (null) Internal Error (0000000e:Internal Error)
  <AGXG14XFamilyCommandBuffer: 0x37ba4ed50>
    label =
    device = <AGXG14SDevice: 0x30888a600>
      name = Apple M2 Pro
    commandQueue = <AGXG14XFamilyCommandQueue: 0x32128f200>
      label =
      device = <AGXG14SDevice: 0x30888a600>
        name = Apple M2 Pro
    retainedReferences = 1
Error: command buffer exited with error status.
  The Metal Performance Shaders operations encoded on it may not have completed.
  Error: (null) Internal Error (0000000e:Internal Error)
  <AGXG14XFamilyCommandBuffer: 0x37ba50af0>
    label =
    device = <AGXG14SDevice: 0x30888a600>
      name = Apple M2 Pro
    commandQueue = <AGXG14XFamilyCommandQueue: 0x32128f200>
      label =
      device = <AGXG14SDevice: 0x30888a600>
        name = Apple M2 Pro
    retainedReferences = 1
Error: command buffer exited with error status.
  The Metal Performance Shaders operations encoded on it may not have completed.
  Error: (null) Internal Error (0000000e:Internal Error)
  <AGXG14XFamilyCommandBuffer: 0x37ba4c5d0>
    label =
    device = <AGXG14SDevice: 0x30888a600>
      name = Apple M2 Pro
    commandQueue = <AGXG14XFamilyCommandQueue: 0x32128f200>
      label =
      device = <AGXG14SDevice: 0x30888a600>
        name = Apple M2 Pro
    retainedReferences = 1
Traceback (most recent call last):
  File "/Users/liuchuang/Desktop/engram/PDF-Extract-Kit/pdf_extract.py", line 123, in <module>
    layout_res = layout_model(image, ignore_catids=[])
  File "/Users/liuchuang/Desktop/engram/PDF-Extract-Kit/modules/layoutlmv3/model_init.py", line 124, in __call__
    outputs = self.predictor(image)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/engine/defaults.py", line 319, in __call__
    predictions = self.model([inputs])[0]
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/liuchuang/Desktop/engram/PDF-Extract-Kit/modules/layoutlmv3/rcnn_vl.py", line 55, in forward
    return self.inference(batched_inputs)
  File "/Users/liuchuang/Desktop/engram/PDF-Extract-Kit/modules/layoutlmv3/rcnn_vl.py", line 122, in inference
    results, _ = self.roi_heads(images, features, proposals, None)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/roi_heads/cascade_rcnn.py", line 150, in forward
    pred_instances = self.forward_with_given_boxes(features, pred_instances)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 776, in forward_with_given_boxes
    instances = self._forward_mask(features, instances)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 843, in _forward_mask
    features = self.mask_pooler(features, boxes)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/poolers.py", line 243, in forward
    pooler_fmt_boxes = convert_boxes_to_pooler_format(box_lists)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/poolers.py", line 98, in convert_boxes_to_pooler_format
    return _convert_boxes_to_pooler_format(boxes, sizes)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/torch/jit/_trace.py", line 1254, in wrapper
    return fn(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test1/lib/python3.10/site-packages/detectron2/modeling/poolers.py", line 66, in _convert_boxes_to_pooler_format
    indices = torch.repeat_interleave(
RuntimeError: Expected repeatBuffer && cumsumBuffer && resultBuffer to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

PyTorch version 2.3.1, detectron2 version 0.6. It runs normally on the CPU but fails on MPS.