opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction
https://pdf-extract-kit.readthedocs.io/zh-cn/latest/index.html
GNU Affero General Public License v3.0
5.83k stars, 384 forks

Environment conflict #153

Closed alexander2618 closed 1 month ago

alexander2618 commented 1 month ago

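  1. pip cannot find doclayout-yolo==0.0.2, and the source on GitHub is only at 0.0.1.
  2. Following the tutorial, I ran the PDF content extraction with the command python project/pdf2markdown/scripts/run_project.py --config project/pdf2markdown/configs/pdf2markdown.yaml, but there is a conflict and I get the error below. The missing class should come from doclayout-yolo.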

AttributeError: Can't get attribute 'YOLOv10DetectionModel' on <module 'ultralytics.nn.tasks' from '/opt/mamba/envs/pdf/lib/python3.10/site-packages/ultralytics/nn/tasks.py'>

JulioZhao97 commented 1 month ago

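Hello,

  1. At the moment doclayout-yolo can only be installed from PyPI; the GitHub source will be updated to 0.0.2 soon. Could you try pip3 install doclayout_yolo==0.0.2 --extra-index-url=https://pypi.org/simple and check whether doclayout-yolo 0.0.2 installs?
  2. Could you share more of the traceback output? This looks like it is caused by an ultralytics version that is too old (no YOLOv10 support yet).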
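
The "Can't get attribute" message in the report is what Python's pickle machinery produces when a saved object refers to a class by module and name, and the installed module does not define that name. As a rough diagnostic sketch (standard library only; the helper names `installed_version` and `module_has_attr` are made up for illustration and are not part of ultralytics or doclayout-yolo), one could check which versions are installed and whether `ultralytics.nn.tasks` exposes `YOLOv10DetectionModel`:

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_has_attr(module_name, attr):
    """Return True if the module imports and defines `attr`, else False."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Package and class names are taken from the error message in this thread.
    for dist in ("ultralytics", "doclayout_yolo"):
        print(dist, installed_version(dist))
    print(module_has_attr("ultralytics.nn.tasks", "YOLOv10DetectionModel"))
```

If the last line prints False, the installed ultralytics module does not provide the class that the checkpoint expects, which would be consistent with the version explanation above.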
alexander2618 commented 1 month ago

Thanks, issue 1 is resolved. For issue 2, the run dropped into the PDB debugger at the point shown below. The traceback is as follows:

(pdf) [16:44:38] root:PDF-Extract-Kit git:(main*) # python project/pdf2markdown/scripts/run_project.py --config project/pdf2markdown/configs/pdf2markdown.yaml
INFO:albumentations.check_version:A new version of Albumentations is available: 1.4.18 (you have 1.4.11). Upgrade using: pip install --upgrade albumentations
INFO:datasets:PyTorch version 2.3.1 available.
/opt/mamba/envs/pdf/lib/python3.10/site-packages/torchtext/data/__init__.py:4: UserWarning:
/!\ IMPORTANT WARNING ABOUT TORCHTEXT STATUS /!\
Torchtext is deprecated and the last released version will be 0.18 (this one). You can silence this warning by calling the following at the beginnign of your scripts: import torchtext; torchtext.disable_torchtext_deprecation_warning()
  warnings.warn(torchtext._TORCHTEXT_DEPRECATION_MSG)
> /usr/data/code/PDF-Extract-Kit/pdf_extract_kit/tasks/layout_detection/models/yolo.py(39)__init__()
-> from ultralytics import YOLO
(Pdb)

JulioZhao97 commented 1 month ago

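Hello, there are some issues on our side here that have not been resolved yet. We will push a fix shortly; after that, please pull again and retry. Please bear with us.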

alexander2618 commented 1 month ago

OK, sorry for the trouble, and thank you.

JulioZhao97 commented 1 month ago

Hello, the fix has been pushed. Could you please:

  1. Pull the main branch again
  2. Install doclayout-yolo==0.0.2 via pip3 install doclayout_yolo==0.0.2 --extra-index-url=https://pypi.org/simple and run it to see whether it goes through?
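
When several Python environments coexist (the report above uses a mamba environment), pip3 on the PATH may not belong to the interpreter that runs the project. One possible workaround, sketched here under that assumption, is to invoke pip through the current interpreter. The package name, version pin, and index URL come from this thread; the helper `build_pip_command` is hypothetical:

```python
import subprocess
import sys


def build_pip_command(package, version, extra_index_url):
    """Build a pip install command bound to the running interpreter."""
    return [
        sys.executable, "-m", "pip", "install",
        f"{package}=={version}",
        f"--extra-index-url={extra_index_url}",
    ]


if __name__ == "__main__":
    cmd = build_pip_command("doclayout_yolo", "0.0.2", "https://pypi.org/simple")
    # check=True raises CalledProcessError if pip exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

Running the script with the environment's own python installs the package into that environment, regardless of which pip3 is first on the PATH.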
alexander2618 commented 1 month ago

  1. Pulled. There is one more small problem: at https://github.com/opendatalab/PDF-Extract-Kit/blob/main/project/pdf2markdown/scripts/pdf2markdown.py#332, pdf_path should be fpath, otherwise an error is raised. Could you take another look?
  2. Resolved. Thanks.
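
The code at that line is not quoted in this thread, so the following is only a hypothetical illustration of this kind of bug, not the project's actual code. It assumes a loop whose variable is `fpath` and a body that refers to an undefined `pdf_path`, which in plain Python raises NameError; the function names are invented for the example:

```python
from pathlib import Path


def collect_pdf_names_buggy(paths):
    """Hypothetical buggy variant: the body uses a name that was never bound."""
    names = []
    for fpath in paths:
        names.append(Path(pdf_path).stem)  # NameError: pdf_path is undefined
    return names


def collect_pdf_names(paths):
    """Fixed variant: the body uses the loop variable `fpath`."""
    names = []
    for fpath in paths:
        names.append(Path(fpath).stem)
    return names


if __name__ == "__main__":
    print(collect_pdf_names(["docs/a.pdf", "b.pdf"]))
```

If the real error is of a different type, the fix described in the comment (use fpath) would still apply; only the failure mode in this sketch is assumed.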

JulioZhao97 commented 1 month ago

Got it, thanks for the feedback! Feel free to report any further problems.

imzeroan commented 1 week ago

It looks like the pdf_path bug still hasn't been fixed.