kubeagi / core-library

Core library for kubeagi that provides APIs and an SDK in Python
Apache License 2.0

Implement loading of the yolox_l0.05.onnx model from local disk, based on reading the unstructured source code #41

Open ggservice007 opened 8 months ago

ggservice007 commented 8 months ago

what

Implement loading of the yolox_l0.05.onnx model from local disk, based on reading the unstructured source code.

why

Currently, unstructured tries to load unstructuredio/yolo_x_layout/yolox_l0.05.onnx by downloading it from Hugging Face when it cannot find the file locally.
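A minimal sketch of the requested behaviour, assuming the model file has already been copied to a local directory: prefer the local yolox_l0.05.onnx and fall back to a download only when it is missing. The helper name `resolve_model_path`, the `download` callback and the directory layout are hypothetical and not part of unstructured's API; in practice the callback would wrap whatever downloader is in use.

```python
import os
from typing import Callable

REPO_ID = "unstructuredio/yolo_x_layout"
FILENAME = "yolox_l0.05.onnx"


def resolve_model_path(
    local_dir: str,
    download: Callable[[str, str], str],
    repo_id: str = REPO_ID,
    filename: str = FILENAME,
) -> str:
    """Return a path to the model file, preferring a copy on local disk.

    Hypothetical helper: `download(repo_id, filename)` is only called when
    `local_dir/filename` does not exist, and must return the downloaded path.
    """
    local_path = os.path.join(local_dir, filename)
    if os.path.isfile(local_path):
        # Local copy found: no network access needed.
        return local_path
    # Fall back to the caller-supplied downloader.
    return download(repo_id, filename)
```

For example, `resolve_model_path("/models/yolo_x_layout", lambda r, f: hf_hub_download(repo_id=r, filename=f))` would skip the network entirely when the file is already present. Injecting the downloader keeps the local-first logic independent of any particular library.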

bjwswang commented 8 months ago

@ggservice007 please show example code for this.

wangxinbiao commented 8 months ago

dependencies

unstructured==0.12.0
unstructured-inference==0.7.21
unstructured.pytesseract==0.3.12
pdf2image==1.17.0
pdfminer.six==20231228
pikepdf==8.13.0

apt-get install poppler-utils

example

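from unstructured.partition.pdf import partition_pdf

file_path = "example.pdf"  # hypothetical input PDF
output_dir = "images"      # hypothetical directory for extracted images

# "hi_res" runs layout detection, which is where yolox_l0.05.onnx gets loaded
elements = partition_pdf(
    filename=file_path,
    strategy="hi_res",
    extract_images_in_pdf=True,  # save images found in the PDF as files
    extract_image_block_output_dir=output_dir,
)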

unstructured source: https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/pdf.py#L136

wangxinbiao commented 8 months ago

The downloaded model files are stored under /root/.cache/huggingface/hub by default; the location can be changed by setting the HF_HUB_CACHE environment variable. @ggservice007 @bjwswang
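To check where the model actually lands, the lookup can be sketched as below. This assumes huggingface_hub's cache conventions: `HF_HUB_CACHE`, then the legacy `HUGGINGFACE_HUB_CACHE`, then `$HF_HOME/hub`, where `HF_HOME` defaults to `$XDG_CACHE_HOME/huggingface` or `~/.cache/huggingface` (hence /root/.cache/huggingface/hub when running as root), plus its `models--<org>--<name>/snapshots/<revision>/` folder layout. The function names are hypothetical.

```python
import glob
import os
from typing import List, Mapping


def hub_cache_dir(env: Mapping[str, str] = os.environ) -> str:
    """Resolve the Hugging Face hub cache directory (assumed lookup order)."""
    if env.get("HF_HUB_CACHE"):
        return env["HF_HUB_CACHE"]
    if env.get("HUGGINGFACE_HUB_CACHE"):  # legacy variable name
        return env["HUGGINGFACE_HUB_CACHE"]
    xdg = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    hf_home = env.get("HF_HOME") or os.path.join(xdg, "huggingface")
    return os.path.join(os.path.expanduser(hf_home), "hub")


def cached_model_files(
    repo_id: str = "unstructuredio/yolo_x_layout",
    filename: str = "yolox_l0.05.onnx",
    cache_dir: str = "",
) -> List[str]:
    """List every cached snapshot copy of `filename` for `repo_id`."""
    cache_dir = cache_dir or hub_cache_dir()
    # Repo folders are named "models--<org>--<name>" in the hub cache.
    repo_folder = "models--" + repo_id.replace("/", "--")
    pattern = os.path.join(cache_dir, repo_folder, "snapshots", "*", filename)
    return sorted(glob.glob(pattern))
```

With `HF_HUB_CACHE=/data/hf` exported, `cached_model_files()` would look under `/data/hf/models--unstructuredio--yolo_x_layout/snapshots/`.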

wangxinbiao commented 8 months ago

When using unstructured, very large images cause an error:

PIL.Image.DecompressionBombError: Image size (8284731418 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
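The limit in this message comes from Pillow: it warns when an image has more than `PIL.Image.MAX_IMAGE_PIXELS` pixels (89,478,485 by default) and raises `DecompressionBombError` above twice that value, which is the 178,956,970 shown. The standard-library sketch below mirrors that check to show how far over the limit this image is; the function names are hypothetical. For trusted input, a common workaround is to raise `PIL.Image.MAX_IMAGE_PIXELS` (or set it to `None` to disable the check) before calling `partition_pdf`, giving up the memory protection the check provides.

```python
from typing import Optional

# Pillow's default PIL.Image.MAX_IMAGE_PIXELS value.
DEFAULT_MAX_IMAGE_PIXELS = 1024 * 1024 * 1024 // 4 // 3


def bomb_check(pixels: int, max_image_pixels: Optional[int] = DEFAULT_MAX_IMAGE_PIXELS) -> str:
    """Mirror Pillow's decompression-bomb thresholds: 'ok', 'warning' or 'error'."""
    if max_image_pixels is None:  # check disabled
        return "ok"
    if pixels > 2 * max_image_pixels:
        return "error"    # DecompressionBombError
    if pixels > max_image_pixels:
        return "warning"  # DecompressionBombWarning
    return "ok"


def min_limit_to_avoid_error(pixels: int) -> int:
    """Smallest MAX_IMAGE_PIXELS for which `pixels` no longer raises."""
    # Integer ceiling of pixels / 2, so that pixels <= 2 * limit.
    return (pixels + 1) // 2
```

For the 8,284,731,418-pixel image above, `min_limit_to_avoid_error` gives 4,142,365,709, roughly 46 times Pillow's default.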
bjwswang commented 8 months ago

> The downloaded model files are stored under /root/.cache/huggingface/hub by default; the location can be changed by setting the HF_HUB_CACHE environment variable. @ggservice007 @bjwswang

Can we set the path directly?
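One way to pin the location from code rather than the shell, sketched under assumptions: huggingface_hub appears to read `HF_HUB_CACHE` and `HF_HUB_OFFLINE` when it is first imported, so both must be set before anything that imports it (such as unstructured-inference) is loaded. With `HF_HUB_OFFLINE=1`, a model already in that cache is used and a missing one raises an error instead of triggering a download. The helper name `pin_model_cache` and any directory passed to it are hypothetical.

```python
import os
import sys


def pin_model_cache(cache_dir: str, offline: bool = True) -> None:
    """Point the Hugging Face hub cache at `cache_dir` (hypothetical helper).

    Call this before unstructured / unstructured-inference / huggingface_hub
    are imported; the variables are assumed to be read at import time.
    """
    if "huggingface_hub" in sys.modules:
        raise RuntimeError("huggingface_hub already imported; set the cache earlier")
    os.environ["HF_HUB_CACHE"] = os.path.abspath(cache_dir)
    if offline:
        # Use only files already in the cache; never download.
        os.environ["HF_HUB_OFFLINE"] = "1"
```

Typical use would be calling `pin_model_cache("/models/hf")` at the very top of the entry point, followed by `from unstructured.partition.pdf import partition_pdf`. Failing loudly when huggingface_hub is already loaded avoids silently running with the old path.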