ggservice007 opened 8 months ago
@ggservice007 please show the example code for this
dependencies
unstructured==0.12.0
unstructured-inference==0.7.21
unstructured.pytesseract==0.3.12
pdf2image==1.17.0
pdfminer.six==20231228
pikepdf==8.13.0
apt-get install poppler-utils
example
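from unstructured.partition.pdf import partition_pdf

file_path = "example.pdf"  # placeholder: path to the input PDF
output_dir = "extracted_images"  # placeholder: where extracted images are written

# hi_res runs a layout-detection model and saves images found in the PDF to output_dir
elements = partition_pdf(
    filename=file_path,
    strategy="hi_res",
    extract_images_in_pdf=True,
    extract_image_block_output_dir=output_dir,
)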
unstructured source: https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/pdf.py#L136
Downloaded models are stored under /root/.cache/huggingface/hub by default; the location can be changed by setting the HF_HUB_CACHE environment variable.
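The default lands in /root/.cache/huggingface/hub because the process runs as root, so the home directory is /root. The sketch below approximates how recent huggingface_hub versions resolve the hub cache directory. The extra variables (HF_HOME, XDG_CACHE_HOME, the legacy HUGGINGFACE_HUB_CACHE) and their precedence come from my understanding of huggingface_hub, not from this thread, and `resolve_hf_hub_cache` is a hypothetical helper for illustration only.

```python
import os
from typing import Mapping


def resolve_hf_hub_cache(env: Mapping[str, str], home: str) -> str:
    """Approximate huggingface_hub's lookup of its hub cache directory (assumed precedence)."""
    # HF_HOME defaults to $XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface
    xdg_cache = env.get("XDG_CACHE_HOME", os.path.join(home, ".cache"))
    hf_home = env.get("HF_HOME", os.path.join(xdg_cache, "huggingface"))
    default_cache = os.path.join(hf_home, "hub")
    # The legacy HUGGINGFACE_HUB_CACHE overrides the default; HF_HUB_CACHE overrides both
    legacy = env.get("HUGGINGFACE_HUB_CACHE", default_cache)
    return env.get("HF_HUB_CACHE", legacy)


# Show where downloads would go for the current process environment
print(resolve_hf_hub_cache(os.environ, os.path.expanduser("~")))
```

Because huggingface_hub reads these variables when it is imported, set HF_HUB_CACHE before the Python process starts (for example `HF_HUB_CACHE=/data/hf-cache python app.py`, where the path is hypothetical) or at least before importing unstructured.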
@ggservice007 @bjwswang
With unstructured, large images raise an error:
PIL.Image.DecompressionBombError: Image size (8284731418 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
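The limit in this message is Pillow's decompression-bomb guard. As far as I know, Pillow warns when an image exceeds `MAX_IMAGE_PIXELS` (default 1024 * 1024 * 1024 // 4 // 3) and raises `DecompressionBombError` above twice that value, which is the 178956970 quoted here. The stdlib sketch below mirrors that check to show how far over the limit the reported image is; `decompression_bomb_status` is a hypothetical name, not a Pillow function.

```python
# Assumed Pillow default: roughly 89.5 million pixels
DEFAULT_MAX_IMAGE_PIXELS = 1024 * 1024 * 1024 // 4 // 3


def decompression_bomb_status(pixels, max_pixels=DEFAULT_MAX_IMAGE_PIXELS):
    """Mirror Pillow's size check: warn above max_pixels, error above twice that."""
    if max_pixels is None:  # None disables the check entirely
        return "ok"
    if pixels > 2 * max_pixels:
        return "error"
    if pixels > max_pixels:
        return "warning"
    return "ok"


reported = 8284731418  # pixel count from the error message above
print(2 * DEFAULT_MAX_IMAGE_PIXELS)  # 178956970, the limit quoted in the error
print(decompression_bomb_status(reported))  # error
print(round(reported / (2 * DEFAULT_MAX_IMAGE_PIXELS), 1))  # 46.3
```

In Pillow itself the threshold is the module attribute `PIL.Image.MAX_IMAGE_PIXELS`; assigning a larger value (or `None`) before the image is opened relaxes the check, but it also removes the denial-of-service protection, so it is only advisable for trusted input.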
Replying to the comment above about the default download path (/root/.cache/huggingface/hub, changeable via HF_HUB_CACHE):
Can the path be set directly?
what
Implement loading of the yolox_l0.05.onnx model from local disk, based on a reading of the unstructured source code.
why
Currently, unstructured tries to download unstructuredio/yolo_x_layout/yolox_l0.05.onnx from Hugging Face whenever it cannot find the file locally.
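One way to check whether the model is already on local disk is to look it up in the Hugging Face cache without touching the network. The sketch below assumes huggingface_hub's standard cache layout (`models--{org}--{name}/refs/{revision}` holding a commit hash, and the file under `snapshots/{commit}/{filename}`); huggingface_hub ships its own helper for this, `try_to_load_from_cache`, and `find_cached_file` here is a hypothetical stdlib mirror for illustration, not unstructured's code.

```python
import os


def find_cached_file(cache_dir, repo_id, filename, revision="main"):
    """Return the local path of a cached Hub file, or None if it is not cached."""
    # e.g. unstructuredio/yolo_x_layout -> models--unstructuredio--yolo_x_layout
    repo_folder = os.path.join(cache_dir, "models--" + repo_id.replace("/", "--"))
    ref_path = os.path.join(repo_folder, "refs", revision)
    if os.path.isfile(ref_path):
        # A branch or tag name: the ref file stores the commit hash it points to
        with open(ref_path) as f:
            commit = f.read().strip()
    else:
        # Otherwise treat the revision as a commit hash
        commit = revision
    candidate = os.path.join(repo_folder, "snapshots", commit, filename)
    # isfile follows the snapshot symlink into the blobs directory
    return candidate if os.path.isfile(candidate) else None
```

If the file is present, setting the HF_HUB_OFFLINE=1 environment variable should make huggingface_hub serve it from the cache instead of attempting a download, assuming unstructured-inference fetches the model through huggingface_hub (the thread implies this but does not confirm it).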