Closed: longmangpang closed this issue 1 month ago.
I updated `PDF-Extract-Kit-main/pdf_extract_kit/tasks/layout_detection/models/layoutlmv3.py` as follows:

```python
import os

import cv2
import numpy as np
from PIL import Image

from pdf_extract_kit.registry.registry import MODEL_REGISTRY
from pdf_extract_kit.utils.visualization import visualize_bbox

from .layoutlmv3_util.model_init import Layoutlmv3_Predictor


@MODEL_REGISTRY.register("layout_detection_layoutlmv3")
class LayoutDetectionLayoutlmv3:
    def __init__(self, config):
        """
        Initialize the LayoutDetectionLayoutlmv3 class.

        Args:
            config (dict): Configuration dictionary containing model parameters.
        """
        # Mapping from class IDs to class names
        self.id_to_names = {
            0: 'title',
            1: 'plain text',
            2: 'abandon',
            3: 'figure',
            4: 'figure_caption',
            5: 'table',
            6: 'table_caption',
            7: 'table_footnote',
            8: 'isolate_formula',
            9: 'formula_caption'
        }
        self.model = Layoutlmv3_Predictor(config.get('model_path', None))
        self.visualize = config.get('visualize', True)

    def predict(self, images, result_path, image_ids=None):
        """
        Predict layouts in images.

        Args:
            images (list): List of images to be predicted. This list can contain file paths or PIL.Image.Image objects.
            result_path (str): Path to save the prediction results.
            image_ids (list, optional): List of image IDs corresponding to the images.

        Returns:
            list: List of prediction results.
        """
        os.makedirs(result_path, exist_ok=True)

        results = []
        # If image_ids is empty, fall back to the indices of `images` as IDs
        image_ids = image_ids if image_ids else list(range(len(images)))
        for im_file, image_id in zip(images, image_ids):
            if isinstance(im_file, Image.Image):
                # im_file is already a PIL.Image.Image object: use it directly
                im = im_file.convert("RGB")
            else:
                # im_file is a file path: open the image
                im = Image.open(im_file).convert("RGB")
            # Convert the PIL image to a NumPy array
            im_array = np.array(im)
            # Run model prediction
            layout_res = self.model(im_array, ignore_catids=[])
            # Keep only detections that have a polygon, so boxes, scores and classes stay aligned
            dets = [det for det in layout_res["layout_dets"] if "poly" in det]
            if dets:
                poly = np.array([det["poly"] for det in dets])
                # Make sure poly is a 2-D array
                if poly.ndim == 1:
                    poly = poly.reshape(1, -1)
                # Columns 0, 1 and 4, 5 are the top-left and bottom-right corners
                boxes = poly[:, [0, 1, 4, 5]]
                scores = np.array([det["score"] for det in dets])
                classes = np.array([det["category_id"] for det in dets])
                # Make sure image_id is a string
                image_id = str(image_id)
                if self.visualize:
                    # Visualize the results and save them
                    vis_result = visualize_bbox(im_file, boxes, classes, scores, self.id_to_names)
                    result_name = f"{image_id}_layout.png"
                    cv2.imwrite(os.path.join(result_path, result_name), vis_result)
                # Append the results
                results.append({
                    "im_path": image_id,  # use image_id as the path identifier
                    "boxes": boxes,
                    "scores": scores,
                    "classes": classes,
                })
            else:
                print(f"No polygons found for image {image_id}")
        return results
```
Hello! It seems there is a problem with the input file. Could you please provide more details about the input?
The input file is a PDF. All the other models work fine with it; only the LayoutLMv3 model fails. That is why I modified the code as above, but I am not sure whether there is a better solution.
Thank you. Could you please share the script you are running? I will look into what is wrong.
I followed the steps on this page: https://pdf-extract-kit.readthedocs.io/zh-cn/latest/algorithm/layout_detection.html and ran `python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml`.
@longmangpang Hello! This bug has been fixed in https://github.com/opendatalab/PDF-Extract-Kit/pull/160. Could you please try again?
Thanks. Previously I didn't install Detectron2 from the prebuilt .whl file; I compiled it myself. Could that be the cause of the error?
No, that is not the cause. The bug came from not handling `PIL.Image.Image` inputs: for PDF input, the page images reach `predict` as `PIL.Image.Image` objects, which were then passed to `Image.open`:
File "/data/PDF-Extract-Kit-main/scripts/layout_detection.py", line 41, in <module>
main(args.config)
File "/data/PDF-Extract-Kit-main/scripts/layout_detection.py", line 33, in main
detection_results = model_layout_detection.predict_pdfs(input_data, result_path)
File "/data/PDF-Extract-Kit-main/scripts/../pdf_extract_kit/tasks/layout_detection/task.py", line 39, in predict_pdfs
return self.model.predict(list(pdf_images.values()), result_path, list(pdf_images.keys()))
File "/data/PDF-Extract-Kit-main/scripts/../pdf_extract_kit/tasks/layout_detection/models/layoutlmv3.py", line 53, in predict
im = Image.open(im_file).convert("RGB")
File "/data/miniconda/envs/pdf/lib/python3.10/site-packages/PIL/Image.py", line 3480, in open
prefix = fp.read(16)
AttributeError: 'Image' object has no attribute 'read'
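The traceback shows the root cause: `Image.open` expects a file path or a binary file object, so an image that is already decoded (as the PDF pages from `predict_pdfs` are) must be handled before reaching it. A minimal sketch of that normalization, using a hypothetical helper name `load_rgb` that is not part of PDF-Extract-Kit:

```python
import os
import tempfile

from PIL import Image


def load_rgb(im_file):
    """Hypothetical helper: return an RGB PIL image from a PIL image or a file path.

    Image.open only accepts a path or a binary file object, so an
    already-loaded PIL.Image.Image is converted directly instead.
    """
    if isinstance(im_file, Image.Image):
        return im_file.convert("RGB")
    with Image.open(im_file) as im:
        # convert() returns a new, fully loaded image, so closing the file is safe
        return im.convert("RGB")


# A page image already in memory, like the PDF pages passed to predict
page = Image.new("L", (4, 3), color=128)
rgb = load_rgb(page)
print(rgb.mode, rgb.size)  # RGB (4, 3)

# The same page read back from a file path
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.png")
    page.save(path)
    print(load_rgb(path).mode)  # RGB
```

Checking `isinstance(im_file, Image.Image)` first keeps the file-path behavior unchanged while accepting in-memory pages.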