opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction
https://pdf-extract-kit.readthedocs.io/zh-cn/latest/index.html
GNU Affero General Public License v3.0

layoutlmv3 raises AttributeError: 'Image' object has no attribute 'read' #157

Closed: longmangpang closed this issue 1 month ago

longmangpang commented 1 month ago

```
  File "/data/PDF-Extract-Kit-main/scripts/layout_detection.py", line 41, in <module>
    main(args.config)
  File "/data/PDF-Extract-Kit-main/scripts/layout_detection.py", line 33, in main
    detection_results = model_layout_detection.predict_pdfs(input_data, result_path)
  File "/data/PDF-Extract-Kit-main/scripts/../pdf_extract_kit/tasks/layout_detection/task.py", line 39, in predict_pdfs
    return self.model.predict(list(pdf_images.values()), result_path, list(pdf_images.keys()))
  File "/data/PDF-Extract-Kit-main/scripts/../pdf_extract_kit/tasks/layout_detection/models/layoutlmv3.py", line 53, in predict
    im = Image.open(im_file).convert("RGB")
  File "/data/miniconda/envs/pdf/lib/python3.10/site-packages/PIL/Image.py", line 3480, in open
    prefix = fp.read(16)
AttributeError: 'Image' object has no attribute 'read'
```

longmangpang commented 1 month ago

I updated `PDF-Extract-Kit-main/pdf_extract_kit/tasks/layout_detection/models/layoutlmv3.py` as follows:

```python
import os

import cv2
import numpy as np
from PIL import Image

from pdf_extract_kit.registry.registry import MODEL_REGISTRY
from pdf_extract_kit.utils.visualization import visualize_bbox

from .layoutlmv3_util.model_init import Layoutlmv3_Predictor


@MODEL_REGISTRY.register("layout_detection_layoutlmv3")
class LayoutDetectionLayoutlmv3:
    def __init__(self, config):
        """
        Initialize the LayoutDetectionLayoutlmv3 class.

        Args:
            config (dict): Configuration dictionary containing model parameters.
        """
        # Mapping from class IDs to class names
        self.id_to_names = {
            0: 'title',
            1: 'plain text',
            2: 'abandon',
            3: 'figure',
            4: 'figure_caption',
            5: 'table',
            6: 'table_caption',
            7: 'table_footnote',
            8: 'isolate_formula',
            9: 'formula_caption'
        }
        self.model = Layoutlmv3_Predictor(config.get('model_path', None))
        self.visualize = config.get('visualize', True)

    def predict(self, images, result_path, image_ids=None):
        """
        Predict layouts in images.

        Args:
            images (list): Images to predict; each entry can be a file path or a PIL.Image.Image object.
            result_path (str): Directory in which to save the prediction results.
            image_ids (list, optional): IDs corresponding to the images.

        Returns:
            list: List of prediction results.
        """
        os.makedirs(result_path, exist_ok=True)

        results = []
        # If image_ids is empty, fall back to the indices of images as IDs
        image_ids = image_ids if image_ids else list(range(len(images)))

        for im_file, image_id in zip(images, image_ids):
            if isinstance(im_file, Image.Image):
                # im_file is already a PIL.Image.Image object (e.g. a rendered PDF page): use it directly
                im = im_file.convert("RGB")
            else:
                # im_file is a file path: open the image
                im = Image.open(im_file).convert("RGB")

            # Convert the PIL image to a NumPy array
            im_array = np.array(im)

            # Run model inference
            layout_res = self.model(im_array, ignore_catids=[])
            # Keep only detections with a polygon so boxes, scores and classes stay aligned
            dets = [det for det in layout_res["layout_dets"] if "poly" in det]

            if dets:
                poly = np.array([det["poly"] for det in dets])
                # Make sure poly is a 2-D array
                if poly.ndim == 1:
                    poly = poly.reshape(1, -1)

                boxes = poly[:, [0, 1, 4, 5]]
                scores = np.array([det["score"] for det in dets])
                classes = np.array([det["category_id"] for det in dets])

                if self.visualize:
                    # Visualize the results
                    vis_result = visualize_bbox(im_file, boxes, classes, scores, self.id_to_names)
                    # Make sure image_id is a string
                    if not isinstance(image_id, str):
                        image_id = str(image_id)
                    result_name = f"{image_id}_layout.png"
                    # Save the visualization
                    cv2.imwrite(os.path.join(result_path, result_name), vis_result)

                # Append the result
                results.append({
                    "im_path": image_id,  # use image_id as the path identifier
                    "boxes": boxes,
                    "scores": scores,
                    "classes": classes,
                })
            else:
                print(f"No polygons found for image {image_id}")
        return results
```
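
Selecting columns 0, 1, 4 and 5 of each 8-value `poly` turns the polygon into an `[x0, y0, x1, y1]` box, and the boxes only line up with `scores` and `classes` if all three are read from the same list of detections. A minimal sketch of that step as a standalone function, assuming each detection is a dict with `poly`, `score` and `category_id` keys and that `poly` lists the corners clockwise from the top-left (`[x0, y0, x1, y0, x1, y1, x0, y1]`); `extract_boxes` is a hypothetical name, not part of PDF-Extract-Kit:

```python
import numpy as np


def extract_boxes(layout_dets):
    """Turn LayoutLMv3-style detections into aligned box/score/class arrays.

    Hypothetical helper. Assumes each detection is a dict with "poly",
    "score" and "category_id", where "poly" is
    [x0, y0, x1, y0, x1, y1, x0, y1] (clockwise from the top-left corner).
    """
    # Filter once so all three arrays describe the same detections.
    dets = [det for det in layout_dets if "poly" in det]
    if not dets:
        return np.empty((0, 4)), np.empty((0,)), np.empty((0,), dtype=int)

    # Shape (N, 8) even when N == 1, because each poly is itself a list.
    polys = np.asarray([det["poly"] for det in dets], dtype=float)
    boxes = polys[:, [0, 1, 4, 5]]  # top-left (x0, y0) and bottom-right (x1, y1)
    scores = np.asarray([det["score"] for det in dets], dtype=float)
    classes = np.asarray([det["category_id"] for det in dets], dtype=int)
    return boxes, scores, classes


if __name__ == "__main__":
    dets = [
        {"poly": [10, 20, 110, 20, 110, 80, 10, 80], "score": 0.9, "category_id": 1},
        {"score": 0.5, "category_id": 3},  # no "poly": dropped with its score and class
    ]
    boxes, scores, classes = extract_boxes(dets)
    print(boxes.tolist(), scores.tolist(), classes.tolist())
    # [[10.0, 20.0, 110.0, 80.0]] [0.9] [1]
```

Filtering the detection list once, instead of filtering polygons and scores/classes separately, keeps the three arrays the same length even if some detection arrives without a `poly`.
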
JulioZhao97 commented 1 month ago

Hello! There seems to be a problem with the input file. Could you please provide more details about the input?

longmangpang commented 1 month ago

The input is a PDF file. The other layout models process it without problems; only the layoutlmv3 model fails. That is why I modified the code, but I am not sure whether there is a better solution.

JulioZhao97 commented 1 month ago

Thank you. Could you please share the script you ran? I will look into what is wrong.

longmangpang commented 1 month ago

I followed the steps on this page: https://pdf-extract-kit.readthedocs.io/zh-cn/latest/algorithm/layout_detection.html and ran `python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml`

JulioZhao97 commented 1 month ago

@longmangpang Hello! This bug has been fixed by https://github.com/opendatalab/PDF-Extract-Kit/pull/160. Could you please try again?

longmangpang commented 1 month ago

Thanks! Previously I did not install Detectron2 from the prebuilt whl file; I compiled it myself instead. Could that be the cause of the error?

JulioZhao97 commented 1 month ago

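No. The bug is that the code did not account for inputs that are already `PIL.Image.Image` objects, so a `PIL.Image.Image` object was passed to `Image.open`, which expects a file path or a file object. That is why the `fp.read(16)` call inside Pillow fails with `AttributeError`.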
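
The failure can be reproduced without the model: `Image.open` accepts a path or a binary file object, so handing it an already-decoded image fails at the `fp.read(16)` call shown in the traceback. Below is a minimal sketch of an input-normalizing helper that applies the same `isinstance` check as the patch in this thread. `to_rgb_image` is a hypothetical name, not a PDF-Extract-Kit function, and the blank 64x32 image only stands in for a rendered PDF page.

```python
import io
import os

from PIL import Image


def to_rgb_image(source):
    """Return an RGB PIL image from a PIL image, a path, or a binary file object.

    Hypothetical helper (not part of PDF-Extract-Kit): decoded images are
    converted directly, and only paths/file objects are passed to Image.open.
    """
    if isinstance(source, Image.Image):
        # Already decoded (e.g. a rendered PDF page): reopening it would fail.
        return source.convert("RGB")
    if not isinstance(source, (str, bytes, os.PathLike)) and not hasattr(source, "read"):
        raise TypeError(f"unsupported image source: {type(source).__name__}")
    # Path or file object: let Pillow decode it. convert() loads the pixels
    # into a new image before the with-block closes the source.
    with Image.open(source) as im:
        return im.convert("RGB")


if __name__ == "__main__":
    page = Image.new("RGB", (64, 32), "white")  # stand-in for a rendered PDF page

    try:
        Image.open(page)  # the pre-fix code path: raises AttributeError
    except AttributeError as exc:
        print("Image.open(page) fails:", exc)

    print(to_rgb_image(page).size)  # (64, 32)

    buf = io.BytesIO()
    page.save(buf, format="PNG")
    buf.seek(0)
    print(to_rgb_image(buf).mode)  # RGB
```

Checking for `Image.Image` first means rendered pages never reach `Image.open`, while paths and file objects still go through Pillow's normal decoding.
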