pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Inconsistency in Image-to-PDF Coordinate Alignment #3345

Closed satvik-27199 closed 5 months ago

satvik-27199 commented 6 months ago

Description of the bug

I have observed some inconsistencies in aligning image coordinates with the PDF coordinate system. When using an image's bounding box as input, the objective is to convert it into the PDF coordinate system. This conversion is crucial for accurately extracting word-level bounding boxes using PyMuPDF.

The inconsistency shows up in 'page_48.pdf'. For comparison, '5.pdf' is provided as an example where the output is consistent.

The corresponding image files, 'page_48.jpeg' and 'page_5.jpeg', are also attached.

Attachments: image_bbox_48, pdf_visualization (2).pdf, 5.pdf, page_48.pdf, Image and PDF Folder

How to reproduce the bug

    import cv2
    import matplotlib.pyplot as plt

    # Bounding box in image pixel coordinates: [x1, y1, x2, y2]
    # bbox = [127.1285171508789, 236.234619140625, 1547.159912109375, 673.4690551757812]  # For 48.pdf
    bbox = [209.13824462890625, 447.0567321777344, 1482.3857421875, 953.8846435546875]  # For 5.pdf

    image_path = '/content/page_5.jpeg'
    pdf_path = '/content/5.pdf'

    image = cv2.imread(image_path)

    if image is None:
        print(f"Failed to load image at {image_path}")
    else:
        # Draw the bounding box on the image in red (OpenCV uses BGR order)
        x1, y1, x2, y2 = map(int, bbox)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 0, 255), 4)
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        plt.imshow(image_rgb)
        plt.axis('off')
        plt.show()

    def scale_coordinates(matrix, original_width, original_height, target_width, target_height):
        """Scale [x1, y1, x2, y2] boxes from the original size to the target size."""
        scaled_matrix = []
        for coordinates in matrix:
            x1, y1, x2, y2 = coordinates
            # Scale coordinates
            new_x1 = x1 / original_width * target_width
            new_y1 = y1 / original_height * target_height
            new_x2 = x2 / original_width * target_width
            new_y2 = y2 / original_height * target_height
            # Append scaled coordinates to the new matrix
            scaled_matrix.append([new_x1, new_y1, new_x2, new_y2])
        return scaled_matrix

    import fitz  # PyMuPDF; must be imported before the first fitz.open() call

    doc = fitz.open(pdf_path)
    page = doc[0]
    pdf_width, pdf_height = page.rect.width, page.rect.height

    im = cv2.imread(image_path)
    img_height, img_width, channels = im.shape

    # Source size is the image in pixels, target size is the PDF page in points
    original_width, original_height = img_width, img_height
    target_width, target_height = pdf_width, pdf_height
    print(original_width, original_height, target_width, target_height)

    bbox_list = [bbox]
    bbox_adapted = scale_coordinates(bbox_list, original_width, original_height, target_width, target_height)
    print(bbox_adapted)

    # Draw the scaled bounding box on the PDF page
    rect = fitz.Rect(bbox_adapted[0][0], bbox_adapted[0][1], bbox_adapted[0][2], bbox_adapted[0][3])
    color = (1, 0, 0)  # Red
    page.draw_rect(rect, color=color, width=1.5, overlay=True)

    output_pdf_path = "/content/pdf_visualization.pdf"
    doc.save(output_pdf_path)
    doc.close()
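
The linear scaling in this script only maps image pixels onto the page correctly if the JPEG is a rendering of the full, unrotated page, so that the horizontal and vertical scale factors agree. A minimal sketch of that sanity check follows. The helper name `scale_factors_agree`, the 1% tolerance, and the sample sizes are assumptions for illustration; none of them come from PyMuPDF or from the attached files.

```python
def scale_factors_agree(img_width, img_height, pdf_width, pdf_height, rel_tol=0.01):
    """Return (agree, sx, sy), where sx and sy are the pixel-to-point scale factors.

    agree is True when sx and sy differ by at most rel_tol (relative),
    i.e. the image and the page have practically the same aspect ratio.
    """
    sx = pdf_width / img_width
    sy = pdf_height / img_height
    agree = abs(sx - sy) <= rel_tol * max(sx, sy)
    return agree, sx, sy


# Hypothetical sizes: a 1700x2200 pixel image of a 612x792 point page
ok, sx, sy = scale_factors_agree(1700, 2200, 612, 792)
print(ok, round(sx, 3), round(sy, 3))
```

If the check fails for one file but not the other, the mismatch may come from a page rotation or from an image that does not cover the whole page; that would need to be confirmed against the actual documents.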

PyMuPDF version

1.24.1

Operating system

MacOS

Python version

3.10

JorjMcKie commented 6 months ago

Sorry, I am not able to follow your code. PyMuPDF image insertion is exclusively responsible for inserting a given image in a given target rectangle of a PDF in such a way that

  1. the centers of the image rectangle and the target rectangle coincide, and
  2. the image is scaled inside the target rectangle such that at least one of its width or height coincides with the target rectangle's width or height, respectively.

Among other things this means: If the aspect ratios of target rect and image do not exactly coincide, there will remain unused stripes in the target rectangle. If you later look at the bbox on the page covered by the image, it will in general not be the same as the original insert rectangle.
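
The two criteria can be sketched in plain Python as follows. This is only an illustration of the rule as described here, not PyMuPDF's actual implementation; the function name `fitted_image_rect` and the sample numbers are assumptions.

```python
def fitted_image_rect(target, img_width, img_height):
    """Return the (x0, y0, x1, y1) area an image covers inside a target rectangle.

    The image keeps its aspect ratio, is scaled so that its width or its
    height matches the target's, and is centered in the target.
    """
    x0, y0, x1, y1 = target
    target_width, target_height = x1 - x0, y1 - y0
    # The smaller factor guarantees the image fits in both directions
    scale = min(target_width / img_width, target_height / img_height)
    width, height = img_width * scale, img_height * scale
    # Both rectangles share the same center
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


# Hypothetical case: a 200x100 image placed in a 100x100 target rectangle
print(fitted_image_rect((0, 0, 100, 100), 200, 100))
# -> (0.0, 25.0, 100.0, 75.0)
```

In this example the image fills the full width, and unused stripes of height 25 remain above and below it, so the covered bbox differs from the insert rectangle.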

To demonstrate a bug, please provide me with a simple (!) example where these criteria are violated.

JorjMcKie commented 5 months ago

Closing this for lack of response. Please feel free to reopen with a reproducer case.