googleapis / python-documentai-toolbox

Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.
https://cloud.google.com/document-ai/docs/toolbox
Apache License 2.0

Apply Transforms Field to Bounding Boxes #295

Open zkalson opened 4 months ago

zkalson commented 4 months ago

I understand that Document AI performs preprocessing to correct for issues like skew in documents sent to the API, and that any transformations applied are provided in the response under the transforms field. It would be massively helpful to have a function I can call on the Document object to undo the preprocessing, so that the bounding boxes are relative to the submitted document rather than the preprocessed one.
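To make the request concrete, here is a rough sketch of the kind of helper I have in mind. It is not Toolbox code, and it rests on several assumptions: each entry in transforms carries rows, cols, an OpenCV type code, and raw data bytes; the data is row-major and little-endian; the matrix is a 2x3 affine (or a 3x3 whose last row is 0 0 1) mapping original-image pixels to preprocessed-image pixels; and the transforms were applied in list order. The function names (decode_matrix, invert_affine, apply_affine, undo_transforms) are hypothetical, and the points are pixel coordinates, so normalized vertices would first need to be scaled by the page width and height.

```python
import struct

# Assumption: OpenCV single-channel depth codes (CV_32F = 5, CV_64F = 6).
_DEPTH_FORMATS = {5: "f", 6: "d"}


def decode_matrix(rows, cols, cv_type, data):
    """Unpack raw bytes into a list of rows (assumes row-major, little-endian)."""
    fmt = _DEPTH_FORMATS[cv_type]
    values = struct.unpack(f"<{rows * cols}{fmt}", data)
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


def invert_affine(m):
    """Invert an affine matrix given as [[a, b, tx], [c, d, ty], ...]."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transform is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # The inverse translation is -(A^-1 @ t).
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)]]


def apply_affine(m, points):
    """Apply an affine matrix to a list of (x, y) pixel coordinates."""
    return [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in points]


def undo_transforms(matrices, points):
    """Map points from the preprocessed image back to the original image.

    Inverts each matrix and applies them in reverse order (assumes the
    transforms were applied to the original in list order).
    """
    for m in reversed(matrices):
        points = apply_affine(invert_affine(m), points)
    return points


# Usage with a made-up transform: a 90-degree rotation plus a translation.
raw = struct.pack("<6d", 0.0, -1.0, 10.0, 1.0, 0.0, 5.0)
matrix = decode_matrix(2, 3, 6, raw)
box_in_preprocessed = apply_affine(matrix, [(2.0, 3.0), (4.0, 3.0)])
box_in_original = undo_transforms([matrix], box_in_preprocessed)
```

If the matrices turn out to be full perspective transforms, or map in the opposite direction, the inversion step would need to change accordingly.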

I've spent a few weeks trying to implement this myself (and have a support request that's been in limbo for about a month and a half), and unfortunately haven't been able to make any progress. Admittedly, I have pretty limited experience with OpenCV, so I may be missing something.

Attached are images of a document I uploaded to GCP, the corresponding preprocessed image that GCP returns in the images field, and the output of the bounding boxes when I attempt to apply the transforms. If you look closely at the text layer, it doesn't match up with the original image.

[Attached images: original, gcp_preprocessed_image, output_text_layer]