googleapis / python-documentai-toolbox

Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.
https://cloud.google.com/document-ai/docs/toolbox
Apache License 2.0

fix: Refactor page.py to improve performance and organization #316

Closed: holtskinner closed this 1 month ago

holtskinner commented 1 month ago

Fixes #312 🦕

Builds on the hOCR processing improvements made in #313.

         22788704 function calls (19761621 primitive calls) in 6.297 seconds
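
The line above has the shape of the summary header printed by Python's standard `cProfile`/`pstats` modules. Below is a minimal sketch of how such a profile can be captured for a single call. The recursive `fib` function and the `profile_call` helper are hypothetical stand-ins, not code from this PR. In that header, "primitive calls" counts only calls that were not induced by recursion, so the gap between the two numbers comes from recursive calls.

```python
import cProfile
import io
import pstats


def fib(n):
    # Hypothetical workload: recursive calls are counted as non-primitive.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, report_text)."""
    profiler = cProfile.Profile()
    # runcall enables the profiler, calls func, then disables it again.
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Print the header plus the 5 most expensive entries by cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(fib, 15)
    print(value)
    print(report)
```

Because `fib` recurses, the report's header includes the parenthesized primitive-call count; for a non-recursive workload the two counts are equal and `pstats` prints only the total.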
dizcology commented 1 month ago

Please also say a few words (preferably as code comments) about what caused the slow performance in the original implementation, and which specific changes fixed it.