High latency exporting to hocr

Hello,

This issue follows up on Google Support Case 51622001 ("High latency exporting to hocr"), where I was directed to report it here.

Details

The latency occurs in the following call:

from google.cloud.documentai_toolbox import document as documentai_document_wrapper

documentai_document_wrapper.Document.from_documentai_document(
        documentai_document=result.document
    ).export_hocr_str(title="title")

When the page contains tables, the export takes 30 to 50 seconds, depending on the complexity of the page (pages with a lot of tabular data are the slowest).

I am looking for any kind of optimization that would reduce this time.

Environment details

OS type and version: GCP cloudshell
google-cloud-documentai-toolbox version: 0.13.3a0

Steps to reproduce

Create a venv and install the provided requirements.txt
Run python3 main-hocr.py test.pdf

Code example

# Excerpt from main-hocr.py; client, resource_name, raw_document and
# process_options are defined elsewhere in the script (see sources.zip).
request = documentai.ProcessRequest(
    name=resource_name,
    raw_document=raw_document,
    process_options=process_options,
)

# Time the Document AI API call.
start = time.time()
result = client.process_document(request=request)
print(f"process_document {time.time() - start}")

# Time wrapping the API response in the toolbox Document.
start = time.time()
wrapped_document = documentai_document_wrapper.Document.from_documentai_document(
    documentai_document=result.document
)
print(f"wrapped_document {time.time() - start}")

# Time the hOCR export; this is the slow step.
start = time.time()
hocr_result = wrapped_document.export_hocr_str(title="hocr")
print(f"export_hocr_str {time.time() - start}")
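In case it helps narrow down where the time goes, the wall-clock timing above could be complemented with a profile of the export call. The following is only a minimal sketch using the standard library's cProfile and pstats; `profile_call` is a hypothetical helper written for this report, not part of the toolbox, and the stand-in workload only exists so the snippet runs without Document AI credentials.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time. Hypothetical helper, not a toolbox API.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Stand-in workload (assumption: replaces the real export call here).
    def slow_join(n):
        return "".join(str(i) for i in range(n))

    _, report = profile_call(slow_join, 100_000)
    print(report)

    # With the objects from the example above, the call would look like:
    # hocr_result, report = profile_call(
    #     wrapped_document.export_hocr_str, title="hocr"
    # )
```

The report sorted by cumulative time should show which internal functions dominate the 30 to 50 seconds, which may make it easier to tell whether the table handling is the bottleneck.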

Stack trace

N/A. The execution completes correctly, but the export takes about 35 seconds.

Attached sources to reproduce the test: sources.zip

main-hocr.py, the full code of the example
requirements.txt
test.pdf, the file to process with Document AI (OCR plus hOCR export)

Thanks!
