googleapis / python-documentai-toolbox

Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.
https://cloud.google.com/document-ai/docs/toolbox
Apache License 2.0

AttributeError: 'ClientInfo' object has no attribute 'to_grpc_metadata' #266

Closed gabrielboehme closed 9 months ago

gabrielboehme commented 9 months ago

Environment details

Steps to reproduce

  1. Requirements:

    cachetools==5.3.3
    certifi==2024.2.2
    charset-normalizer==3.3.2
    Deprecated==1.2.14
    google-api-core==2.17.1
    google-auth==2.28.1
    google-cloud-bigquery==3.17.2
    google-cloud-core==2.4.1
    google-cloud-documentai==2.24.0
    google-cloud-documentai-toolbox==0.13.0a0
    google-cloud-storage==2.14.0
    google-cloud-vision==3.7.1
    google-crc32c==1.5.0
    google-resumable-media==2.7.0
    googleapis-common-protos==1.62.0
    grpc-google-iam-v1==0.12.7
    grpcio==1.62.0
    grpcio-status==1.62.0
    idna==3.6
    immutabledict==3.0.0
    intervaltree==3.1.0
    Jinja2==3.1.3
    lxml==4.9.4
    MarkupSafe==2.1.5
    numpy==1.24.4
    packaging==23.2
    pandas==2.0.3
    pikepdf==8.13.0
    pillow==10.2.0
    proto-plus==1.23.0
    protobuf==4.25.3
    pyarrow==15.0.0
    pyasn1==0.5.1
    pyasn1-modules==0.3.0
    python-dateutil==2.9.0
    pytz==2024.1
    requests==2.31.0
    rsa==4.9
    six==1.16.0
    sortedcontainers==2.4.0
    tabulate==0.9.0
    tzdata==2024.1
    urllib3==2.2.1
    wrapt==1.16.0
  2. Execution:

    python3 main.py

Code example

main.py:


from google.cloud import documentai
from google.cloud.documentai_toolbox import document

wrapped_document = document.Document.from_batch_process_operation(
    operation_name=operation_name
    location=location
)

wrapped_document.entities_to_bigquery(
        dataset_name=dataset, table_name=table, project_id=project
)

Stack trace

  File "main.py", line 4, in <module>
    wrapped_document = document.Document.from_batch_process_operation(
  File "/<my_script_location>/venv/lib/python3.8/site-packages/google/cloud/documentai_toolbox/wrappers/document.py", line 600, in from_batch_process_operation
    metadata=_get_batch_process_metadata(
  File "/<my_script_location>/venv/lib/python3.8/site-packages/google/cloud/documentai_toolbox/wrappers/document.py", line 156, in _get_batch_process_metadata
    client = documentai.DocumentProcessorServiceClient(
  File "/<my_script_location>/venv/lib/python3.8/site-packages/google/cloud/documentai_v1/services/document_processor_service/client.py", line 775, in __init__
    self._transport = Transport(
  File "/<my_script_location>/venv/lib/python3.8/site-packages/google/cloud/documentai_v1/services/document_processor_service/transports/grpc.py", line 187, in __init__
    self._prep_wrapped_messages(client_info)
  File "/<my_script_location>/venv/lib/python3.8/site-packages/google/cloud/documentai_v1/services/document_processor_service/transports/base.py", line 134, in _prep_wrapped_messages
    self.process_document: gapic_v1.method.wrap_method(
  File "/<my_script_location>/venv/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 241, in wrap_method
    user_agent_metadata = [client_info.to_grpc_metadata()]
AttributeError: 'ClientInfo' object has no attribute 'to_grpc_metadata'


parthea commented 9 months ago

The code below uses ClientInfo from google/api_core/client_info.py:

https://github.com/googleapis/python-documentai-toolbox/blob/ecb656cd88d36b401587f4173882e5238d8dbea0/google/cloud/documentai_toolbox/utilities/gcs_utilities.py#L27-L28

However, there is also a ClientInfo in google/api_core/gapic_v1/client_info.py, which has the to_grpc_metadata method.

The latter is the one used in gapic-generator-python: https://github.com/search?q=repo%3Agoogleapis%2Fgapic-generator-python%20%22clientinfo%22&type=code

Changing the code below to use ClientInfo from google/api_core/gapic_v1/client_info.py instead of google/api_core/client_info.py should resolve the error 'ClientInfo' object has no attribute 'to_grpc_metadata': https://github.com/googleapis/python-documentai-toolbox/blob/ecb656cd88d36b401587f4173882e5238d8dbea0/google/cloud/documentai_toolbox/utilities/gcs_utilities.py#L27-L28
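
To illustrate why the choice of ClientInfo matters, here is a minimal sketch that uses stand-in classes rather than the real library. BaseClientInfo, GapicClientInfo, and this wrap_method are hypothetical names that only mimic the relationship described above: the gapic_v1 class extends the base class and adds to_grpc_metadata, which the failing line in the stack trace calls.

```python
class BaseClientInfo:
    """Stand-in (assumption) for google.api_core.client_info.ClientInfo."""

    def __init__(self, user_agent: str = "example/0.0") -> None:
        self.user_agent = user_agent

    def to_user_agent(self) -> str:
        return self.user_agent


class GapicClientInfo(BaseClientInfo):
    """Stand-in (assumption) for google.api_core.gapic_v1.client_info.ClientInfo."""

    def to_grpc_metadata(self):
        # Illustrative metadata pair; the key is used here only as an example.
        return ("x-goog-api-client", self.to_user_agent())


def wrap_method(client_info):
    """Mimics the failing line from the stack trace (method.py, wrap_method)."""
    return [client_info.to_grpc_metadata()]


# Works: the gapic-style object has to_grpc_metadata.
wrap_method(GapicClientInfo("toolbox/0.0"))

# Fails the same way as the report: the base-style object lacks the method.
try:
    wrap_method(BaseClientInfo())
except AttributeError as exc:
    print(exc)
```

In the real code the change would presumably be limited to the import, that is, taking client_info from google.api_core.gapic_v1 instead of google.api_core.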

holtskinner commented 9 months ago

Thanks, @parthea. I'm not sure why this would be happening now, since this has been working fine for quite a few versions, but I can try changing that import.

parthea commented 9 months ago

@holtskinner, could you please check whether you can reproduce the issue using the code provided?

from google.cloud import documentai
from google.cloud.documentai_toolbox import document

wrapped_document = document.Document.from_batch_process_operation(
    operation_name=operation_name
    location=location
)

wrapped_document.entities_to_bigquery(
        dataset_name=dataset, table_name=table, project_id=project
)

holtskinner commented 9 months ago

from google.cloud import documentai
from google.cloud.documentai_toolbox import document

wrapped_document = document.Document.from_batch_process_operation(
    operation_name=operation_name
    location=location
)

wrapped_document.entities_to_bigquery(
        dataset_name=dataset, table_name=table, project_id=project
)

Not sure if this was just a copy-paste error, but there should be a comma after operation_name=operation_name, before the location parameter. Without it the code won't run at all, but it fails with this error rather than the reported AttributeError:

    operation_name=operation_name
                   ^^^^^^^^^^^^^^
SyntaxError: invalid syntax. Perhaps you forgot a comma?
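
As a quick way to confirm this, the sketch below parses both versions of the call with the standard-library ast module. Nothing is executed, so the undefined names do not matter; parses is a hypothetical helper introduced only for this check.

```python
import ast

# The call exactly as posted in the report (no comma after operation_name).
BROKEN = """
wrapped_document = document.Document.from_batch_process_operation(
    operation_name=operation_name
    location=location
)
"""

# The same call with the comma added.
FIXED = """
wrapped_document = document.Document.from_batch_process_operation(
    operation_name=operation_name,
    location=location,
)
"""


def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses(BROKEN), parses(FIXED))
```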

holtskinner commented 9 months ago

I was able to reproduce this behavior in Python 3.8, in the compiled library only.

I'll also add samples (with tests) for this import method to catch this.

holtskinner commented 9 months ago

I think I see why this wasn't an issue earlier. In #249, I made changes to how from_batch_process_operation() gets the Operation data, which might now require the gapic import for Python 3.8. The unit/integration tests didn't catch this.

gabrielboehme commented 8 months ago

@holtskinner thanks for addressing this so quickly! But now I'm getting another error. The method 'from_batch_process_metadata' raises the following:

ValueError: Invalid Document - shardInfo.shardCount (1) does not match number of shards (1053)

This is strange to me, since I expected to pass in my operation name (the operation succeeded, FYI) and get all the wrapped documents. But the error claims that there are too many shards (?).

The same thing happens if I use the 'from_gcs' method, passing the root directory of that operation's output (let's call it dir) as gcs_prefix. If I use the // as gcs_prefix, the method succeeds.
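
One possible reading of this behavior, sketched below under loud assumptions: if every file found under a prefix is treated as a shard of a single document, then a root prefix holding 1053 single-shard outputs would trip a count check like the one in the error message. check_shards and group_by_folder are hypothetical helpers modeled only on that message, not the toolbox's code, and the dir/<n>/output-0.json layout is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def check_shards(shards: List[dict]) -> None:
    """Hypothetical check, modeled only on the error message in this thread."""
    for shard in shards:
        expected = shard["shardInfo"]["shardCount"]
        if expected != len(shards):
            raise ValueError(
                f"Invalid Document - shardInfo.shardCount ({expected}) "
                f"does not match number of shards ({len(shards)})"
            )


def group_by_folder(files: Dict[str, dict]) -> Dict[str, List[dict]]:
    """Group output files by parent folder (assumed: one folder per document)."""
    groups = defaultdict(list)
    for path, shard in files.items():
        folder = path.rsplit("/", 1)[0]
        groups[folder].append(shard)
    return dict(groups)


# Made-up layout: 1053 input documents, each written as a single shard.
files = {
    f"dir/{i}/output-0.json": {"shardInfo": {"shardCount": 1}}
    for i in range(1053)
}

# Treating the whole root prefix as one document raises the ValueError.
try:
    check_shards(list(files.values()))
except ValueError as exc:
    print(exc)

# Checking each folder on its own passes.
for shards in group_by_folder(files).values():
    check_shards(shards)
```

If this reading is right, pointing gcs_prefix at one document's folder at a time would pass the check, which matches what I see with the narrower prefix.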