Veeedsss closed 7 months ago
Ah, I think this could be an issue with the Notebook rather than the Toolbox Library.
The field mask in the notebook leaves shardInfo out of the response, and the Toolbox needs that field to work with multi-shard Documents.
The line:
field_mask = "text,entities,pages.pageNumber" # Optional. The fields to return in the Document object.
Should be changed to:
field_mask = "text,entities,pages,shardInfo"
Or just removed entirely, since the fieldMask is optional.
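To illustrate why the old mask leads to the reported error, here is a minimal sketch in plain Python. It does not call Document AI or the Toolbox; `mask_covers`, `simulate_shards`, and `check_shards` are hypothetical helpers written for this example, and the assumptions are that an omitted mask returns every field and that a masked-out shardCount falls back to 0.

```python
from typing import Dict, List


def mask_covers(field_mask: str, field: str) -> bool:
    """Return True if a comma-separated field mask includes `field` or a parent of it."""
    if not field_mask:
        # Assumption: an empty or omitted mask means "return all fields".
        return True
    paths = [p.strip() for p in field_mask.split(",") if p.strip()]
    return any(field == p or field.startswith(p + ".") for p in paths)


def simulate_shards(field_mask: str, num_shards: int) -> List[Dict[str, int]]:
    """Hypothetical stand-in for batch output: one dict per output shard."""
    included = mask_covers(field_mask, "shardInfo")
    # Assumption: when shardInfo is masked out, shardCount reads as the default 0.
    return [{"shardCount": num_shards if included else 0} for _ in range(num_shards)]


def check_shards(shards: List[Dict[str, int]]) -> None:
    """Illustrative consistency check modeled on the error message in this issue."""
    count = shards[0]["shardCount"]
    if count != len(shards):
        raise ValueError(
            f"Invalid Document - shardInfo.shardCount ({count}) "
            f"does not match number of shards ({len(shards)})."
        )


old_mask = "text,entities,pages.pageNumber"
new_mask = "text,entities,pages,shardInfo"

check_shards(simulate_shards(new_mask, 11))  # passes silently
try:
    check_shards(simulate_shards(old_mask, 11))
except ValueError as err:
    print(err)
```

With the old mask the simulated shards carry a shardCount of 0 while 11 shards exist, which is the same mismatch the error reports. Adding shardInfo to the mask, or dropping the mask, makes the counts agree.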
I'll make an update to the notebook.
I tried using the Toolbox client library for Python, but I get the following error after execution.
Error: ValueError: Invalid Document - shardInfo.shardCount (0) does not match number of shards (11).
FYI: There are 50 documents to be processed. I am using the same code suggested by Google; the link is below for your reference.
Link: https://github.com/GoogleCloudPlatform/document-ai-samples/blob/main/toolbox-batch-processing/documentai-toolbox-batch-entity-extraction.ipynb