googleapis / python-vision

This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-vision
Apache License 2.0

TextDetectionParams does not contain disable_orientation_detection parameter #138

Closed mananshah1403 closed 3 years ago

mananshah1403 commented 3 years ago

def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = "application/pdf"

    # How many pages should be grouped into each json output file.
    batch_size = 10

    client = vision.ImageAnnotatorClient()

    text_feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

    image_context = vision.ImageContext(
        text_detection_params=vision.TextDetectionParams(disable_orientation_detection=True)
    )
    features = [text_feature]

    gcs_source = vision.GcsSource(uri=gcs_source_uri)
    input_config = vision.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.OutputConfig(gcs_destination=gcs_destination, batch_size=batch_size)

    async_request = vision.AsyncAnnotateFileRequest(
        features=features,
        image_context=image_context,
        input_config=input_config,
        output_config=output_config,
    )

image_context = vision.ImageContext( text_detection_params=vision.TextDetectionParams(disable_orientation_detection=True) )

I get the following error because of the line above.

    Traceback (most recent call last):
      File ".\spike\google_ocr.py", line 84, in <module>
        "gs://manan-ocr-testing/tt",
      File ".\spike\google_ocr.py", line 24, in async_detect_document
        text_detection_params=vision.TextDetectionParams(mapping={"lineFilter": {"paths": ["confidence", "mergedText"]}})
      File "C:\Users\manan.shah.virtualenvs\document_analysis-Q6JswU4T\lib\site-packages\proto\message.py", line 503, in __init__
        "Unknown field for {}: {}".format(self.__class__.__name__, key)
    ValueError: Unknown field for TextDetectionParams: lineFilter
    PS C:\git\ds\document_analysis> pipenv run python .\spike\google_ocr.py
    Loading .env environment variables…
    Traceback (most recent call last):
      File "C:\Users\manan.shah.virtualenvs\document_analysis-Q6JswU4T\lib\site-packages\proto\message.py", line 497, in __init__
        pb_type = self._meta.fields[key].pb_type
    KeyError: 'disable_orientation_detection'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File ".\spike\google_ocr.py", line 84, in <module>
        "gs://manan-ocr-testing/tt",
      File ".\spike\google_ocr.py", line 24, in async_detect_document
        text_detection_params=vision.TextDetectionParams(disable_orientation_detection=True)
      File "C:\Users\manan.shah.virtualenvs\document_analysis-Q6JswU4T\lib\site-packages\proto\message.py", line 503, in __init__
        "Unknown field for {}: {}".format(self.__class__.__name__, key)
    ValueError: Unknown field for TextDetectionParams: disable_orientation_detection
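The two stacked tracebacks in the second run look like ordinary Python exception chaining: a `KeyError` is raised while looking up the field name, and a `ValueError` is then raised while that `KeyError` is being handled. The sketch below reproduces only that control flow. `FakeMessage`, its `_fields` table, and `try_build` are hypothetical names made up for illustration; this is not the proto-plus implementation.

```python
class FakeMessage:
    """Hypothetical stand-in for a generated message class (illustration only)."""

    # Assumed field table; a real client derives its fields from the .proto file.
    _fields = {"enable_text_detection_confidence_score": bool}

    def __init__(self, **kwargs):
        self._values = {}
        for key, value in kwargs.items():
            try:
                expected_type = self._fields[key]
            except KeyError:
                # Raising inside the except block chains the new error to the
                # KeyError, which is what prints "During handling of the above
                # exception, another exception occurred".
                raise ValueError(
                    "Unknown field for {}: {}".format(type(self).__name__, key)
                )
            self._values[key] = expected_type(value)


def try_build(**kwargs):
    """Return (message, None) on success or (None, error) on an unknown field."""
    try:
        return FakeMessage(**kwargs), None
    except ValueError as exc:
        return None, exc
```

In this sketch the `ValueError` is the error that matters to the caller; the `KeyError` stays attached to it as `__context__` and only explains where the lookup failed.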

class TextDetectionParams(proto.Message):
    """
    Attributes:
        enable_text_detection_confidence_score (bool):
            By default, Cloud Vision API only includes confidence score
            for DOCUMENT_TEXT_DETECTION result. Set the flag to true to
            include confidence score for TEXT_DETECTION as well.
    """

    enable_text_detection_confidence_score = proto.Field(proto.BOOL, number=9)

When I look at the documentation, TextDetectionParams has no disable_orientation_detection property, although the Google Cloud Vision API reference mentions it here:

https://cloud.google.com/vision/docs/reference/rest/v1p4beta1/ImageContext#TextDetectionParams

Any idea what I am missing here?


Thanks!

munkhuushmgl commented 3 years ago

@mananshah1403 Could you make sure your import is as the following?

    from google.cloud import vision_v1p4beta1 as vision

mananshah1403 commented 3 years ago

@munkhuushmgl Yes, I confirmed the import is as you mentioned above. Here is the script.

import json
import re
from timeit import default_timer as timer

from google.cloud import vision_v1p4beta1 as vision
from google.cloud import storage

# [END vision_document_text_tutorial_imports]

def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = "application/pdf"

    # How many pages should be grouped into each json output file.
    batch_size = 10

    client = vision.ImageAnnotatorClient()

    text_feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

    image_context = vision.ImageContext(
        text_detection_params=vision.TextDetectionParams(disable_orientation_detection=True)
    )
    features = [text_feature]

    gcs_source = vision.GcsSource(uri=gcs_source_uri)
    input_config = vision.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.OutputConfig(gcs_destination=gcs_destination, batch_size=batch_size)

    async_request = vision.AsyncAnnotateFileRequest(
        features=features,
        image_context=image_context,
        input_config=input_config,
        output_config=output_config,
    )

    start = timer()
    operation = client.async_batch_annotate_files(requests=[async_request])

    print("Waiting for the operation to finish.")
    operation.result(timeout=420)

    end = timer()
    print(end - start)
    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

    match = re.match(r"gs://([^/]+)/(.+)", gcs_destination_uri)
    bucket_name = match.group(1)
    prefix = match.group(2)

    bucket = storage_client.get_bucket(bucket_name)

    # List objects with the given prefix.
    blob_list = list(bucket.list_blobs(prefix=prefix))
    # Process the first output file from GCS.
    # Since we specified batch_size=10, the first output file contains
    # up to the first ten pages of the input file.
    output = blob_list[0]

    json_string = output.download_as_string()
    response = json.loads(json_string)

    # The actual response for the first page of the input file.
    first_page_response = response["responses"][0]
    annotation = first_page_response["fullTextAnnotation"]

    # Here we print the full text from the first page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print("Full text:\n")
    print(annotation["text"])

if __name__ == "__main__":
    async_detect_document(
        "gs://manan-ocr-testing/Protection-Fault-Error.pdf",
        "gs://manan-ocr-testing/test",
    )
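One fragile spot in the script above is the `re.match` call on the destination URI: if the URI has no object prefix, `match` is `None` and `match.group(1)` fails with an `AttributeError`. A minimal sketch of a stricter parser follows, using the same pattern as the script. The helper name `split_gcs_uri` is hypothetical and not part of any Google library.

```python
import re

# Same pattern as in the script: bucket name, then a non-empty object prefix.
_GCS_URI = re.compile(r"gs://([^/]+)/(.+)")


def split_gcs_uri(uri):
    """Split 'gs://bucket/prefix' into (bucket_name, prefix).

    Raises ValueError with a readable message instead of failing later
    on a None match object.
    """
    match = _GCS_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"Not a gs://bucket/prefix URI: {uri!r}")
    return match.group(1), match.group(2)
```

In the script, `bucket_name, prefix = split_gcs_uri(gcs_destination_uri)` could replace the three lines that build `match`, `bucket_name`, and `prefix`.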

Even the vision_v1p4beta1 version of the SDK has a class definition for TextDetectionParams without this field.

mananshah1403 commented 3 years ago

@munkhuushmgl @andrewferlitsch Any updates on this?

munkhuushmgl commented 3 years ago

@mananshah1403 I am looking into it right now. Sorry about the delay.

munkhuushmgl commented 3 years ago

@busunkim96 Hey Bu Sun, can you help me with this?

busunkim96 commented 3 years ago

Hi,

The client libraries in all languages are generated from source proto files. I checked TextDetectionParams in v1p4beta1 and it doesn't currently have disable_orientation_detection. https://github.com/googleapis/googleapis/blob/0d68bbb80a7620b69aff5ab0b497c8a396e73558/google/cloud/vision/v1p4beta1/image_annotator.proto#L625-L631

The Vision team needs to update their protos so the libraries can use them. (last update of v1p4beta1 was in November 2020) https://github.com/googleapis/googleapis/commits/80a56e032bcc6a52cc41091c9a9ab527ec233f1f/google/cloud/vision/v1p4beta1/image_annotator.proto. @munkhuushmgl please reach out to a contact on the Vision team to ask for a proto update. go/client-user-guide has the required steps.
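Because the generated classes only know the fields present in the proto they were built from, it can help to check which fields a class declares before passing keyword arguments to it. The sketch below assumes the class exposes a `_meta.fields` mapping keyed by field name, which is what the traceback earlier in this thread suggests; that attribute is private and may change between releases. `supported_fields`, `split_kwargs`, and the stand-in classes are hypothetical names used so the example runs without google-cloud-vision installed.

```python
def supported_fields(message_cls):
    """Return the field names a generated message class declares.

    Assumes `message_cls._meta.fields` is a mapping keyed by field name
    (inferred from the traceback, not from documented API).
    """
    return sorted(message_cls._meta.fields)


def split_kwargs(message_cls, **kwargs):
    """Partition kwargs into (accepted, rejected) by declared field name."""
    known = set(supported_fields(message_cls))
    accepted = {k: v for k, v in kwargs.items() if k in known}
    rejected = sorted(k for k in kwargs if k not in known)
    return accepted, rejected


# Hypothetical stand-ins that mimic only the `_meta.fields` shape.
class _StandInMeta:
    fields = {"enable_text_detection_confidence_score": object()}


class StandInTextDetectionParams:
    _meta = _StandInMeta


accepted, rejected = split_kwargs(
    StandInTextDetectionParams,
    enable_text_detection_confidence_score=True,
    disable_orientation_detection=True,
)
```

With the real library, calling `supported_fields(vision.TextDetectionParams)` would be the equivalent check, provided the `_meta.fields` assumption holds for the installed version.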

munkhuushmgl commented 3 years ago

Thanks @busunkim96, I will reach out to the Vision team.

vinnysenthil commented 3 years ago

Hi @mananshah1403, unfortunately a recent update of our reference documentation for the Vision API included fields that are unavailable on the service. We've since updated the documentation to be accurate. Apologies for any trouble caused by this.

lorenzob commented 1 year ago

Hi, can you please confirm that for the time being there is no way to disable text orientation detection? I have text with mixed orientations and I am not able to read all of them because it looks like the image always gets automatically rotated.