roboflow / inference

A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
https://inference.roboflow.com

Hacktoberfest 2024 | Google Vision OCR 🤝 Workflows #692

Closed: PawelPeczek-Roboflow closed this issue 1 week ago

PawelPeczek-Roboflow commented 1 week ago

Google Vision OCR in Workflows

Are you ready to make a meaningful contribution this Hacktoberfest? We are looking to integrate Google Vision OCR into our Workflows ecosystem! This new OCR block will be a valuable addition, addressing a challenge many users face: extracting text from images.

Join us in expanding our ecosystem and empowering users to effortlessly extract text and structure from their documents. Whether you’re a seasoned contributor or new to open source, your skills and ideas can help make this project a success. Let’s collaborate and bring this essential functionality to life!

🚧 Task description 🏗️

Cheatsheet

Scaffolding for the block

💻 Code snippet

```python
from typing import List, Literal, Optional, Type, Union

import requests
import supervision as sv
from pydantic import ConfigDict

from inference.core.workflows.execution_engine.entities.base import (
    OutputDefinition,
    WorkflowImageData,
)
from inference.core.workflows.execution_engine.entities.types import (
    OBJECT_DETECTION_PREDICTION_KIND,
    StepOutputImageSelector,
    WorkflowImageSelector,
)
from inference.core.workflows.prototypes.block import (
    BlockResult,
    WorkflowBlock,
    WorkflowBlockManifest,
)


class BlockManifest(WorkflowBlockManifest):
    model_config = ConfigDict(
        json_schema_extra={
            "name": "Google Vision OCR",
            "version": "v1",
            "short_description": "TODO",
            "long_description": "TODO",
            "license": "Apache-2.0",
            "block_type": "model",
        },
        protected_namespaces=(),
    )
    type: Literal["roboflow_core/google_vision_ocr@v1"]
    image: Union[WorkflowImageSelector, StepOutputImageSelector]
    ocr_type: Literal["text_detection", "ocr_text_detection"]

    @classmethod
    def describe_outputs(cls) -> List[OutputDefinition]:
        return [
            OutputDefinition(name="predictions", kind=[OBJECT_DETECTION_PREDICTION_KIND]),
        ]

    @classmethod
    def get_execution_engine_compatibility(cls) -> Optional[str]:
        return ">=1.0.0,<2.0.0"


class GoogleVisionOCRBlockV1(WorkflowBlock):

    @classmethod
    def get_manifest(cls) -> Type[WorkflowBlockManifest]:
        return BlockManifest

    def run(
        self,
        image: WorkflowImageData,
        ocr_type: Literal["text_detection", "ocr_text_detection"],
    ) -> BlockResult:
        # TODO: send the image and requested OCR type to the Google Vision API
        results = requests.post(...)
        # TODO: convert the API response into sv.Detections
        return {
            "predictions": sv.Detections(...)
        }
```
brunopicinin commented 1 week ago

I forked the project and started to develop a new block, but one thing is not clear to me.

Take the following image: https://testsigma.com/blog/wp-content/uploads/What-is-the-OCR-Test-How-to-Create-Automate-It.png

I send it to the Google Vision API with this request:

```http
POST https://vision.googleapis.com/v1/images:annotate?key=[YOUR_API_KEY] HTTP/1.1
Authorization: Bearer [YOUR_ACCESS_TOKEN]
Accept: application/json
Content-Type: application/json

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://testsigma.com/blog/wp-content/uploads/What-is-the-OCR-Test-How-to-Create-Automate-It.png"
        }
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}
```
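For reference, the same request can be sent from Python. The sketch below is a minimal illustration, not the block's actual implementation: it uses the standard-library `urllib` instead of the `requests` call in the scaffold so that it has no dependencies, assumes API-key authentication through the `key` query parameter shown above, and also allows inline base64 `content` in place of an `imageUri`, since a Workflows block receives image data rather than a public URL. `build_annotate_request` and `annotate` are hypothetical helper names.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Optional

# Endpoint from the request above.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(
    image_uri: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
    feature_type: str = "TEXT_DETECTION",
) -> dict:
    """Build an images:annotate body for one image and one feature type."""
    if (image_uri is None) == (image_bytes is None):
        raise ValueError("pass exactly one of image_uri or image_bytes")
    if image_uri is not None:
        # Google fetches a publicly reachable image itself, as in the request above.
        image = {"source": {"imageUri": image_uri}}
    else:
        # Raw image bytes are sent inline, base64-encoded.
        image = {"content": base64.b64encode(image_bytes).decode("ascii")}
    return {"requests": [{"image": image, "features": [{"type": feature_type}]}]}


def annotate(payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """POST the payload using API-key auth and return the decoded JSON response."""
    url = f"{VISION_ENDPOINT}?{urllib.parse.urlencode({'key': api_key})}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `annotate(build_annotate_request(image_uri=...), api_key=...)` would send the request shown above, and passing `feature_type="DOCUMENT_TEXT_DETECTION"` selects Google's dense-text OCR mode instead.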

The API returns the following response:

```json
{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "OCR test\nOCR",
          "boundingPoly": {
            "vertices": [
              { "x": 265, "y": 261 },
              { "x": 940, "y": 261 },
              { "x": 940, "y": 324 },
              { "x": 265, "y": 324 }
            ]
          }
        },
        {
          "description": "OCR",
          "boundingPoly": {
            "vertices": [
              { "x": 265, "y": 281 },
              { "x": 382, "y": 282 },
              { "x": 382, "y": 321 },
              { "x": 265, "y": 320 }
            ]
          }
        },
        {
          "description": "test",
          "boundingPoly": {
            "vertices": [
              { "x": 396, "y": 282 },
              { "x": 505, "y": 283 },
              { "x": 505, "y": 322 },
              { "x": 396, "y": 321 }
            ]
          }
        },
        {
          "description": "OCR",
          "boundingPoly": {
            "vertices": [
              { "x": 756, "y": 261 },
              { "x": 940, "y": 262 },
              { "x": 940, "y": 324 },
              { "x": 756, "y": 323 }
            ]
          }
        }
      ],
      "fullTextAnnotation": {
        ...
      }
    }
  ]
}
```
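The vertices of each `boundingPoly` are slightly skewed (for example, the top edge of "test" runs from y=282 to y=283), so they do not form an exact axis-aligned rectangle, while `sv.Detections` stores boxes as `xyxy`. A minimal sketch of one way to bridge that, taking the enclosing box of each polygon; the helper names are hypothetical, and defaulting a missing `x` or `y` to 0 assumes the API omits zero-valued coordinates from the JSON.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def polygon_to_xyxy(bounding_poly: Dict) -> Box:
    """Collapse a boundingPoly into its enclosing (x_min, y_min, x_max, y_max) box."""
    # Assumption: a coordinate equal to 0 may be omitted from the JSON, so default to 0.
    xs = [vertex.get("x", 0) for vertex in bounding_poly["vertices"]]
    ys = [vertex.get("y", 0) for vertex in bounding_poly["vertices"]]
    return min(xs), min(ys), max(xs), max(ys)


def annotations_to_boxes(response: Dict) -> List[Tuple[str, Box]]:
    """Return (description, box) for every textAnnotation in the first response."""
    annotations = response["responses"][0].get("textAnnotations", [])
    return [
        (annotation["description"], polygon_to_xyxy(annotation["boundingPoly"]))
        for annotation in annotations
    ]
```

Applied to the response above, the "test" annotation becomes the box `(396, 282, 505, 322)`.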

Should the block output `sv.Detections(...)` with only the full-text match (the first annotation), only the word matches, or both?

PawelPeczek-Roboflow commented 1 week ago

Hi @brunopicinin, first of all, thanks for taking on the challenge 💪

Regarding your question: good point. I think the Workflow block should have one output that simply dumps the whole recognised text, plus an output with `sv.Detections(...)` denoting each parsed region.
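The two outputs suggested here can be sketched as follows, assuming (as in the sample response above) that the first `textAnnotations` entry carries the whole recognised text and the remaining entries are the individual regions. `split_ocr_outputs` is a hypothetical helper that returns plain Python lists rather than the block's real output types.

```python
from typing import Dict, List, Tuple


def split_ocr_outputs(response: Dict) -> Dict[str, object]:
    """Split one annotate response into a full-text output and per-region boxes.

    Assumption: the first textAnnotation holds the whole recognised text and
    the remaining entries are individual regions, as in the sample response.
    """
    annotations = response["responses"][0].get("textAnnotations", [])
    if not annotations:
        # Nothing recognised: empty text and no regions.
        return {"text": "", "xyxy": [], "class_name": []}
    full_text, regions = annotations[0], annotations[1:]
    xyxy: List[Tuple[int, int, int, int]] = []
    class_name: List[str] = []
    for region in regions:
        vertices = region["boundingPoly"]["vertices"]
        # Enclosing axis-aligned box; missing coordinates are treated as 0.
        xs = [vertex.get("x", 0) for vertex in vertices]
        ys = [vertex.get("y", 0) for vertex in vertices]
        xyxy.append((min(xs), min(ys), max(xs), max(ys)))
        class_name.append(region["description"])
    return {"text": full_text["description"], "xyxy": xyxy, "class_name": class_name}
```

Inside the block, the lists could then be wrapped, for example as `sv.Detections(xyxy=np.array(xyxy, dtype=float), data={"class_name": np.array(class_name)})`, with `sv.Detections.empty()` when no text is found.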

brunopicinin commented 1 week ago

Created a PR for this issue: https://github.com/roboflow/inference/pull/709

PawelPeczek-Roboflow commented 1 week ago

Amazing 💪 reviewing it now.

PawelPeczek-Roboflow commented 1 week ago

Posted a PR review. Many thanks for the contribution!

PawelPeczek-Roboflow commented 1 week ago

Approved the PR and merged it to main. Many thanks for the contribution 🏅