Closed by PawelPeczek-Roboflow 1 week ago
I forked the project and started to develop a new block, but one thing is not clear to me.
Given the following image: https://testsigma.com/blog/wp-content/uploads/What-is-the-OCR-Test-How-to-Create-Automate-It.png
Passing this image to the Google Vision API like this:
```
POST https://vision.googleapis.com/v1/images:annotate?key=[YOUR_API_KEY] HTTP/1.1
Authorization: Bearer [YOUR_ACCESS_TOKEN]
Accept: application/json
Content-Type: application/json

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://testsigma.com/blog/wp-content/uploads/What-is-the-OCR-Test-How-to-Create-Automate-It.png"
        }
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}
```
Results in the following response:
```
{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "OCR test\nOCR",
          "boundingPoly": {
            "vertices": [
              {"x": 265, "y": 261},
              {"x": 940, "y": 261},
              {"x": 940, "y": 324},
              {"x": 265, "y": 324}
            ]
          }
        },
        {
          "description": "OCR",
          "boundingPoly": {
            "vertices": [
              {"x": 265, "y": 281},
              {"x": 382, "y": 282},
              {"x": 382, "y": 321},
              {"x": 265, "y": 320}
            ]
          }
        },
        {
          "description": "test",
          "boundingPoly": {
            "vertices": [
              {"x": 396, "y": 282},
              {"x": 505, "y": 283},
              {"x": 505, "y": 322},
              {"x": 396, "y": 321}
            ]
          }
        },
        {
          "description": "OCR",
          "boundingPoly": {
            "vertices": [
              {"x": 756, "y": 261},
              {"x": 940, "y": 262},
              {"x": 940, "y": 324},
              {"x": 756, "y": 323}
            ]
          }
        }
      ],
      "fullTextAnnotation": {
        ...
      }
    }
  ]
}
```
Should the block output `sv.Detections(...)` with the full-text match only, the word matches only, or both?
Hi @brunopicinin! First of all, thanks for taking on the challenge 💪
Regarding the question - good point. I believe the block should expose two outputs: one that simply dumps the whole recognised text, plus an `sv.Detections(...)` output that denotes each parsed region.
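To make that mapping concrete, here is a minimal sketch (plain Python, function names are mine) of how the word-level `textAnnotations` entries from the sample response could be turned into boxes and labels suitable for `sv.Detections(...)`. The first entry is the full-text block, so it is skipped:

```python
def vertices_to_xyxy(vertices):
    # Google Vision returns quadrilateral vertices; taking the axis-aligned
    # min/max gives the enclosing box in the [x_min, y_min, x_max, y_max]
    # layout that sv.Detections.xyxy expects. Missing coordinates default to 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return [min(xs), min(ys), max(xs), max(ys)]


def word_detections(text_annotations):
    # Skip the first entry: it is the full-text match covering all words,
    # not an individual word region.
    boxes, labels = [], []
    for annotation in text_annotations[1:]:
        boxes.append(vertices_to_xyxy(annotation["boundingPoly"]["vertices"]))
        labels.append(annotation["description"])
    return boxes, labels
```

The resulting lists could then be wrapped as, e.g., `sv.Detections(xyxy=np.array(boxes), data={"class_name": np.array(labels)})` - assuming the recognised text goes into the `data` field, as suggested above.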
Created a PR for this issue: https://github.com/roboflow/inference/pull/709
Amazing 💪 taking review now
Posted PR review.
Approved the PR and merged it to main - great, thanks for the contribution 🏅
Google Vision OCR in Workflows
Are you ready to make a meaningful contribution this Hacktoberfest? We are looking to integrate Google Vision OCR into our Workflows ecosystem! This new OCR block will be a valuable addition, addressing a common challenge that many users face.
Join us in expanding our ecosystem and empowering users to effortlessly extract text and structure from their documents. Whether you’re a seasoned contributor or new to open source, your skills and ideas can help make this project a success. Let’s collaborate and bring this essential functionality to life!
🚧 Task description 🏗️
- Call the Google Vision API with the `requests` library - 📖 REST API docs - in particular this may be useful - we only want to enable `TEXT_DETECTION` and `DOCUMENT_TEXT_DETECTION`
- The block should output an `sv.Detections(...)` object - the recognised text should be the label, and additional metadata about structure (like the category of a region) should be added into the `data` field of `sv.Detections(...)`
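As a sketch of the first point, assuming the block exposes an `ocr_type` option mapped onto the two allowed Vision features (the option names here are illustrative, not the final API), the annotate payload could be built like this:

```python
import base64

# Endpoint from the Vision REST docs; the API key or Bearer token is
# supplied separately when the request is sent.
GOOGLE_VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes: bytes, ocr_type: str) -> dict:
    # Map the block option onto the two feature types the task allows.
    feature = {
        "text_detection": "TEXT_DETECTION",
        "document_text_detection": "DOCUMENT_TEXT_DETECTION",
    }[ocr_type]
    return {
        "requests": [
            {
                # Inline base64 content instead of an imageUri, so the block
                # works with images already loaded inside the Workflow.
                "image": {"content": base64.b64encode(image_bytes).decode("utf-8")},
                "features": [{"type": feature}],
            }
        ]
    }
```

The payload would then be sent with `requests.post(GOOGLE_VISION_URL, params={"key": api_key}, json=payload)` or with an `Authorization: Bearer` header, matching the request shown earlier in the thread.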
Cheatsheet
- how we construct `sv.Detections(...)` for object-detection predictions, as a reference
- scaffolding for the block:
💻 Code snippet
```python
from typing import List, Literal, Optional, Type, Union

import requests
import supervision as sv
from pydantic import ConfigDict

from inference.core.workflows.execution_engine.entities.base import (
    OutputDefinition,
    WorkflowImageData,
)
from inference.core.workflows.execution_engine.entities.types import (
    OBJECT_DETECTION_PREDICTION_KIND,
    StepOutputImageSelector,
    WorkflowImageSelector,
)
from inference.core.workflows.prototypes.block import (
    BlockResult,
    WorkflowBlock,
    WorkflowBlockManifest,
)


class BlockManifest(WorkflowBlockManifest):
    model_config = ConfigDict(
        json_schema_extra={
            "name": "Google Vision OCR",
            "version": "v1",
            "short_description": "TODO",
            "long_description": "TODO",
            "license": "Apache-2.0",
            "block_type": "model",
        },
        protected_namespaces=(),
    )
    type: Literal["roboflow_core/google_vision_ocr@v1"]
    image: Union[WorkflowImageSelector, StepOutputImageSelector]
    ocr_type: Literal["text_detection", "ocr_text_detection"]

    @classmethod
    def describe_outputs(cls) -> List[OutputDefinition]:
        return [
            OutputDefinition(
                name="predictions", kind=[OBJECT_DETECTION_PREDICTION_KIND]
            ),
        ]

    @classmethod
    def get_execution_engine_compatibility(cls) -> Optional[str]:
        return ">=1.0.0,<2.0.0"


class GoogleVisionOCRBlockV1(WorkflowBlock):
    @classmethod
    def get_manifest(cls) -> Type[WorkflowBlockManifest]:
        return BlockManifest

    def run(
        self,
        image: WorkflowImageData,
        ocr_type: Literal["text_detection", "ocr_text_detection"],
    ) -> BlockResult:
        results = requests.post(...)
        return {
            "predictions": sv.Detections(...),
        }
```
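One detail the scaffold's `run()` leaves open is response handling: per the Vision REST docs, each entry in `responses` may carry an `error` object instead of annotations. A hedged sketch of that check (the helper name is mine):

```python
def extract_text_annotations(response_payload: dict) -> list:
    # The annotate endpoint returns one entry per request in the batch;
    # a failed entry carries an "error" object instead of annotations.
    entry = response_payload["responses"][0]
    if "error" in entry:
        message = entry["error"].get("message", "unknown error")
        raise RuntimeError(f"Google Vision OCR request failed: {message}")
    # An image with no text yields an entry without "textAnnotations".
    return entry.get("textAnnotations", [])
```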