Feature Request: Text Position Detection in TrOCR
Model I am using: TrOCR
Description:
I have been using TrOCR for text recognition in images, and it has been performing well. However, for some use cases, it is crucial not only to recognize the text but also to determine its position within the image. This feature would be extremely useful in document digitalization and analysis, where the position of text can carry significant meaning.
Proposed Solution:
Extend the TrOCR API with an additional method or parameter that enables text position detection.
The method or parameter could return the bounding box coordinates (X, Y, Width, Height) of each detected text element, at the character, word, or sentence level.
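To make the request more concrete, here is a rough sketch of the kind of result I have in mind. Everything in it is hypothetical: `TextBox`, its field names, the `level` values, and `filter_by_level` are my own illustration and do not exist in the current TrOCR API, and the coordinates are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBox:
    """Hypothetical result record; NOT part of the current TrOCR API."""
    text: str
    x: int        # left edge, in pixels
    y: int        # top edge, in pixels
    width: int    # box width, in pixels
    height: int   # box height, in pixels
    level: str    # assumed values: "character", "word", or "sentence"

    def corners(self) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom), handy for drawing or cropping."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def filter_by_level(boxes: List[TextBox], level: str) -> List[TextBox]:
    """Keep only the boxes reported at the requested granularity."""
    return [box for box in boxes if box.level == level]


# Made-up values showing what a call might return for the text "Hello world".
example = [
    TextBox("Hello", 10, 20, 50, 12, "word"),
    TextBox("world", 65, 20, 48, 12, "word"),
    TextBox("Hello world", 10, 20, 103, 12, "sentence"),
]

word_boxes = filter_by_level(example, "word")
```

A flat list with a `level` field is only one possible shape; a nested structure (sentences containing words containing characters) might fit better, and I am open to whatever matches the existing API style.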
Use Case:
This feature would be helpful in various scenarios such as:
Document digitalization where the position of text is crucial for understanding the document structure.
Image analysis where text position could provide additional context.
Additional Information:
I'm willing to contribute to this.
I tried searching for this a lot, but maybe I'm missing something. If so, please let me know how to get it done.