microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
20.05k stars 2.55k forks

Feature Request: Text position detection in TR-OCR #1341

Open maifeeulasad opened 1 year ago

maifeeulasad commented 1 year ago

Feature Request: Text Position Detection in TR-OCR

Model I am using: TrOCR

Description:

I have been using TR-OCR for text recognition in images, and it has been performing well. However, for some use cases it is crucial not only to recognize the text but also to determine its position within the image. This would be especially useful in document digitization and analysis, where the position of text can carry significant meaning.
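To make the request concrete, here is a minimal sketch of the kind of output being asked for: each recognized string paired with the box it came from. It assumes (as far as I understand) that TrOCR takes a cropped text-line image and returns only a string, so the boxes would have to come from a separate text detector. Everything named here (`PositionedText`, `crop`, `recognize_with_positions`, the `recognize` callback) is hypothetical and not part of TrOCR or unilm; the image is a toy nested list so the sketch runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only; none of this is a TrOCR API.
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive
Image = Sequence[Sequence[int]]   # toy grayscale image: rows of pixel values


@dataclass
class PositionedText:
    box: Box    # where the text line sits in the full image
    text: str   # what the recognizer read inside that box


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region described by `box` out of the full image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def recognize_with_positions(
    image: Image,
    boxes: Sequence[Box],
    recognize: Callable[[List[List[int]]], str],
) -> List[PositionedText]:
    """Run a line-level recognizer on each box and keep the box with its text.

    `boxes` is assumed to come from a separate text detector, and
    `recognize` stands in for whatever wraps the recognition model.
    """
    return [PositionedText(box, recognize(crop(image, box))) for box in boxes]
```

With a real setup, `recognize` would wrap the recognition model and `boxes` would come from a detector of your choice; the pairing logic stays the same.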

Proposed Solution:

Use Case:

This feature would be helpful in various scenarios such as:

Additional Information:

I have searched for this quite a bit, but maybe I'm missing something. If this is already possible, please let me know how to get it done.

artunit commented 7 months ago

I am running into this as well. It would be very useful to have some level of coordinate information.