open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0
4.37k stars 754 forks

Roadmap of MMOCR #39

Open jeffreykuang opened 3 years ago

jeffreykuang commented 3 years ago

We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.

You can either:

  1. Suggest a new feature by leaving a comment.

  2. Vote for a feature request with 👍 or against it with 👎. (Remember that developers are busy and cannot respond to all feature requests, so vote for the ones you favor most!)

  3. Tell us that you would like to help implement one of the features on the list or review the PRs. (This is the greatest thing to hear!)

INF800 commented 3 years ago

I think it would be a good idea to have a Colab demo / tutorial for all available features so that developers can get familiar with the package.

innerlee commented 3 years ago

@rakesh4real A Colab demo is planned for the next iteration.

huyhoang17 commented 3 years ago

@rakesh4real Hi, thanks for your great library! What do you think about integrating features like end-to-end spotting, in which the detection and recognition processes are merged into a single network that learns both tasks? Some related papers:

jeffreykuang commented 3 years ago

@huyhoang17 End-to-end spotting is one important direction in OCR, and our framework can easily support end-to-end methods. We would like to reimplement them in the future. If you are interested in doing it, you are welcome to send a PR to this repo.

seekingdeep commented 3 years ago

zcuncun commented 3 years ago

The results do not include inference speed or memory usage. Some algorithms with huge models or complicated post-processing are very slow, which matters when deploying them.
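Such a comparison could be scripted independently of any particular model. A minimal sketch of a latency benchmark (the callable and inputs below are hypothetical stand-ins, not MMOCR APIs):

```python
import statistics
import time


def benchmark(fn, inputs, warmup=3, runs=10):
    """Time fn over inputs, returning (mean, stdev) latency in ms per input.

    Warmup runs are discarded so one-time costs (caches, lazy init)
    do not skew the measurement.
    """
    for _ in range(warmup):
        for x in inputs:
            fn(x)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        for x in inputs:
            fn(x)
        samples.append((time.perf_counter() - start) * 1000 / len(inputs))
    return statistics.mean(samples), statistics.stdev(samples)


# Example with a stand-in "model": a sum of squares over a list.
mean_ms, std_ms = benchmark(lambda x: sum(v * v for v in x),
                            inputs=[list(range(1000))] * 4)
print(f"{mean_ms:.3f} ms/input (stdev {std_ms:.3f})")
```

For GPU models, one would additionally need to synchronize the device before reading the clock and record peak memory with the framework's own tools, since asynchronous execution otherwise makes wall-clock timings misleading.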

SWHL commented 3 years ago

I hope there will be an online demo, so we can quickly test images and view the OCR results.

kbrajwani commented 3 years ago

One more end-to-end text spotting model, PGNet: https://arxiv.org/pdf/2104.05458v1.pdf

cpwan commented 2 years ago

Hi there, I suggest adding pre-trained models for document visual question answering (VQA).

Motivation: Document VQA is an important task in OCR. It recognizes text regions and finds their relationships, which is useful for processing visually rich documents such as tables, forms, receipts, and invoices. There are several families of document VQA algorithms, but they are maintained in different frameworks, which makes comparing their performance on downstream tasks difficult.

| Model | Paper | Source |
| --- | --- | --- |
| LayoutXLM | https://arxiv.org/abs/2104.08836 | PyTorch |
| StructuralLM | https://arxiv.org/abs/2105.11210 | TensorFlow |
| StrucTexT | https://arxiv.org/abs/2108.02923 | PaddlePaddle |


gaotongxiao commented 2 years ago

@cpwan Hi, thanks for your suggestion - that sounds really interesting! We'll definitely take it into account in our plan.