Roadmap of MMOCR - Githubissues

open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox

https://mmocr.readthedocs.io/en/dev-1.x/

Apache License 2.0

4.37k stars 754 forks

Roadmap of MMOCR #39

Open jeffreykuang opened 3 years ago

jeffreykuang commented 3 years ago

We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.

You can either:

Suggest a new feature by leaving a comment.
Vote for a feature request with 👍 or against it with 👎. (Remember that developers are busy and cannot respond to all feature requests, so vote for the one you want most!)
Tell us that you would like to help implement one of the features in the list or review the PRs. (This is the greatest thing to hear!)

INF800 commented 3 years ago

I think it would be a good idea to have a Colab demo / tutorial for all available features so that developers can get familiar with the package.

innerlee commented 3 years ago

@rakesh4real A Colab demo is planned for the next iteration.

huyhoang17 commented 3 years ago

@rakesh4real Hi, thanks for your great library. What do you think about integrating features like end-to-end spotting, in which detection and recognition are merged into a single network that learns both tasks? Some related papers:

FOTS: https://arxiv.org/abs/1801.01671
MaskTextSpotter: https://arxiv.org/abs/1807.02242
E2E text spotting: https://arxiv.org/abs/1908.09231

jeffreykuang commented 3 years ago

@huyhoang17 End-to-end spotting is an important direction in OCR. Our framework can easily support end-to-end methods, and we would like to reimplement them in the future. If you are interested in doing it, you are welcome to send a PR to this repo.

seekingdeep commented 3 years ago

Production Deployment: ability to easily deploy on ARM-based devices such as Raspberry Pi, and on CPU-only devices. Benefits: ordinary people can detect and recognize text in documents without coding knowledge. Requirements: optimize the models for inference-only environments (TensorRT, ONNX, quantization, etc.).
Training Documentation: introduce detailed documentation on how to label images, train models, and deploy them. Requirements: simple YouTube videos and GitHub documentation.

zcuncun commented 3 years ago

The results do not report inference speed or memory usage. Some algorithms with huge models or complicated post-processing are very slow. These numbers are important when deploying algorithms.

SWHL commented 3 years ago

I hope to have an online demo, so we can quickly test images and look at the OCR results.

kbrajwani commented 3 years ago

One more end-to-end text spotting model: PGNet, https://arxiv.org/pdf/2104.05458v1.pdf

cpwan commented 2 years ago

Hi there, I suggest adding pre-trained models for document visual question answering (VQA).

Motivation

Document VQA is an important task in OCR. It recognizes text regions and finds the relationships between them, which is useful for processing visually rich documents such as tables, forms, receipts, and invoices. There are several families of document VQA algorithms, but they are maintained in different frameworks, which makes it difficult to compare their performance on downstream tasks.

Model	Paper	Framework
LayoutXLM	https://arxiv.org/abs/2104.08836	PyTorch
StructuralLM	https://arxiv.org/abs/2105.11210	TensorFlow
StrucTexT	https://arxiv.org/abs/2108.02923	PaddlePaddle

Features

Inference with pre-trained transformers
Training pipeline for downstream tasks, such as entity labeling, entity linking, document classification
Document datasets, such as DocVQA, FUNSD

gaotongxiao commented 2 years ago

@cpwan Hi, thanks for your suggestion - that sounds really interesting! We'll definitely add this to our plan.