open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0
4.34k stars, 749 forks

Performance on TextOCR Dataset #259

Open jkcg-learning opened 3 years ago

jkcg-learning commented 3 years ago

Motivation

Improve the benchmark performance of all algorithms using the TextOCR dataset released by the Facebook AI Research team.

Related resources

https://textvqa.org/textocr

Overview

TextOCR requires models to perform text recognition on arbitrarily shaped scene text in natural images. TextOCR provides ~1M high-quality word annotations on TextVQA images, allowing end-to-end reasoning to be applied to downstream tasks such as visual question answering or image captioning.

Statistics

28,134 natural images from TextVQA
903,069 annotated scene-text words
32 words per image on average
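
The per-image average follows directly from the two counts quoted above. A quick arithmetic check, using only those figures (the variable names are illustrative):

```python
# Counts quoted in the TextOCR statistics above
num_images = 28_134
num_words = 903_069

# Average number of annotated words per image
words_per_image = num_words / num_images
print(f"{words_per_image:.1f} words per image")
```

This gives roughly 32.1, consistent with the quoted figure of 32 words per image.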

cuhk-hbsun commented 3 years ago

Thanks for your suggestion. We will add it to our July plan.

jkcg-learning commented 3 years ago

Team, is this under consideration for the next release?

gaotongxiao commented 3 years ago

We now support the TextOCR dataset (https://mmocr.readthedocs.io/en/latest/datasets.html).

jkcg-learning commented 3 years ago

Thanks for adding this dataset for training.

Can we also expect a model checkpoint from the team that is trained specifically on this dataset?

gaotongxiao commented 3 years ago

Currently we only have DBNet pretrained on TextOCR. Do you have any requests for the model type and the specific datasets that it is pretrained on? We may add that to our plan if we believe that it also benefits our community.

jkcg-learning commented 3 years ago

https://mmocr.readthedocs.io/en/latest/textdet_models.html#icdar2015


Is it possible to update the DBNet model zoo with the training details and evaluation metrics for the TextOCR dataset?