open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

Custom dataset creation tutorial #555

Closed. pushpalatha1405 closed this issue 2 years ago.

pushpalatha1405 commented 3 years ago

Hi Thong, I have created a file named make_dataset.md (and opened a PR for it). I still need to add the contents. Are there specific topics I should cover, and how should I organize the file? I will add the contents by the end of this week.

Regards, Pushpalatha M

gaotongxiao commented 3 years ago

Hi Pushpalatha, thank you for your interest! For now, it could be a walk-through (preferably, with an example) that teaches beginners how to create a wildreceipt-like annotation file with Labelme and the conversion script, which basically summarizes your discussions in #434. The full tutorial will certainly cover text detection & recognition datasets as well, but we will take care of those parts so you don't need to worry about them.
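As a rough illustration of what such a walk-through might show, here is a minimal sketch that turns a parsed Labelme JSON file into one WildReceipt-style record (`file_name`, `height`, `width`, and `annotations` holding `box`, `text`, `label`) and writes one record per line. It is not the conversion script discussed in #434. The `"<class_name>:<text>"` label convention and `CLASS_MAP` are hypothetical assumptions, and the exact record layout should be checked against the WildReceipt files and the mmocr docs for your version.

```python
import json

# Hypothetical class-name -> integer label mapping; replace with your own classes.
CLASS_MAP = {"other": 0, "store_name": 1, "total": 2}


def to_quad(points, shape_type):
    """Convert Labelme shape points to a flat [x1, y1, ..., x4, y4] list."""
    if shape_type == "rectangle":
        # Labelme stores a rectangle as two opposite corners.
        (xa, ya), (xb, yb) = points
        left, right = min(xa, xb), max(xa, xb)
        top, bottom = min(ya, yb), max(ya, yb)
        corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    elif shape_type == "polygon" and len(points) == 4:
        corners = points
    else:
        raise ValueError(f"unsupported shape {shape_type!r} with {len(points)} points")
    return [float(c) for point in corners for c in point]


def labelme_to_record(labelme, class_map=CLASS_MAP):
    """Build one WildReceipt-style record from a parsed Labelme JSON dict."""
    annotations = []
    for shape in labelme["shapes"]:
        # Hypothetical convention: the Labelme label is "<class_name>:<text>".
        class_name, _, text = shape["label"].partition(":")
        annotations.append({
            "box": to_quad(shape["points"], shape.get("shape_type", "polygon")),
            "text": text,
            "label": class_map[class_name],
        })
    return {
        "file_name": labelme["imagePath"],
        "height": labelme["imageHeight"],
        "width": labelme["imageWidth"],
        "annotations": annotations,
    }


def write_annotation_file(labelme_paths, out_path):
    """Write one JSON record per line, one line per annotated image."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in labelme_paths:
            with open(path, encoding="utf-8") as f:
                record = labelme_to_record(json.load(f))
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Splitting the label with `partition(":")` only cuts at the first colon, so transcriptions such as `12:30` survive intact. Shapes other than rectangles and 4-point polygons are rejected rather than guessed at.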

BTW, @amitbcp, could you push your conversion script to tools/data/kie/? That way, your contribution is recorded, and we can refer to your script in the tutorial.

pushpalatha1405 commented 3 years ago

OK Thong, I will create the tutorial and keep you updated.

payal211 commented 2 years ago

Hi @gaotongxiao @pushpalatha1405

I am trying to annotate a custom dataset for a text detection problem. Could you please help me with this? I have annotations for text-containing images either as YOLO-format .txt files or as Pascal VOC-format .xml files. How should I convert them into the text detection and text recognition formats?
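A YOLO-format .txt file stores one object per line as `class x_center y_center width height`, with coordinates normalized by the image width and height. Whatever the target format, those boxes first have to go back to pixel coordinates. A minimal sketch, assuming the image size is obtained separately (YOLO files do not store it, so it would come from the image itself, e.g. via an image library); the function names are hypothetical:

```python
def yolo_line_to_quad(line, img_w, img_h):
    """Parse one YOLO line into (class_id, [x1, y1, x2, y1, x2, y2, x1, y2]) in pixels."""
    cls, xc, yc, w, h = line.split()
    # YOLO values are normalized by image width/height; scale back to pixels.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    x1, y1 = xc - w / 2, yc - h / 2
    x2, y2 = xc + w / 2, yc + h / 2
    # Four corners: top-left, top-right, bottom-right, bottom-left.
    return int(cls), [x1, y1, x2, y1, x2, y2, x1, y2]


def yolo_file_to_quads(path, img_w, img_h):
    """Read every non-empty line of a YOLO .txt file."""
    with open(path, encoding="utf-8") as f:
        return [yolo_line_to_quad(line, img_w, img_h) for line in f if line.strip()]
```

Since YOLO boxes are axis-aligned, the resulting quadrilaterals are always rectangles; rotated or curved text cannot be recovered from this format.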

gaotongxiao commented 2 years ago

Hi @payal211, we use COCO-Text Format for all our text detection tasks. You may write a script to convert your files into that format.
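For the Pascal VOC case, such a script could look roughly like the sketch below. It assumes the target is a COCO-style instance file (`images`, `annotations`, `categories`, with `bbox` as `[x, y, width, height]` and a `segmentation` polygon) using a single hypothetical `text` category; the exact fields mmocr expects should be checked against the docs for your version. The function names are hypothetical, and VOC extras such as `difficult` flags are ignored. Text recognition labels are not covered here.

```python
import json
import xml.etree.ElementTree as ET


def voc_to_coco(xml_docs):
    """Merge Pascal VOC XML documents (str or bytes) into one COCO-style dict."""
    images, annotations = [], []
    ann_id = 1
    for img_id, xml_doc in enumerate(xml_docs, start=1):
        root = ET.fromstring(xml_doc)
        size = root.find("size")
        images.append({
            "id": img_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            x1, y1, x2, y2 = (float(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
            w, h = x2 - x1, y2 - y1
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": 1,  # single "text" category
                "bbox": [x1, y1, w, h],  # COCO bbox is [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
                # VOC boxes are axis-aligned, so the polygon is just the rectangle.
                "segmentation": [[x1, y1, x2, y1, x2, y2, x1, y2]],
            })
            ann_id += 1
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": 1, "name": "text"}],
    }


def convert_voc_files(xml_paths, out_path):
    """Read VOC XML files as bytes and write the combined COCO-style JSON file."""
    xml_docs = []
    for path in xml_paths:
        with open(path, "rb") as f:
            xml_docs.append(f.read())
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(voc_to_coco(xml_docs), out)
```

Reading the XML files as bytes lets ElementTree honor any encoding declared in the file header.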