Open qimingfeijin opened 5 years ago
Hi,
Instead of running download_images.py, you can just use the COCO dataset. It is much smaller, and for OCR you don't actually need that many images. You can download 5K images directly here: http://images.cocodataset.org/zips/val2017.zip. Then you don't need download_images.py at all.
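As a rough sketch of that workflow, the snippet below fetches the archive and unpacks only the JPEG files. The helper names `download_zip` and `extract_images` and the `data/` paths are made up for illustration; they are not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL taken from the comment above.
COCO_VAL_URL = "http://images.cocodataset.org/zips/val2017.zip"


def download_zip(url: str, zip_path: Path) -> Path:
    """Download url to zip_path, skipping the download if the file exists."""
    if not zip_path.exists():
        zip_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(zip_path))
    return zip_path


def extract_images(zip_path: Path, out_dir: Path) -> List[Path]:
    """Extract only the .jpg members and return their paths, sorted."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        members = [m for m in archive.namelist() if m.lower().endswith(".jpg")]
        archive.extractall(out_dir, members=members)
    return sorted(out_dir / m for m in members)
```

Calling `extract_images(download_zip(COCO_VAL_URL, Path("data/val2017.zip")), Path("data"))` should leave the images under `data/`, in whatever folder layout the archive itself uses.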
Hope this helps.
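Thank you for your help and for sharing. I want to do Chinese text detection and need some Chinese images for training and testing. Where did you download your Chinese dataset?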
I generated the dataset myself by using a subtitle file (srt) and then annotating manually. I don't think there are any datasets that you can download.
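For anyone trying the subtitle route, here is a minimal sketch of reading an srt file into timed text cues that could later be matched to video frames. It is an assumption about one way to do it, not the exact pipeline used here; `Cue` and `parse_srt` are hypothetical names, and the manual annotation step is not covered.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches a timing line such as "00:00:01,000 --> 00:00:02,500".
_TIME = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse srt text into cues; blocks without a timing line are skipped."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIME.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is subtitle text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

Multi-line subtitles are joined with a space here; for Chinese text you may prefer to join them without a separator.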
Most papers actually generate their own training/test images by rendering random text onto images. Have a look at this GitHub project: https://github.com/JarveeLee/SynthText_Chinese_version. The corresponding paper is described here: https://blog.csdn.net/u010167269/article/details/52389676. I tried something similar myself and it produced results equal to or better than those from a real dataset.
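To give a feel for the "random text on images" idea, below is a simplified sketch of the layout step only: it picks random sizes and positions for text strings and keeps the resulting boxes as ground-truth labels. This is not the SynthText algorithm and not my exact method. `sample_layout` is a hypothetical helper, the square-glyph size estimate is an assumption, and drawing the text onto a background would additionally need an imaging library.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def sample_layout(
    width: int,
    height: int,
    texts: Sequence[str],
    rng: random.Random,
    char_sizes: Tuple[int, int] = (16, 48),
    max_tries: int = 50,
) -> List[Tuple[str, Box]]:
    """Place each text at a random, non-overlapping position.

    Texts that cannot be placed within max_tries attempts are dropped.
    """
    placed: List[Tuple[str, Box]] = []
    for text in texts:
        for _ in range(max_tries):
            size = rng.randint(*char_sizes)
            # Assumption: each glyph is a size x size square (roughly true for CJK).
            w, h = size * len(text), size
            if w > width or h > height:
                continue
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            box = (left, top, left + w, top + h)
            if not any(overlaps(box, other) for _, other in placed):
                placed.append((text, box))
                break
    return placed
```

Each returned `(text, box)` pair can serve directly as a detection label once the text has been drawn into that box.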
I see, thank you for sharing.
@lars76 can you share your method for synthesising the dataset?
Running download_images.py fails with the error No such file or directory: 'images.csv'. How can I fix this?