lars76 / chinese-subtitle-ocr

Optical character recognition for Chinese subtitles using SSD and CNN
MIT License

Question about images.csv #4

Open qimingfeijin opened 5 years ago

qimingfeijin commented 5 years ago

Running download_images.py fails with the error No such file or directory: 'images.csv'. How can I fix this?

lars76 commented 5 years ago

Hi,

Instead of download_images.py, just use the COCO dataset. It is much smaller, and for OCR you don't actually need that many images. You can download 5K images directly here: http://images.cocodataset.org/zips/val2017.zip. Then you don't need download_images.py at all.
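For readers who want to script this step, here is a minimal sketch that uses only the Python standard library. It is not part of this repository: the function names `download` and `extract_jpegs` and the output directory are assumptions chosen for illustration, and only the URL comes from the comment above.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL quoted in the comment above.
COCO_VAL_URL = "http://images.cocodataset.org/zips/val2017.zip"


def download(url: str, dest: Path) -> Path:
    """Download url to dest, skipping the download if dest already exists."""
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_jpegs(zip_path: Path, out_dir: Path) -> List[Path]:
    """Copy every .jpg member of the archive into out_dir.

    Members are flattened to their base name, so the result does not
    depend on the folder layout inside the archive.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".jpg"):
                continue
            target = out_dir / Path(info.filename).name
            target.write_bytes(archive.read(info))
            extracted.append(target)
    return sorted(extracted)


# Hypothetical usage (downloads the full archive, so it is left commented out):
# archive_path = download(COCO_VAL_URL, Path("data/val2017.zip"))
# images = extract_jpegs(archive_path, Path("data/images"))
```

The extracted files can then stand in for the images that download_images.py would have fetched. How the rest of the project expects those files to be named or located is not stated in this thread, so check the training scripts before relying on the paths used here.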

Hope this helps.

qimingfeijin commented 5 years ago

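Thank you for your help and for sharing. I want to do Chinese text detection and need some Chinese images for training and testing. Where did you download your Chinese dataset?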

lars76 commented 5 years ago

I generated the dataset myself by using a subtitle file (srt) and then doing manual annotation. I don't think that there are any datasets that you can download.
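As an illustration of the first half of that process, the sketch below reads an srt file into timed cues. It is an assumption about how such a file could be consumed, not the script that was used for this project: `Cue` and `parse_srt` are hypothetical names, and the manual annotation step is not covered.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


# Matches timestamps such as 00:01:02,250 (some files use '.' instead of ',').
_STAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _STAMP.search(stamp)
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> List[Cue]:
    """Split SRT text into cues; blocks without a timing line are skipped."""
    content = content.lstrip("\ufeff").replace("\r\n", "\n")
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.split("\n")
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->", 1)
        text = "\n".join(line.strip() for line in lines[timing + 1:]).strip()
        if text:
            cues.append(Cue(_to_seconds(start), _to_seconds(end), text))
    return cues
```

Each cue gives the subtitle text and the time span in which it is on screen, which is enough to pick matching video frames and attach the text as a label before drawing bounding boxes by hand.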

Most papers actually generate their own training/test images by rendering random text onto images. Have a look at this GitHub project: https://github.com/JarveeLee/SynthText_Chinese_version. The corresponding paper is described here: https://blog.csdn.net/u010167269/article/details/52389676. I tried something similar myself, and it produced results equal to or better than a real dataset.
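To make the idea concrete, here is a small sketch of the bookkeeping behind such a generator: pick a random line of text and a random position, and record the bounding box that serves as the detection label. It is neither the method used in this repository nor the SynthText pipeline. `Sample` and `plan_sample` are hypothetical names, and the box size rests on the rough assumption that each glyph is about as wide as the font size, which fits CJK characters reasonably well. A real generator would draw the text with an imaging library and measure the rendered box instead.

```python
import random
from typing import NamedTuple, Sequence, Tuple


class Sample(NamedTuple):
    text: str
    font_size: int
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def plan_sample(
    rng: random.Random,
    lines: Sequence[str],
    image_size: Tuple[int, int],
    font_sizes: Sequence[int] = (24, 32, 40),
) -> Sample:
    """Choose a text line, a font size and a position that fits the image.

    Assumption: every glyph is roughly font_size pixels wide and high.
    """
    width, height = image_size
    text = rng.choice(list(lines))
    fitting = [s for s in font_sizes if s * len(text) <= width and s <= height]
    if not fitting:
        raise ValueError("text does not fit the image at any font size")
    size = rng.choice(fitting)
    box_w, box_h = size * len(text), size
    left = rng.randint(0, width - box_w)
    top = rng.randint(0, height - box_h)
    return Sample(text, size, (left, top, left + box_w, top + box_h))


# Hypothetical usage: one planned sample for a 640x480 background image.
rng = random.Random(0)
sample = plan_sample(rng, ["你好世界", "谢谢"], (640, 480))
```

Passing a seeded `random.Random` keeps the generated dataset reproducible. The planned text would then be drawn at `sample.box[:2]` on a background image, and the box stored next to the image as the ground truth.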

qimingfeijin commented 5 years ago

I understand. Thank you for sharing.

wushilian commented 5 years ago

@lars76 can you share your method for synthesizing the dataset?