oyxhust / ssd-text_detection

A modified SSD model for text detection

SSD-text detection: Text Detector

This is a modified SSD model for text detection.

SSD is much faster than Faster R-CNN. In my experiments, SSD needs only about 0.05 s per image.

Disclaimer

This is a re-implementation of the MXNet SSD. The official repository is available here. The arXiv paper is available here.

Getting started

Train the model

I adapted the original SSD to train on SynthText and ICDAR. Other datasets can be supported by adding a subclass of class Imdb, which is defined in dataset/imdb.py. See dataset/pascal_voc.py for an example.

To get good performance, first train the model on SynthText, which is a large dataset (about 40 GB), and then fine-tune it on ICDAR. If you want to apply this model to other applications, you can fine-tune it on any dataset.

SSD requires the size of every image, but SynthText is so large that reading every image's size with OpenCV each time training starts would take too long. So I use 'read_size.py' (data/synthtext_img_size) to create an h5py file, 'size.h5', that stores the sizes of all images. You can copy this file to the extracted folder 'SynthText'.
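The idea of caching image sizes can be sketched roughly as follows. This is not the code of read_size.py, and the real layout of 'size.h5' may differ. The function names write_size_cache and load_size_cache, the dataset names "names" and "sizes", and the read_size callback are all assumptions made for illustration.

```python
import h5py
import numpy as np


def write_size_cache(image_paths, read_size, out_path="size.h5"):
    """Read each image's size once and store all sizes in one HDF5 file.

    read_size is any callable that returns (height, width) for a path.
    The dataset names "names" and "sizes" are assumptions of this sketch,
    not the layout used by the repository's read_size.py.
    """
    paths = [str(p) for p in image_paths]
    # One row per image: [height, width]
    sizes = np.array([read_size(p) for p in paths], dtype=np.int32).reshape(-1, 2)
    # Fixed-length byte strings are stored natively by HDF5
    names = np.array([p.encode("utf-8") for p in paths], dtype="S")
    with h5py.File(out_path, "w") as f:
        f.create_dataset("names", data=names)
        f.create_dataset("sizes", data=sizes)
    return out_path


def load_size_cache(path="size.h5"):
    """Load the cache back as a dict: image path -> (height, width)."""
    with h5py.File(path, "r") as f:
        names = f["names"][:]
        sizes = f["sizes"][:]
    return {
        n.decode("utf-8"): (int(h), int(w))
        for n, (h, w) in zip(names, sizes)
    }
```

With OpenCV, read_size could be something like `lambda p: cv2.imread(p).shape[:2]`, since the array returned by `cv2.imread` has its height and width as its first two dimensions. The slow pass over the images then happens only once, and training can load the sizes from the cache file at startup.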

Try the demo

When running demo_savefig.py, please give the path to the folder that contains the test images.