Lens-specific notes:

The code you need to run the character detector is in ./lens/ and
./trainDetector/train_detector.m.

To train and test the character detector on pill images, you first need to
generate the train and test data. Here's how:

1. Generate the data with the code in ./lens/data. Update the
   POSITIVE_CHAR_PATCHES_DIR and NEGATIVE_CHAR_PATCHES_DIR flags wherever
   they exist in ./lens.
2. In ./trainDetector/train_detector.m, set USE_EXISTING_CLUSTERS to 0 and
   set USE_LENS_DATA to 1. Make sure the filenames defined at the top of
   this file are correct. Then run this script. A report of train and test
   accuracy will be printed.

To run the character detector over pill images, run
./lens/detect_chars_in_dir.m, and then visualize the predictions with
show_detected_char_boxes.py.
This is a demo package of the code we used for our paper, "End-to-End Text Recognition with Convolutional Neural Networks", T. Wang, D. Wu, A. Coates, A. Ng, in ICPR 2012.
Feel free to contact me if you have any questions.
My email address: twangcat@stanford.edu My website: http://cs.stanford.edu/people/twangcat/
The package is provided to facilitate understanding of the algorithms used in our paper. It contains demos of all the major components of our end-to-end system. For speed reasons, the CNN architectures as well as the dataset sizes provided in this package are much smaller than the actual ones used in the paper, although the algorithms themselves are identical. We also provide our trained models, together with code to reproduce the major results reported in our paper.
Below are instructions for running the demo for each component, as well as the end-to-end system. All code runs under Matlab.
=======================================================================
Download the toy cropped character datasets from: http://cs.stanford.edu/people/twangcat/ICPR2012_code/charDatasets.tar and extract the 'dataset' folder under SceneTextCNN_demo/
cd kmeans/
kmeansDemo;
This will crop a set of random 8x8 patches, and train a set of first layer filters using K-Means. Filters with small variances are removed. The final resulting filters can be visualized with display_network() as shown in the script.
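The demo itself is Matlab; as a rough illustration of the same idea, here is a Python/NumPy sketch of the patch-cropping and K-Means filter-learning step (all function and parameter names here are made up for illustration, and the normalization and variance threshold are assumptions, not the package's exact settings):

```python
import numpy as np

def learn_filters(images, num_patches=1000, num_filters=32,
                  patch_size=8, iters=10, var_threshold=1e-3, seed=0):
    """Crop random patches, cluster with K-Means, drop low-variance filters."""
    rng = np.random.default_rng(seed)
    # Crop random patch_size x patch_size patches from random images.
    patches = []
    for _ in range(num_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - patch_size + 1)
        x = rng.integers(img.shape[1] - patch_size + 1)
        patches.append(img[y:y + patch_size, x:x + patch_size].ravel())
    X = np.asarray(patches, dtype=float)
    # Normalize each patch (zero mean, unit length) before clustering.
    X -= X.mean(axis=1, keepdims=True)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-8
    # Plain K-Means with random centroid initialization.
    centroids = X[rng.choice(len(X), num_filters, replace=False)]
    for _ in range(iters):
        # Assign each patch to its nearest centroid, then recompute means.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(num_filters):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    # Remove near-constant filters (small variance), as the demo does.
    keep = centroids.var(axis=1) > var_threshold
    return centroids[keep].reshape(-1, patch_size, patch_size)
```

The surviving filters could then be tiled into a grid for inspection, analogous to what display_network() does in the Matlab script.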
=======================================================================
Go to the top level directory of SceneTextCNN_demo, then do:
cd trainDetector
train_detector;
This will train a character detector starting from raw data. The pipeline is:
=======================================================================
Go to the top level directory of SceneTextCNN_demo, then do:
cd trainClassifier
train_classifier;
This will train a character classifier starting from raw data. The pipeline is:
We get the following training and validation results on the toy dataset: train accuracy = 1.0000, validation accuracy = 0.7699. You may get slightly different numbers due to a different random initial point.
=======================================================================
Go to the top level directory of SceneTextCNN_demo, then do:
cd detectorDemo
runDetectorDemo;
This may take quite a while, mainly due to the multilayer sliding window process. Eventually the image will be marked with line-level bounding boxes with candidate space locations.
=======================================================================
Download the full ICDAR Test dataset from: http://algoval.essex.ac.uk/icdar/datasets/TrialTest/scene.zip and extract the SceneTrialTest folder under SceneTextCNN_demo/
Go to the top level directory of SceneTextCNN_demo, then do:
e2e_visualize;
This may take quite a while, mainly due to the multilayer sliding window process. Eventually the image will be marked with word bounding boxes, and their labels and positions will be printed in Matlab.
=======================================================================
a. For the ICDAR-WD-FULL benchmark: Download the test set of ICDAR 2003 Robust Word Recognition Dataset from: http://algoval.essex.ac.uk/icdar/datasets/TrialTest/word.zip Extract the 'word' folder under the top level directory of SceneTextCNN_demo, then do:
wordRecog;
b. For the ICDAR-WD-50 benchmark: We have included the lexicons used by Kai Wang in his SVT dataset paper under: icdarTestLex/ Simply change the 'lextype' variable in wordRecog.m to '50', then do:
wordRecog;
c. For the SVT word recognition benchmark: Download the SVT Dataset from: http://vision.ucsd.edu/~kai/svt/svt.zip and extract the folder svt1 under the top level directory of SceneTextCNN_demo. Change the 'benchMark' variable in wordRecog.m to 'svt', then do:
wordRecog;
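For the lexicon-constrained benchmarks above, recognition is restricted to words from a given lexicon. As a toy sketch of the idea (not the paper's actual scoring, which evaluates lexicon words against the classifier outputs directly), one can take a raw character-level decode and return the lexicon entry closest to it in edit distance:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def recognize_with_lexicon(raw_decode, lexicon):
    """Return the lexicon word closest to the raw decode."""
    return min(lexicon, key=lambda w: edit_distance(raw_decode.upper(), w.upper()))
```

This is why lexicon size matters: with the small 50-word lexicons (ICDAR-WD-50), a noisy decode is much more likely to snap to the correct word than with the full lexicon.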
=======================================================================
Download and extract the ICDAR and SVT datasets as described in steps 5 and 6.
Download the precomputed line-level bounding boxes from: http://cs.stanford.edu/people/twangcat/ICPR2012_code/lineBboxes.tar and extract the precomputedLineBboxes folder under SceneTextCNN_demo/ It contains the line-level bounding box information extracted using our best detector model. Computing them from scratch takes a long time, so we have provided them for you. You can also use the detectorDemo/runDetectorFull.m script to regenerate them on your own. Just remember to create an empty precomputedLineBboxes/ directory.
Run:
e2e_demo;
This generates the end-to-end ICDAR-FULL and SVT results reported in our paper. To use different lexicon sizes for ICDAR, change the 'lextype' variable in e2e_demo.m accordingly.