kris314 / hwnet

Representation for Handwritten Word Images

Dataset

To download the synthetic data and its annotation file, navigate to hwnet/iiit-hws/ and refer to the README.md there.

Installation

The code is built using the PyTorch library. The following packages need to be installed:

Computing image features for a new corpus of word images.

Pre-requisite data

cd pytorch
python hwnet-feat.py --annFile ../ann/test_ann.txt --pretrained_file pretrained/iam-model.t7 --img_folder ../wordImages/ --testAug --exp_dir output/ --exp_id iam-test-0

The above command computes the features and saves them as NumPy matrices in output/models/iam-test-0/. The file feats.npy contains the features for the word images in the order given in the annotation file. The matrix has dimension N x 2048, where N is the number of word images and 2048 is the feature dimension of the current trained model.
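As a quick sanity check after feature extraction, the saved matrix can be loaded with NumPy and its shape verified. The sketch below is a minimal illustration and not part of the repository: the helper name `load_features` is hypothetical, and the default path simply joins the output location and file name mentioned above.

```python
import numpy as np


def load_features(path="output/models/iam-test-0/feats.npy", expected_dim=2048):
    """Load a saved feature matrix and check that it is N x expected_dim.

    `load_features` is a hypothetical helper used for illustration only.
    """
    feats = np.load(path)
    if feats.ndim != 2 or feats.shape[1] != expected_dim:
        raise ValueError(
            f"expected an N x {expected_dim} matrix, got shape {feats.shape}"
        )
    return feats
```

Since the rows follow the order of the annotation file, `feats[i]` can be paired with the i-th annotation entry.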

Arguments for running the above code:

There are other arguments in the code. Please keep their default settings for the current purpose.

Evaluation of Query-By-Image Word Spotting

Pre-requisite data

cd pytorch
python eval.py --exp_dir output/ --exp_id iam-test-0 --annFile ../ann/test_ann.txt --query_file ../ann/test_query.txt

The above command computes the average precision score for each query and finally outputs the mean average precision (mAP) over the entire dataset.
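For readers unfamiliar with the metric, the following is a minimal sketch of how average precision and mAP can be computed for query-by-image retrieval. It is not the code in eval.py: the function names are hypothetical, and the choices of cosine similarity as the ranking score and of excluding each query from its own ranked list are assumptions made for illustration.

```python
import numpy as np


def average_precision(relevant):
    """AP for one ranked list; `relevant` is a boolean sequence in rank order."""
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        return 0.0
    hits = np.cumsum(relevant)                # relevant items seen so far
    ranks = np.arange(1, len(relevant) + 1)   # 1-based rank positions
    # Precision measured at each rank where a relevant item appears.
    precision_at_hits = hits[relevant] / ranks[relevant]
    return float(precision_at_hits.mean())


def mean_average_precision(feats, labels, query_idx):
    """mAP over the given query indices.

    Assumptions (not taken from eval.py): cosine similarity is the ranking
    score, and the query image is removed from its own retrieval list.
    """
    feats = np.asarray(feats, dtype=np.float64)
    labels = np.asarray(labels)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.maximum(norms, 1e-12)   # L2-normalise each row
    scores = []
    for q in query_idx:
        sims = unit @ unit[q]
        order = np.argsort(-sims, kind="stable")  # most similar first
        order = order[order != q]                 # drop the query itself
        scores.append(average_precision(labels[order] == labels[q]))
    return float(np.mean(scores))
```

Here `feats` would be the N x 2048 matrix produced in the previous step, `labels` the word label of each image, and `query_idx` the row indices of the query images.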

Arguments for running the above code:

There are other arguments in the code. Please keep their default settings for the current purpose.

Sample word images, the annotation file, and the query file are kept in their respective folders.

Citation

If you are using the dataset, please cite the arXiv paper below:

If you are comparing against our method for word spotting, please cite the relevant papers below:

Contact

In case of any doubts, please contact the author using the details below:
Author Name: Praveen Krishnan
Author Email: praveen.krishnan@research.iiit.ac.in