NEW: Code for training the models (using PyTorch) is now available in a new repository: https://github.com/luizgh/sigver
This repository contains the code and instructions to use the trained CNN models described in [1] to extract features for Offline Handwritten Signatures. It also includes the models described in [2] that can generate a fixed-sized feature vector for signatures of different sizes.
[1] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks" http://dx.doi.org/10.1016/j.patcog.2017.05.012 (preprint)
[2] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Fixed-sized representation learning from Offline Handwritten Signatures of different sizes" https://doi.org/10.1007/s10032-018-0301-6 (preprint)
The code is written in Python 2 (see note 1). We recommend using the Anaconda Python distribution and creating a new environment with:
conda create -n sigver -y python=2
source activate sigver
The following libraries are required: SciPy 0.18.0, Pillow 3.0.0, OpenCV, Theano 0.9 and Lasagne (see note 2).
They can be installed by running the following commands:
conda install -y "scipy=0.18.0" "pillow=3.0.0"
conda install -y jupyter notebook matplotlib # Optional, to run the example in jupyter notebook
pip install opencv-python
pip install "Theano==0.9"
pip install https://github.com/Lasagne/Lasagne/archive/master.zip
We tested the code on Ubuntu 16.04. It can be used with or without a GPU; to use a GPU with Theano, follow Theano's GPU configuration instructions. Note that Theano takes time to compile the model, so it is much faster to instantiate the model once and run forward propagation on many images than to repeatedly call a script that instantiates the model and processes a single image.
Note 1: Python 3.5 can also be used, but the feature vectors will differ from those generated with Python 2 (due to small differences in how the images are pre-processed). Either version can be used, but feature vectors generated with different versions should not be mixed. The features provided in the Datasets section were obtained with Python 2.
Note 2: Although we used Theano and Lasagne for training, you can also use TensorFlow to extract the features. See tf_example.py for details.
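Because compiling the model is the expensive step, the advice above amounts to creating the model once and reusing it for every image. A minimal sketch of that pattern follows. extract_in_batches is a hypothetical helper (not part of this repository), and the batch size of 32 is only illustrative. Passing model.get_feature_vector_multiple as extract_fn assumes that this method accepts a stacked array of pre-processed images and returns one feature row per image; check cnn_model.py for its exact input format.

```python
import numpy as np


def extract_in_batches(extract_fn, images, batch_size=32):
    """Apply `extract_fn` to `images` in batches and stack the results.

    `extract_fn` is assumed to take an array of shape (batch, H, W) and
    return an array with one feature row per image. It should wrap a model
    that was instantiated (and compiled) once, outside this function.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(images) == 0:
        raise ValueError("no images to process")
    outputs = []
    for start in range(0, len(images), batch_size):
        batch = np.stack(images[start:start + batch_size])
        outputs.append(np.asarray(extract_fn(batch)))
    return np.concatenate(outputs, axis=0)


if __name__ == "__main__":
    # Stand-in for a compiled model: uses the first two pixels as "features".
    def fake_extract(batch):
        return batch.reshape(len(batch), -1)[:, :2]

    imgs = [np.full((150, 220), i, dtype=np.float32) for i in range(5)]
    feats = extract_in_batches(fake_extract, imgs, batch_size=2)
    print(feats.shape)  # (5, 2)
```

With a real model, the idea would be to build CNNModel(signet, model_weight_path) once and then call extract_in_batches(model.get_feature_vector_multiple, processed_images), assuming the input format described above.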
SigNet models: https://drive.google.com/file/d/1KffsnZu8-33wXklsodofw-a-KX6tAsVN/view?usp=share_link
SigNet-SPP models: https://drive.google.com/file/d/1KffsnZu8-33wXklsodofw-a-KX6tAsVN/view?usp=share_link
Run python example.py and python example_spp.py. These scripts pre-process a signature and compare the feature vectors obtained by the model to the results obtained by the author. If the test fails, please check the versions of SciPy and Pillow: different versions of these libraries produce slightly different results in the pre-processing steps.
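When a check like this fails, it helps to know whether the vectors differ by small numerical drift (typical of a library version mismatch) or by a large amount. A minimal sketch that reports the largest element-wise difference; compare_features and its default tolerance are hypothetical and are not the check implemented in example.py.

```python
import numpy as np


def compare_features(computed, reference, atol=1e-4):
    """Compare a computed feature vector with a stored reference.

    Returns (matches, max_abs_diff). `atol` is an illustrative tolerance,
    not the value used by the repository's example scripts.
    """
    computed = np.asarray(computed, dtype=np.float64).ravel()
    reference = np.asarray(reference, dtype=np.float64).ravel()
    if computed.shape != reference.shape:
        raise ValueError("feature vectors have different lengths: %d vs %d"
                         % (computed.size, reference.size))
    max_diff = float(np.max(np.abs(computed - reference)))
    return max_diff <= atol, max_diff
```

A tiny max_abs_diff points to pre-processing drift between SciPy or Pillow versions, while a large one suggests a different model file or input image.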
The following code (from example.py) shows how to load and pre-process a signature, and then extract features using one of the learned models:
from scipy.misc import imread
from preprocess.normalize import preprocess_signature
import signet
from cnn_model import CNNModel
# Maximum signature size (required for the SigNet models):
canvas_size = (952, 1360)
# Load and pre-process the signature
original = imread('data/some_signature.png', flatten=1)
processed = preprocess_signature(original, canvas_size)
# Load the model
model_weight_path = 'models/signet.pkl'
model = CNNModel(signet, model_weight_path)
# Use the CNN to extract features
feature_vector = model.get_feature_vector(processed)
# Multiple images can be processed in a single forward pass using:
# feature_vectors = model.get_feature_vector_multiple(images)
Note that for the SigNet models (from [1]), the signatures passed to the get_feature_vector method must always have the same size as those used for training the system (150 x 220 pixels).
For the SigNet-SPP models (from [2]), the signatures can have any size. We provide models trained on signatures scanned at 300dpi and on signatures scanned at 600dpi. Refer to the paper for more details on this method.
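SigNet-SPP can accept signatures of any size because spatial pyramid pooling turns a convolutional feature map of arbitrary height and width into a vector of fixed length. The NumPy sketch below illustrates the general idea only: the function name, the use of max pooling and the pyramid levels (1, 2, 4) are illustrative assumptions, not the configuration or code used in [2].

```python
import numpy as np


def spatial_pyramid_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, height, width) map into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid of bins
    and the maximum of each bin is taken per channel, so the output length
    is channels * sum(n * n for n in levels), whatever the height and width.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        if height < n or width < n:
            raise ValueError("feature map smaller than pyramid level %d" % n)
        for i in range(n):
            # Row range of bin i: floor start, ceil end, so bins cover all rows.
            r0 = (i * height) // n
            r1 = -(-((i + 1) * height) // n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = -(-((j + 1) * width) // n)
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)
```

With 8 channels and levels (1, 2, 4), feature maps of size 13 x 20 and 31 x 47 both yield vectors of length 8 x (1 + 4 + 16) = 168.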
For an interactive example, start Jupyter Notebook:
jupyter notebook
and open the notebook "interactive_example.ipynb". You can also view it directly on GitHub, which renders notebooks in the browser.
While extracting features requires Python (with the libraries mentioned above), the results can be saved in MATLAB format. We include a script that processes all signatures in a folder and saves the results as MATLAB files (one .mat file per signature).
Usage:
python process_folder.py <signatures_path> <save_path> <model_path> [canvas_size]
Example:
python process_folder.py signatures/ features/ models/signet.pkl
This will process all signatures in the "signatures" folder using the SigNet model and save one .mat file per signature in the "features" folder. Each file contains a single variable named "feature_vector" with the features extracted from the signature.
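The .mat files written by process_folder.py can also be read back into Python, for example to stack all vectors into a single matrix. A minimal sketch, assuming SciPy's scipy.io.loadmat and the "feature_vector" variable name described above; load_feature_folder is a hypothetical helper, not part of the repository.

```python
import os

import numpy as np
from scipy.io import loadmat


def load_feature_folder(folder, variable="feature_vector"):
    """Stack the feature vectors stored in every .mat file of `folder`.

    Returns (names, features): the sorted file names and an array with one
    row per file. Assumes each file holds a single vector in `variable`.
    """
    names = sorted(f for f in os.listdir(folder) if f.endswith(".mat"))
    rows = []
    for name in names:
        data = loadmat(os.path.join(folder, name))
        # loadmat returns 2-D arrays (e.g. 1 x 2048); flatten to 1-D.
        rows.append(np.asarray(data[variable]).ravel())
    features = np.vstack(rows) if rows else np.empty((0, 0))
    return names, features
```

For the example above, load_feature_folder("features/") would return the file names and a matrix with one row per processed signature.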
To facilitate further research, we are also making available the features extracted for each of the four datasets used in this work (GPDS, MCYT, CEDAR, Brazilian PUC-PR), using the models SigNet, SigNet-F (with lambda=0.95) and SigNet-SPP-300dpi.
Dataset | SigNet | SigNet-F | SigNet-SPP-300dpi |
---|---|---|---|
GPDS | GPDS_signet | GPDS_signet_f | GPDS_signetspp_300dpi |
MCYT | MCYT_signet | MCYT_signet_f | MCYT_signetspp_300dpi** |
CEDAR | CEDAR_signet | CEDAR_signet_f | CEDAR_signetspp_300dpi** |
Brazilian PUC-PR* | brazilian_signet | brazilian_signet_f | Brazilian_signetspp_300dpi** |
There are two files for each user: real_X.mat and forg_X.mat. The first contains a matrix of size N x 2048 with the feature vectors of the N genuine signatures from that user. The second contains a matrix of size M x 2048 with the feature vectors of the M skilled forgeries targeting that user.
* Note: for the Brazilian PUC-PR dataset, the first 10 forgeries are "Simple forgeries", while the last 10 are "Skilled forgeries".
** Note: These features were extracted without fine-tuning the network on the particular datasets. We used the model trained with "SPP Fixed" and considered images at 300dpi, centered in a canvas of the size defined for GPDS (428 x 612; larger images were processed at their original size). This differs from the protocol used in the paper, where each dataset was randomly split into 50% training (for fine-tuning) and 50% testing.
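The canvas step in the note above (place each image in a 428 x 612 canvas unless it is larger) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: grayscale images with a white (255) background, the canvas size interpreted as (rows, columns) like canvas_size in the usage example, and plain geometric centering. center_in_canvas is hypothetical; the repository's preprocess_signature may center and normalize images differently.

```python
import numpy as np


def center_in_canvas(img, canvas_size=(428, 612), background=255):
    """Center a grayscale signature on a canvas of `canvas_size` (rows, cols).

    Images larger than the canvas in either dimension are returned
    unchanged (processed at their original size). `background` assumes
    white paper (255).
    """
    img = np.asarray(img)
    canvas_h, canvas_w = canvas_size
    h, w = img.shape
    if h > canvas_h or w > canvas_w:
        return img
    canvas = np.full((canvas_h, canvas_w), background, dtype=img.dtype)
    # Geometric centering: split the leftover space evenly on each side.
    top = (canvas_h - h) // 2
    left = (canvas_w - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas
```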
To load the features in MATLAB:
f = load('real_2.mat')
% f.features: [Nx2048 single]
To load the features in Python:
from scipy.io import loadmat
features = loadmat('real_2.mat')['features']
# features: numpy array of shape (N, 2048)
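For experiments it is often convenient to load both files of a user into one labeled set. A minimal sketch, assuming the "features" variable shown above; load_user and its labeling convention (1 for genuine, 0 for forgery) are hypothetical choices, not part of the released data.

```python
import os

import numpy as np
from scipy.io import loadmat


def load_user(user_id, folder="."):
    """Return (X, y) for one user of a released dataset.

    Reads real_<user_id>.mat and forg_<user_id>.mat from `folder`, stacks
    their "features" matrices (N x D and M x D, with D = 2048 for the
    released files) and labels genuine rows 1 and forgery rows 0.
    """
    real = loadmat(os.path.join(folder, "real_%s.mat" % user_id))["features"]
    forg = loadmat(os.path.join(folder, "forg_%s.mat" % user_id))["features"]
    X = np.vstack([real, forg])
    y = np.concatenate([np.ones(len(real), dtype=int),
                        np.zeros(len(forg), dtype=int)])
    return X, y
```

For the Brazilian PUC-PR dataset, the forgery rows X[y == 0] keep their original order, so X[y == 0][:10] are the simple forgeries and X[y == 0][10:] the skilled ones.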
If you use our code, please consider citing the following papers:
[1] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks" http://dx.doi.org/10.1016/j.patcog.2017.05.012 (preprint)
[2] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Fixed-sized representation learning from Offline Handwritten Signatures of different sizes" https://doi.org/10.1007/s10032-018-0301-6 (preprint)
If using any of the four datasets mentioned above, please cite the paper that introduced the dataset:
GPDS: Vargas, J.F., M.A. Ferrer, C.M. Travieso, and J.B. Alonso. 2007. “Off-Line Handwritten Signature GPDS-960 Corpus.” In Document Analysis and Recognition, 9th International Conference on, 2:764–68. doi:10.1109/ICDAR.2007.4377018.
MCYT: Ortega-Garcia, Javier, J. Fierrez-Aguilar, D. Simon, J. Gonzalez, M. Faundez-Zanuy, V. Espinosa, A. Satue, et al. 2003. “MCYT Baseline Corpus: A Bimodal Biometric Database.” IEE Proceedings-Vision, Image and Signal Processing 150 (6): 395–401.
CEDAR: Kalera, Meenakshi K., Sargur Srihari, and Aihua Xu. 2004. “Offline Signature Verification and Identification Using Distance Statistics.” International Journal of Pattern Recognition and Artificial Intelligence 18 (7): 1339–60. doi:10.1142/S0218001404003630.
Brazilian PUC-PR: Freitas, C., M. Morita, L. Oliveira, E. Justino, A. Yacoubi, E. Lethelier, F. Bortolozzi, and R. Sabourin. 2000. “Bases de Dados de Cheques Bancarios Brasileiros.” In XXVI Conferencia Latinoamericana de Informatica.
The source code is released under the BSD 2-clause license. Note that the models were trained on the GPDS dataset, which is restricted to non-commercial use.