This repo holds the code for the paper: Self-Emphasizing Network for Continuous Sign Language Recognition (AAAI 2023). [paper]
This repo is based on VAC (ICCV 2021). Many thanks for their great work!
This project is implemented in PyTorch (>1.8), so please install PyTorch first.
ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.
sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:
mkdir ./software
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
SeanNaren/warp-ctc, for CTC supervision.
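Before training, it can help to confirm that the prerequisites above are visible from Python. The following is a minimal sketch, not part of this repo: `check_environment` is a hypothetical helper, and it only checks the `torch` and `ctcdecode` modules plus the `./software/sclite` link created above. Add the module name of your warp-ctc binding to `modules` if you want it checked as well.

```python
import importlib.util
import os


def check_environment(modules=("torch", "ctcdecode"), sclite_path="./software/sclite"):
    """Return a dict mapping each requirement name to True (found) or False."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        status[name] = importlib.util.find_spec(name) is not None
    # The soft link must resolve to an existing, executable file
    status["sclite"] = os.path.isfile(sclite_path) and os.access(sclite_path, os.X_OK)
    return status


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'found' if ok else 'MISSING'}")
```

Run it from the repository root so that the relative path `./software/sclite` resolves correctly.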
The implementations of SSEM (line 47) and TSEM (line 23) are given in ./modules/resnet.py.
They are then inserted into the ResNet BasicBlock at line 93 of ./modules/resnet.py.
We later found that a multi-scale architecture for TSEM performs on par with the results reported in the paper, so TSEM is implemented that way here.
You can choose any one of the following datasets to verify the effectiveness of SEN.
Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
After downloading the dataset, extract it. We suggest making a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.
cd ./preprocess
python dataset_preprocess.py --process-image --multiprocessing
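To illustrate what "generating a gloss dict" means, here is a minimal sketch that builds a vocabulary from gloss annotations. It is not the code in dataset_preprocess.py: the function name `build_gloss_dict`, the `[index, count]` layout, and reserving index 0 for the CTC blank are all assumptions made for this example.

```python
from collections import Counter


def build_gloss_dict(annotations):
    """Map each gloss to [index, count]; index 0 is left free (assumed CTC blank)."""
    counts = Counter(gloss for sentence in annotations for gloss in sentence.split())
    # Sort glosses alphabetically so that the indices are reproducible across runs
    return {gloss: [idx, counts[gloss]] for idx, gloss in enumerate(sorted(counts), start=1)}


# Two made-up gloss sentences, used only to show the resulting structure
example = ["MORGEN REGEN NORD", "REGEN SUED"]
gloss_dict = build_gloss_dict(example)
print(gloss_dict)
# {'MORGEN': [1, 1], 'NORD': [2, 1], 'REGEN': [3, 2], 'SUED': [4, 1]}
```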
Download the RWTH-PHOENIX-Weather 2014 T Dataset [download link]
After downloading the dataset, extract it. We suggest making a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.
cd ./preprocess
python dataset_preprocess-T.py --process-image --multiprocessing
Request the CSL Dataset from this website [download link]
After downloading the dataset, extract it. We suggest making a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET ./dataset/CSL
The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.
cd ./preprocess
python dataset_preprocess-CSL.py --process-image --multiprocessing
Request the CSL-Daily Dataset from this website [download link]
After downloading the dataset, extract it. We suggest making a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET ./dataset/CSL-Daily
The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.
cd ./preprocess
python dataset_preprocess-CSL-Daily.py --process-image --multiprocessing
Backbone | Dev WER | Test WER | Pretrained model |
---|---|---|---|
Baseline | 21.2% | 22.3% | --- |
ResNet18 | 19.5% | 21.0% | [Baidu] (passwd: jnii) [Google Drive] |
Backbone | Dev WER | Test WER | Pretrained model |
---|---|---|---|
Baseline | 21.1% | 22.8% | --- |
ResNet18 | 19.3% | 20.7% | [Baidu] (passwd: kqhx) [Google Drive] |
Backbone | Dev WER | Test WER | Pretrained model |
---|---|---|---|
Baseline | 32.8% | 32.3% | --- |
ResNet18 | 31.1% | 30.7% | [Baidu] (passwd: xkhu) [Google Drive] |
To evaluate the pretrained model, run the command below:
python main.py --device your_device --load-weights path_to_weight.pt --phase test
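The Dev/Test WER values in the tables above are word error rates. In this repo the scoring is done by sclite; the sketch below only illustrates the metric itself (word-level edit distance divided by the reference length) and is not the evaluation code used here. The function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution (REGEN -> SONNE) and one deletion (IM) over 4 reference words
print(word_error_rate("MORGEN REGEN IM NORD", "MORGEN SONNE NORD"))  # 0.5
```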
Configuration values are resolved in this order of priority: command line > config file > argparse default values. To train the SLR model, run the command below:
python main.py --device your_device
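The priority order described above (command line > config file > argparse defaults) can be obtained with `argparse` by loading the config values as parser-level defaults before parsing. The following is a sketch under assumptions rather than the code in main.py: the default values are hypothetical, and a plain dict stands in for the parsed YAML config file.

```python
import argparse


def parse_with_priority(argv, config):
    """Resolve options as: command line > config file > argparse defaults."""
    parser = argparse.ArgumentParser()
    # Hypothetical defaults, chosen only for this illustration
    parser.add_argument("--device", default="0")
    parser.add_argument("--phase", default="train")
    parser.add_argument("--load-weights", default=None)
    # Config values replace the argparse defaults (keys must match the dest names) ...
    parser.set_defaults(**config)
    # ... and anything given explicitly on the command line replaces both
    return parser.parse_args(argv)


config = {"device": "1", "phase": "test"}  # stands in for the loaded config file
args = parse_with_priority(["--device", "2"], config)
print(args.device, args.phase, args.load_weights)  # 2 test None
```

Here `--device` comes from the command line, `phase` from the config, and `load_weights` falls back to the argparse default.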
Note that you can choose the target dataset from phoenix2014/phoenix2014-T/CSL/CSL-Daily on line 3 of ./config/baseline.yaml.
For the CSL-Daily dataset, you may choose to halve the lr from 0.0001 to 0.00005, change the lr decay rate (gamma in 'optimizer.py') from 0.2 to 0.5, and disable the temporal resampling strategy (comment out line 121 in dataloader_video.py).
If you find this repo useful in your research, please consider citing:
@inproceedings{hu2023self,
title={Self-Emphasizing Network for Continuous Sign Language Recognition},
author={Hu, Lianyu and Gao, Liqing and Liu, Zekang and Feng, Wei},
booktitle={Thirty-seventh AAAI conference on artificial intelligence},
year={2023},
}