SEN_CSLR

This repo holds the code for the paper: Self-Emphasizing Network for Continuous Sign Language Recognition (AAAI 2023). [paper]

This repo is based on VAC (ICCV 2021). Many thanks for their great work!

Prerequisites

This project is implemented in PyTorch (>1.8), so please install PyTorch first.
ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.
sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:
mkdir ./software
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
SeanNaren/warp-ctc, for CTC supervision.
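
Before running anything, it can help to confirm that the prerequisites above are visible from Python. The following is a minimal sketch, not part of the repo: the function name `check_prerequisites` is hypothetical, and it assumes the import names `torch` and `ctcdecode` and the soft link path `./software/sclite` described above. The warp-ctc binding is left out because its import name depends on how it was built.

```python
import importlib.util
from pathlib import Path


def check_prerequisites(sclite_path="./software/sclite",
                        modules=("torch", "ctcdecode")):
    """Report which prerequisites appear to be available.

    Returns a dict mapping each checked item to True or False.
    """
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    # The README expects sclite to be reachable through this soft link.
    status["sclite"] = Path(sclite_path).exists()
    return status


for item, ok in check_prerequisites().items():
    print(f"{item}: {'found' if ok else 'MISSING'}")
```

This only checks that the modules can be located and that the link target exists; it does not verify versions or that sclite actually runs.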

Implementation

The implementations of the SSEM (line 47) and TSEM (line 23) are given in ./modules/resnet.py.

Both modules are then integrated into the ResNet BasicBlock at line 93 of ./modules/resnet.py.

We later found that a multi-scale architecture for TSEM performs on par with the design reported in the paper, so the released code implements TSEM that way.

Data Preparation

You can choose any one of the following datasets to verify the effectiveness of SEN.

PHOENIX2014 dataset

Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
After the download finishes, extract the archive. We suggest creating a soft link to the extracted dataset:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.
```
cd ./preprocess
python dataset_preprocess.py --process-image --multiprocessing
```
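
To make the "gloss dictionary" step concrete, here is a minimal sketch of what such a dictionary could look like. It is an illustration under assumptions, not the logic of dataset_preprocess.py: the function name `build_gloss_dict`, the whitespace-separated input format, the (index, count) value layout, and the example glosses are all hypothetical.

```python
from collections import Counter


def build_gloss_dict(annotations):
    """Map each gloss to an (index, count) pair.

    `annotations` is an iterable of gloss sentences, one string per video,
    with glosses separated by whitespace (an assumed input format).
    Index 0 is left unused so it can serve as the CTC blank label.
    """
    counts = Counter()
    for sentence in annotations:
        counts.update(sentence.split())
    # Sort alphabetically so the mapping is deterministic across runs.
    return {gloss: (idx, counts[gloss])
            for idx, gloss in enumerate(sorted(counts), start=1)}


# Made-up annotations, used only to show the shape of the result.
example = ["MORGEN REGEN NORD", "REGEN SUED"]
gloss_dict = build_gloss_dict(example)
print(gloss_dict)
```

Starting the indices at 1 is a common choice when training with CTC, since the loss needs one label reserved for the blank symbol.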

PHOENIX2014-T dataset

Download the RWTH-PHOENIX-Weather 2014 T Dataset [download link].
After the download finishes, extract the archive. We suggest creating a soft link to the extracted dataset:
ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.
```
cd ./preprocess
python dataset_preprocess-T.py --process-image --multiprocessing
```

CSL dataset

Request the CSL Dataset from this website [download link].
After the download finishes, extract the archive. We suggest creating a soft link to the extracted dataset:
ln -s PATH_TO_DATASET ./dataset/CSL
The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.
```
cd ./preprocess
python dataset_preprocess-CSL.py --process-image --multiprocessing
```

CSL-Daily dataset

Request the CSL-Daily Dataset from this website [download link].
After the download finishes, extract the archive. We suggest creating a soft link to the extracted dataset:
ln -s PATH_TO_DATASET ./dataset/CSL-Daily
The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.
```
cd ./preprocess
python dataset_preprocess-CSL-Daily.py --process-image --multiprocessing
```

Inference

PHOENIX2014 dataset

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| Baseline | 21.2% | 22.3% | --- |
| ResNet18 | 19.5% | 21.0% | [Baidu] (passwd: jnii) [Google Drive] |

PHOENIX2014-T dataset

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| Baseline | 21.1% | 22.8% | --- |
| ResNet18 | 19.3% | 20.7% | [Baidu] (passwd: kqhx) [Google Drive] |

CSL-Daily dataset

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| Baseline | 32.8% | 32.3% | --- |
| ResNet18 | 31.1% | 30.7% | [Baidu] (passwd: xkhu) [Google Drive] |

To evaluate a pretrained model, run the command below:
python main.py --device your_device --load-weights path_to_weight.pt --phase test
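
The WER columns in the tables above report word error rate, i.e. (substitutions + deletions + insertions) divided by the number of reference glosses. The repo obtains these numbers through sclite; the snippet below is only a minimal sketch of the metric itself using a standard edit-distance recurrence, and sclite's alignment and text normalization may differ in detail. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution (B -> X) and one deletion (D) over four reference words.
print(word_error_rate("A B C D", "A X C"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.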

Training

Configuration values are resolved in the following order of priority: command line > config file > argparse defaults. To train the SLR model, run the command below:

python main.py --device your_device
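
The priority order described above (command line > config file > argparse defaults) can be obtained with argparse alone. The following is a minimal sketch of one common way to do it and is not necessarily how main.py is written: the function name `resolve_args` is hypothetical, `--batch-size` and all values are illustrative, and a plain dict stands in for the parsed YAML config file.

```python
import argparse


def resolve_args(argv, config):
    """Merge settings so that command line > config file > argparse defaults.

    `config` is a dict standing in for the parsed YAML config file.
    """
    parser = argparse.ArgumentParser()
    # Options and defaults here are for illustration only.
    parser.add_argument("--device", default="0")
    parser.add_argument("--batch-size", type=int, default=2)
    parser.add_argument("--phase", default="train")
    # Config values replace the argparse defaults ...
    parser.set_defaults(**config)
    # ... and explicit command-line flags replace both.
    return parser.parse_args(argv)


args = resolve_args(["--device", "1"], {"device": "0,1", "batch_size": 4})
print(args.device, args.batch_size, args.phase)
```

Here `device` comes from the command line, `batch_size` from the config dict, and `phase` falls back to its argparse default.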

Note that you can choose the target dataset from phoenix2014/phoenix2014-T/CSL/CSL-Daily in line 3 in ./config/baseline.yaml.

For the CSL-Daily dataset, you may want to halve the learning rate from 0.0001 to 0.00005, change the lr decay rate (gamma in optimizer.py) from 0.2 to 0.5, and disable the temporal resampling strategy (comment out line 121 in dataloader_video.py).
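
To see what these two changes do to the learning rate over time, here is a small worked sketch. It assumes a step-wise schedule in which the learning rate is multiplied by gamma at each milestone epoch; the milestone epochs and the helper name `lr_at_epoch` are hypothetical and not taken from the repo.

```python
def lr_at_epoch(base_lr, gamma, milestones, epoch):
    """Learning rate after multiplying by `gamma` at every milestone reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


milestones = [20, 35]  # hypothetical decay epochs, not taken from the repo
for name, base_lr, gamma in [("default", 1e-4, 0.2), ("CSL-Daily", 5e-5, 0.5)]:
    schedule = [lr_at_epoch(base_lr, gamma, milestones, e) for e in (0, 20, 35)]
    print(name, schedule)
```

Under these assumed milestones, the CSL-Daily settings start from a lower learning rate but decay more gently, so they end at a higher value than the default settings.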

Citation

If you find this repo useful in your research, please consider citing:

@inproceedings{hu2023self,
  title={Self-Emphasizing Network for Continuous Sign Language Recognition},
  author={Hu, Lianyu and Gao, Liqing and Liu, Zekang and Feng, Wei},
  booktitle={Thirty-seventh AAAI conference on artificial intelligence},
  year={2023},
}
