hulianyuyy / SEN_CSLR

Self-Emphasizing Network for Continuous Sign Language Recognition (AAAI2023 Oral)
Apache License 2.0

SEN_CSLR

This repo holds the code for the paper Self-Emphasizing Network for Continuous Sign Language Recognition (AAAI 2023). [paper]

This repo is based on VAC (ICCV 2021). Many thanks for their great work!

Prerequisites

Implementation

The SSEM (line 47) and the TSEM (line 23) are implemented in ./modules/resnet.py.

Both modules are then plugged into the ResNet BasicBlock at line 93 of ./modules/resnet.py.

We later found that a multi-scale architecture for TSEM performs on par with the version reported in the paper, so TSEM is implemented that way in this repo.

Data Preparation

You can choose any one of the following datasets to verify the effectiveness of SEN.

PHOENIX2014 dataset

  1. Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. After the download finishes, extract the archive. We suggest creating a soft link to the extracted dataset:
    ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014

  3. The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python dataset_preprocess.py --process-image --multiprocessing
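The soft link from step 2 can also be created from Python, which may help when dataset setup is scripted. The sketch below is only an illustration: `link_dataset` is a hypothetical helper that is not part of this repo, and it assumes a platform where symbolic links are permitted.

```python
from pathlib import Path


def link_dataset(source, link):
    """Create `link` as a symlink to `source`, similar to `ln -s source link`.

    Hypothetical helper for illustration; not part of this repo.
    """
    source = Path(source).resolve()
    link = Path(link)
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    # Make sure the parent folder (e.g. ./dataset) exists.
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        # Replace an old link so the helper can be re-run safely.
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(source, target_is_directory=True)
    return link


if __name__ == "__main__":
    # PATH_TO_DATASET is the same placeholder used in the ln -s command above.
    link_dataset("PATH_TO_DATASET/phoenix2014-release", "./dataset/phoenix2014")
```

Failing early when the source directory is missing avoids creating a dangling link that the preprocessing script would only trip over later.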

PHOENIX2014-T dataset

  1. Download the RWTH-PHOENIX-Weather 2014 T Dataset [download link].

  2. After the download finishes, extract the archive. We suggest creating a soft link to the extracted dataset:
    ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T

  3. The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python dataset_preprocess-T.py --process-image --multiprocessing

CSL dataset

  1. Request the CSL Dataset from this website [download link]

  2. After the download finishes, extract the archive. We suggest creating a soft link to the extracted dataset:
    ln -s PATH_TO_DATASET ./dataset/CSL

  3. The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python dataset_preprocess-CSL.py --process-image --multiprocessing

CSL-Daily dataset

  1. Request the CSL-Daily Dataset from this website [download link]

  2. After the download finishes, extract the archive. We suggest creating a soft link to the extracted dataset:
    ln -s PATH_TO_DATASET ./dataset/CSL-Daily

  3. The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python dataset_preprocess-CSL-Daily.py --process-image --multiprocessing

Inference

PHOENIX2014 dataset

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| Baseline | 21.2% | 22.3% | --- |
| ResNet18 | 19.5% | 21.0% | [Baidu] (passwd: jnii), [Google Drive] |

PHOENIX2014-T dataset

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| Baseline | 21.1% | 22.8% | --- |
| ResNet18 | 19.3% | 20.7% | [Baidu] (passwd: kqhx), [Google Drive] |

CSL-Daily dataset

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| Baseline | 32.8% | 32.3% | --- |
| ResNet18 | 31.1% | 30.7% | [Baidu] (passwd: xkhu), [Google Drive] |

To evaluate the pretrained model, run the command below:
python main.py --device your_device --load-weights path_to_weight.pt --phase test
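The tables above report word error rate (WER), which is commonly defined as the number of substitutions, deletions, and insertions needed to turn the predicted gloss sequence into the reference, divided by the reference length. The following is a minimal sketch of that definition for a single sentence. It is not the evaluation code used by this repo, and the function name and example glosses are made up for illustration.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return edit distance between the gloss sequences / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one gloss")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference gloss missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra gloss in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Illustrative glosses: one substitution and one deletion over four reference glosses.
reference = "MORGEN REGEN NORD WIND".split()
hypothesis = "MORGEN SONNE NORD".split()
print(word_error_rate(reference, hypothesis))  # 0.5
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% for very long hypotheses.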

Training

Configuration values are resolved in the following order of priority: command line > config file > argparse default values. To train the SLR model, run the command below:

python main.py --device your_device
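One common way to get the priority order described above (command line > config file > argparse defaults) is to feed the values from the config file into `parser.set_defaults` before parsing the command line. The sketch below illustrates that pattern under stated assumptions: the option names and default values are hypothetical, a plain dict stands in for the parsed YAML file, and it is not claimed to match this repo's main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical options; the defaults here are the lowest-priority values.
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", default="0")
    parser.add_argument("--phase", default="train")
    parser.add_argument("--batch-size", type=int, default=2)
    return parser


def resolve_args(argv, file_config):
    """Merge values so that command line > config file > argparse defaults."""
    parser = build_parser()
    known = set(vars(parser.parse_args([])))
    unknown = set(file_config) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    # Config-file values replace the argparse defaults ...
    parser.set_defaults(**file_config)
    # ... and anything given explicitly on the command line wins over both.
    return parser.parse_args(argv)


# The dict stands in for a loaded YAML config file.
args = resolve_args(["--device", "1"], {"device": "2", "batch_size": 4})
print(args.device, args.batch_size, args.phase)  # 1 4 train
```

Rejecting unknown keys up front catches typos in the config file that would otherwise be silently ignored.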

Note that you can choose the target dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) on line 3 of ./config/baseline.yaml.

For the CSL-Daily dataset, you may choose to halve the learning rate from 0.0001 to 0.00005, change the lr decay rate (gamma in optimizer.py) from 0.2 to 0.5, and disable the temporal resampling strategy (comment out line 121 in dataloader_video.py).
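To get a feel for how the base learning rate and gamma interact, the sketch below evaluates a multi-step decay schedule, in which the learning rate is multiplied by gamma each time a milestone epoch is reached. That the repo uses exactly this kind of schedule is an assumption, and the milestone epochs (20 and 35) are hypothetical values chosen only for illustration.

```python
def step_decay_lr(base_lr, gamma, milestones, epoch):
    """Learning rate at `epoch` when lr is multiplied by gamma at each milestone."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


MILESTONES = (20, 35)  # hypothetical milestone epochs, not taken from this repo

for epoch in (0, 20, 35):
    default = step_decay_lr(1e-4, 0.2, MILESTONES, epoch)    # lr 0.0001, gamma 0.2
    csl_daily = step_decay_lr(5e-5, 0.5, MILESTONES, epoch)  # lr 0.00005, gamma 0.5
    print(f"epoch {epoch:2d}: default {default:.2e}, CSL-Daily variant {csl_daily:.2e}")
```

Under these assumed milestones, the CSL-Daily variant starts lower but decays more gently, so after both milestones it ends at a higher learning rate than the default setting.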

Citation

If you find this repo useful in your research, please consider citing:

@inproceedings{hu2023self,
  title={Self-Emphasizing Network for Continuous Sign Language Recognition},
  author={Hu, Lianyu and Gao, Liqing and Liu, Zekang and Feng, Wei},
  booktitle={Thirty-seventh AAAI conference on artificial intelligence},
  year={2023},
}