hulianyuyy / CorrNet

Continuous Sign Language Recognition with Correlation Network (CVPR 2023)

CorrNet_CSLR

This repo holds the code for the paper: Continuous Sign Language Recognition with Correlation Network (CVPR 2023). [paper]

This repo is based on VAC (ICCV 2021). Many thanks for their great work!

(Update on 2024/04/17) We release CorrNet+, a unified model with superior performance on both continuous sign language recognition and sign language translation tasks using only RGB inputs.

Prerequisites

Implementation

The implementation of CorrNet is given in ./modules/resnet.py (line 18).

It is then combined with the BasicBlock of ResNet at line 58 of ./modules/resnet.py.

We later found that the Identification Module with only spatial decomposition performs on par with what we report in the paper (spatial-temporal decomposition) and is slightly faster, so we implement it that way.

Data Preparation

You can choose any one of the following datasets to verify the effectiveness of CorrNet.

PHOENIX2014 dataset

  1. Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. After the download finishes, extract the dataset. We suggest making a soft link to the downloaded dataset.
    ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014

  3. The original image sequence is 210x260; we resize it to 256x256 for augmentation. Run the following command to generate the gloss dict and resize the image sequence.

    cd ./preprocess
    python data_preprocess.py --process-image --multiprocessing
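The soft link from step 2 can also be created from Python, which makes it easy to check that the extracted dataset directory exists before preprocessing. This is only a minimal sketch: the helper name link_dataset is hypothetical and not part of this repo, and the demo runs in a temporary directory with an empty placeholder folder instead of the real dataset.

```python
import tempfile
from pathlib import Path


def link_dataset(source_dir, link_path):
    """Create a soft link at link_path pointing to source_dir (like `ln -s`)."""
    source = Path(source_dir).resolve()
    link = Path(link_path)
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    # Make sure the parent folder (e.g. ./dataset) exists.
    link.parent.mkdir(parents=True, exist_ok=True)
    # Replace a stale link left over from an earlier run.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(source, target_is_directory=True)
    return link


# Demo in a temporary directory; the folder names mirror the command in step 2.
with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp) / "phoenix2014-release"
    release.mkdir()
    link = link_dataset(release, Path(tmp) / "dataset" / "phoenix2014")
    linked_ok = link.is_symlink() and link.resolve() == release.resolve()

print(linked_ok)
```

For real use, replace the temporary paths with PATH_TO_DATASET/phoenix2014-release and ./dataset/phoenix2014.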

PHOENIX2014-T dataset

  1. Download the RWTH-PHOENIX-Weather 2014 T Dataset [download link].

  2. After the download finishes, extract the dataset. We suggest making a soft link to the downloaded dataset.
    ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T

  3. The original image sequence is 210x260; we resize it to 256x256 for augmentation. Run the following command to generate the gloss dict and resize the image sequence.

    cd ./preprocess
    python data_preprocess-T.py --process-image --multiprocessing

If you get an error like IndexError: list index out of range on the PHOENIX2014-T dataset, you may refer to this issue to tackle the problem.

CSL dataset

  1. Request the CSL Dataset from this website [download link].

  2. After the download finishes, extract the dataset. We suggest making a soft link to the downloaded dataset.
    ln -s PATH_TO_DATASET ./dataset/CSL

  3. The original image sequence is 1280x720; we resize it to 256x256 for augmentation. Run the following command to generate the gloss dict and resize the image sequence.

    cd ./preprocess
    python data_preprocess-CSL.py --process-image --multiprocessing

CSL-Daily dataset

  1. Request the CSL-Daily Dataset from this website [download link].

  2. After the download finishes, extract the dataset. We suggest making a soft link to the downloaded dataset.
    ln -s PATH_TO_DATASET ./dataset/CSL-Daily

  3. The original image sequence is 1280x720; we resize it to 256x256 for augmentation. Run the following command to generate the gloss dict and resize the image sequence.

    cd ./preprocess
    python data_preprocess-CSL-Daily.py --process-image --multiprocessing

Inference

PHOENIX2014 dataset

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 | 18.8% | 19.4% | [Baidu] (passwd: skd3), [Google Drive] |

We accidentally deleted the original checkpoint and retrained the model, which reaches similar accuracy (Dev: 18.9%, Test: 19.7%).
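For readers unfamiliar with the metric, WER (word error rate) is the number of substitutions, deletions and insertions needed to turn the predicted gloss sequence into the reference, divided by the number of reference glosses. The sketch below illustrates that standard definition only; it is not the evaluation code used by this repo, and the gloss sequences are made-up examples.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER in percent: total edits / total reference glosses."""
    total_edits = sum(edit_distance(r.split(), h.split())
                      for r, h in zip(references, hypotheses))
    total_glosses = sum(len(r.split()) for r in references)
    return 100.0 * total_edits / total_glosses


# Made-up gloss sequences: one deletion in the first pair, one insertion in the second.
refs = ["MORGEN REGEN NORD", "HEUTE SONNE"]
hyps = ["MORGEN REGEN", "HEUTE WOLKE SONNE"]
print(wer(refs, hyps))  # 2 edits over 5 reference glosses -> 40.0
```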

PHOENIX2014-T dataset

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 | 18.9% | 20.5% | [Baidu] (passwd: deuq), [Google Drive] |

CSL-Daily dataset

To evaluate on CSL-Daily with this checkpoint, remove the CorrNet block after layer2: comment out lines 102 and 145 in resnet.py, change num from 3 to 2 in line 105, and change self.alpha[1] and self.alpha[2] to self.alpha[0] and self.alpha[1] in lines 147 and 149, respectively.

| Backbone | Dev WER | Test WER | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 | 30.6% | 30.1% | [Baidu] (passwd: u2iv), [Google Drive] |

To evaluate the pretrained model, first choose the dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) in line 3 of ./config/baseline.yaml, then run the command below:
python main.py --config ./config/baseline.yaml --device your_device --load-weights path_to_weight.pt --phase test

Training

Configuration values are resolved with the following priority: command line > config file > argparse default values. To train the SLR model, run the command below:

python main.py --config ./config/baseline.yaml --device your_device

Note that you can choose the target dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) in line 3 of ./config/baseline.yaml.
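The priority order stated above (command line > config file > argparse defaults) can be reproduced with the standard argparse pattern sketched below. This is an illustration, not the code in main.py: the --dataset flag and all default values are assumptions, and a plain dict stands in for the parsed ./config/baseline.yaml.

```python
import argparse


def build_parser():
    # Argparse defaults have the lowest priority. All defaults here are hypothetical.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="phoenix2014")
    parser.add_argument("--device", default="0")
    parser.add_argument("--phase", default="train")
    return parser


def resolve_args(config, argv):
    parser = build_parser()
    # Values from the config file overwrite the argparse defaults...
    parser.set_defaults(**config)
    # ...and flags given explicitly on the command line overwrite both.
    return parser.parse_args(argv)


# Stand-in for the parsed YAML config file.
config = {"dataset": "CSL-Daily", "phase": "train"}
args = resolve_args(config, ["--phase", "test"])
print(args.dataset, args.device, args.phase)  # CSL-Daily 0 test
```

Here dataset comes from the config, device falls back to its argparse default, and phase is taken from the command line.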

For the CSL-Daily dataset, you may choose to halve the lr from 0.0001 to 0.00005, change the lr decay rate (gamma in optimizer.py) from 0.2 to 0.5, and disable the temporal resampling strategy (comment out line 121 in dataloader_video.py).
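To see what the two lr settings mean in practice, the sketch below compares them under a step decay schedule, where the lr is multiplied by gamma each time a milestone epoch is reached. It assumes this kind of schedule; the milestone epochs are hypothetical placeholders, not values taken from this repo.

```python
def lr_at_epoch(base_lr, gamma, milestones, epoch):
    """Learning rate under step decay: multiply by gamma once per milestone reached."""
    n_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** n_decays


milestones = [20, 30]  # hypothetical decay epochs, for illustration only
settings = [("default", 1e-4, 0.2), ("CSL-Daily", 5e-5, 0.5)]
for name, base_lr, gamma in settings:
    schedule = [lr_at_epoch(base_lr, gamma, milestones, e) for e in (0, 20, 30)]
    print(name, schedule)
```

With these placeholder milestones, the CSL-Daily setting starts lower but decays more gently, so after two decays it ends at roughly 1.25e-5 versus 4e-6 for the default.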

Visualizations

For Grad-CAM visualization, you can replace the resnet.py under "./modules" with the resnet.py under "./weight_map_generation", and then run python generate_cam.py with your own hyperparameters.

Citation

If you find this repo useful in your research, please consider citing:

@inproceedings{hu2023continuous,
  title={Continuous Sign Language Recognition with Correlation Network},
  author={Hu, Lianyu and Gao, Liqing and Liu, Zekang and Feng, Wei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023},
}