mpc001 / auto_avsr

Auto-AVSR: Lip-Reading Sentences Project
Apache License 2.0

VSR Model Training Issues #22

Open jeonhuhuhu opened 9 months ago

jeonhuhuhu commented 9 months ago

We are training the VSR model as-is, without any modifications.

The command we use for training is as follows:

python train.py exp_dir=[exp_dir] \
    exp_name=[exp_name] \
    data.modality="video" \
    data.dataset.root_dir=[root_dir] \
    data.dataset.train_file="lrs3_train_transcript_lengths_seg24s.csv" \
    data.dataset.val_file="lrs3_test_transcript_lengths_seg24s.csv" \
    trainer.num_nodes="1" \
    trainer.gpus="5" \
    data.max_frames="1800" \
    optimizer.lr="0.0002"

However, even after several training runs, the values of "decoder_acc_step" and "decoder_acc_val" stop improving once training passes epoch 30.

In other words, the loss value no longer drops.

Is there anything else important to configure for training?

Thank you for your response in advance.

mpc001 commented 8 months ago

Hi @jeonhuhuhu,

The learning rate 0.0002 is only used when training on the subset of LRS3. In that case, data.dataset.train_file should be set to lrs3_train_transcript_lengths_seg24s_0to100.csv. It's also worth noting that we've included vsr_trlrs3_23h_base.pth in the model zoo for your convenience. Please feel free to use the provided checkpoint if you wish to skip this step.
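If I follow this advice correctly, the subset pre-training stage would use your original command with only the train CSV swapped for the 0to100 split; a sketch, keeping all other flags as you posted them:

```shell
# Subset (23h) pre-training stage: lr=0.0002 pairs with the 0to100 train split.
# All other flags are taken unchanged from the command above.
python train.py exp_dir=[exp_dir] \
    exp_name=[exp_name] \
    data.modality="video" \
    data.dataset.root_dir=[root_dir] \
    data.dataset.train_file="lrs3_train_transcript_lengths_seg24s_0to100.csv" \
    data.dataset.val_file="lrs3_test_transcript_lengths_seg24s.csv" \
    trainer.num_nodes="1" \
    trainer.gpus="5" \
    data.max_frames="1800" \
    optimizer.lr="0.0002"
```

The resulting checkpoint then serves the same role as the provided vsr_trlrs3_23h_base.pth in the fine-tuning step below.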

After that, to fine-tune the pre-trained model on the complete LRS3 dataset, pass the checkpoint path via pretrained_model_path and set the learning rate to 0.001. The command line is below:

python train.py exp_dir=[exp_dir] \
    exp_name=[exp_name] \
    data.modality="video" \
    data.dataset.root_dir=[root_dir] \
    data.dataset.train_file="lrs3_train_transcript_lengths_seg24s.csv" \
    data.dataset.val_file="lrs3_test_transcript_lengths_seg24s.csv" \
    trainer.num_nodes="1" \
    trainer.gpus="5" \
    data.max_frames="1800" \
    optimizer.lr="0.001" \
    pretrained_model_path=[model_path_of_vsr_trlrs3_23h_base]