mpc001 / auto_avsr

Auto-AVSR: Lip-Reading Sentences Project
Apache License 2.0

How to train an auto-avsr model from scratch through curriculum learning #5

sara-kkk closed this issue 1 year ago

sara-kkk commented 1 year ago

Thank you for sharing the code.

I would like to train a visual-only model from scratch on the LRS2 dataset using curriculum learning. Which learning rate and how many epochs work best when training on a subset of LRS2 that contains only short utterances of no more than 4 seconds (100 frames)? Could you also share how you trained the visual-only model in the model zoo that uses only the LRS3 dataset (438 hours)?

mpc001 commented 1 year ago

Hi @sara-kkk, for LRS3 (438 hours), I first train on short utterances (up to 100 frames) with a learning rate of 0.0002 for 75 epochs. I then load those weights and fine-tune on the whole of LRS3 with a learning rate of 0.001 for 75 epochs.
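
For readers who want to see the two stages laid out, here is a minimal sketch of the schedule described in the reply above. It is not code from the auto_avsr repository: the `Stage` class, `CURRICULUM`, `select_utterances`, `plan`, and the `(path, num_frames)` manifest format are all hypothetical names chosen for illustration. Only the frame limit, learning rates, and epoch counts come from the reply.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical manifest entry: (video path, number of frames).
Utterance = Tuple[str, int]


@dataclass(frozen=True)
class Stage:
    name: str
    max_frames: Optional[int]  # None means no length limit
    learning_rate: float
    epochs: int


# Hyperparameters as quoted in the reply; the structure itself is an assumption.
CURRICULUM = [
    Stage("short_utterances", max_frames=100, learning_rate=0.0002, epochs=75),
    Stage("full_dataset", max_frames=None, learning_rate=0.001, epochs=75),
]


def select_utterances(manifest: List[Utterance],
                      max_frames: Optional[int]) -> List[Utterance]:
    """Keep the utterances whose frame count fits the stage limit."""
    if max_frames is None:
        return list(manifest)
    return [(path, n) for path, n in manifest if n <= max_frames]


def plan(manifest: List[Utterance]) -> List[Tuple[Stage, List[Utterance]]]:
    """Pair each curriculum stage with the subset of data it would train on."""
    return [(stage, select_utterances(manifest, stage.max_frames))
            for stage in CURRICULUM]


if __name__ == "__main__":
    # Toy manifest with made-up file names and lengths.
    toy = [("a.mp4", 80), ("b.mp4", 100), ("c.mp4", 250)]
    for stage, subset in plan(toy):
        print(stage.name, stage.learning_rate, stage.epochs, len(subset))
```

In an actual run, the second stage would start from the checkpoint saved at the end of the first one, as the reply describes; how that checkpoint is loaded depends on the training script and is not shown here.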