v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
https://v-iashin.github.io/SpecVQGAN
MIT License

Issues with training transformer on the VAS dataset #25

Closed mayqinxu closed 1 year ago

mayqinxu commented 1 year ago

Hi Vladimir, thanks for the great project / repo!

I'm having issues using train.py to train the transformer on the VAS dataset (ResNet50, 1 Feat). Every time I train, the process stops at the 5th epoch; it seems early stopping is triggered. Is this normal, or should I check my config file?
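
For context, this is roughly how I understand patience-based early stopping in PyTorch Lightning, which train.py seems to build on. The monitored metric name and patience value below are placeholders, not the repo's actual config; the sketch only illustrates why a run can end after a few epochs:

```python
# Minimal sketch of patience-based early stopping in PyTorch Lightning.
# 'val/loss' and patience=3 are assumed placeholders, not SpecVQGAN's config.
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val/loss',  # assumed metric name
    mode='min',          # stop when the metric no longer decreases
    patience=3,          # epochs without improvement before stopping
)
trainer = Trainer(callbacks=[early_stop], max_epochs=100)
# If 'val/loss' stops improving for `patience` consecutive epochs, training
# ends early, e.g. around epoch 5 even though max_epochs is much larger.
```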

v-iashin commented 1 year ago

Hi, as far as I remember, it is ok. Transformer training should be much faster than autoencoder training. To confirm this, inspect your validation loss, which should start forming a "U" shape.
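
If it helps, here is a minimal sketch for pulling the validation loss curve out of the TensorBoard logs that Lightning writes during training. The log directory and the scalar tag ('val/loss') are assumptions; check your own run for the exact names:

```python
# Sketch: plot the validation loss logged to TensorBoard and look for the
# "U" shape (a minimum followed by an increase once the model overfits).
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
import matplotlib.pyplot as plt

acc = EventAccumulator('./logs/<your_run>')  # hypothetical log directory
acc.Reload()
events = acc.Scalars('val/loss')             # assumed scalar tag name

plt.plot([e.step for e in events], [e.value for e in events])
plt.xlabel('step')
plt.ylabel('validation loss')
plt.title('Validation loss: look for the minimum before overfitting')
plt.show()
```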

mayqinxu commented 1 year ago

> Hi, as far as I remember, it is ok. Transformer training should be much faster than autoencoder training. To confirm this, inspect your validation loss, which should start forming a "U" shape.

Thanks a lot!

v-iashin commented 1 year ago

You may also look at Section 6.1.4, second paragraph, for details (https://arxiv.org/pdf/2110.08791.pdf).