v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
https://v-iashin.github.io/SpecVQGAN
MIT License

Overfitting occurs when training transformer #44

Closed. Ivvvvvvvvvvy closed this issue 8 months ago.

Ivvvvvvvvvvy commented 8 months ago

Hello, I have recently been training SpecVQGAN on a small dataset (only 2000 pairs of 10 s audio and video). When I fine-tune from the pre-trained model, the codebook results are very good. But when I fine-tuned the transformer from the VAS pre-trained model, it overfit severely: val/loss started to rise again after falling to 2.71. Changing the dropout parameter in first_stage_permuter_config (I tried 0.3 and 0.6) had no impact on the model's val/loss. Which parameters in the transformer.yaml file should I modify to alleviate overfitting?

v-iashin commented 8 months ago

Hi. I don't have any specific advice for you here; you can look into the techniques that are usually used to combat overfitting in machine learning.

At the same time, I don't think such high dropout values would help.
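For reference, here is a minimal sketch of where the GPT dropout knobs usually sit in the transformer config. The key names (embd_pdrop, resid_pdrop, attn_pdrop) and the module path follow the minGPT implementation that SpecVQGAN builds on, so verify them against your checkout; note that first_stage_permuter_config configures a different component, so a dropout value set there would not necessarily reach the transformer at all:

```yaml
# Sketch of the relevant part of transformer.yaml.
# Key names and module path assumed from the minGPT-style GPT config;
# check them against your version of the repo.
model:
  params:
    transformer_config:
      target: specvqgan.modules.transformer.mingpt.GPT
      params:
        embd_pdrop: 0.1   # dropout on token + position embeddings
        resid_pdrop: 0.1  # dropout on the residual branches
        attn_pdrop: 0.1   # dropout inside self-attention
```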

The transformer is GPT-2 (~300M parameters), which is a lot of parameters for your dataset, to be honest. Maybe you can try smaller variants of GPT-2, as in the sketch below.
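If you go the smaller-model route, the capacity is set by n_layer, n_head, and n_embd in the same transformer config. The values below are illustrative (roughly GPT-2 small scale), not defaults taken from this thread:

```yaml
# Hypothetical smaller configuration; the VAS transformer shipped with
# the repo is much larger (on the order of 300M parameters).
model:
  params:
    transformer_config:
      params:
        n_layer: 12   # fewer transformer blocks than the default
        n_head: 12
        n_embd: 768   # GPT-2 small width, roughly 100M parameters
```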

v-iashin commented 8 months ago

In my experience, the loss overfits quickly in the second stage, so this didn't come as a surprise to me.