Closed YawYoung closed 4 years ago
I don't think you should add that. If you set adv_init_mag=0, the perturbation for the first step will be 0, which is equivalent to training on the clean data, since the embedding actually fed to the model is the sum of the perturbation and the clean sample's embedding (see here).
I have released the hyperparameters for training the large model today. In some cases, setting adv_init_mag=0 seems to give better results. However, this is not always the case. Such random initializations are just meant for finding better solutions to the nonconvex inner max problem.
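To make the point above concrete, here is a minimal sketch (not the repo's actual code; the function name and the norm-scaled initialization are illustrative assumptions) showing why adv_init_mag=0 makes the first ascent step see exactly the clean embeddings:

```python
import numpy as np

def init_perturbation(embeds, adv_init_mag, rng=np.random.default_rng(0)):
    # Hypothetical helper: initialize the perturbation delta for FreeLB's
    # inner maximization. With adv_init_mag == 0, delta is all zeros, so
    # embeds + delta equals the clean embeddings on the first step.
    if adv_init_mag > 0:
        # Random init, scaled so each example's delta has L2 norm
        # adv_init_mag (one common choice; the repo may scale differently).
        delta = rng.uniform(-1, 1, size=embeds.shape)
        norms = np.linalg.norm(delta.reshape(len(delta), -1), axis=1)
        delta = delta / norms.reshape(-1, 1, 1) * adv_init_mag
    else:
        delta = np.zeros_like(embeds)
    return delta

embeds = np.ones((2, 4, 8))                 # (batch, seq_len, hidden)
delta0 = init_perturbation(embeds, 0.0)
print(np.array_equal(embeds + delta0, embeds))  # zero-mag init leaves embeddings unchanged
```

A nonzero adv_init_mag only changes the starting point of the inner maximization; the subsequent ascent steps are the same either way.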
Thanks for your reply!
In this code, if adv_init_mag > 0, will the model only be trained on adversarial examples? I ran an experiment on SST-2 using albert-base-v2 with the hyperparameters in this shell script.
For "FreeLB with original data", I added this code before this line. (Maybe FreeLB's hyperparameters on albert-base are very different from those for albert-xxlarge?)
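For reference, one way to train on the original data alongside the adversarial examples is to sum the clean-sample loss with the adversarial loss before the optimizer step. This is a hedged sketch of that idea, not the repo's code; `forward` is a toy stand-in for the model's loss:

```python
import numpy as np

def forward(x, target=1.0):
    # Toy stand-in for a model's loss: squared error of the mean activation.
    return (x.mean() - target) ** 2

embeds = np.full((2, 4, 8), 0.5)   # clean embeddings (batch, seq_len, hidden)
delta = np.full_like(embeds, 0.1)  # current adversarial perturbation

adv_loss = forward(embeds + delta)   # loss on perturbed embeddings
clean_loss = forward(embeds)         # loss on the original embeddings
total_loss = adv_loss + clean_loss   # backprop this sum, then optimizer.step()
```

Whether the two losses should be weighted equally (and whether this helps on albert-base) is exactly the kind of hyperparameter question raised above.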