zhuchen03 / FreeLB

Adversarial Training for Natural Language Understanding

FreeLB didn't use the original training samples? #10

Closed YawYoung closed 4 years ago

YawYoung commented 4 years ago

In this code, if adv_init_mag > 0, the model will only ever be trained on adversarial examples, right? I ran an experiment on SST-2 using albert-base-v2 with the hyper-parameters in this shell script.

| Setting | Result |
| --- | --- |
| No FreeLB | 93.00 |
| FreeLB | 91.86 |
| FreeLB with original data | 93.46 |

For "FreeLB with original data", I added the following code before this line:

```python
# Extra forward/backward pass on the clean (unperturbed) embeddings,
# so the model is also trained on the original samples.
inputs['inputs_embeds'] = embeds_init
inputs['dp_masks'] = dp_masks
outputs, dp_masks = model(**inputs)
loss = outputs[0]
if args.n_gpu > 1:
    loss = loss.mean()  # mean() to average on multi-gpu parallel training
if args.gradient_accumulation_steps > 1:
    loss = loss / args.gradient_accumulation_steps
tr_loss += loss.item()
if args.fp16:
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
else:
    loss.backward()
```

(Maybe FreeLB's hyper-parameters for albert-base are just very different from those for albert-xxlarge?)

zhuchen03 commented 4 years ago

I don't think you should add that. If you set adv_init_mag=0, the perturbation for the first step is 0, which is equivalent to training on the clean data, since the embeddings fed to the model are the sum of the perturbation and the clean sample's embeddings (see here).

I have released the hyperparameters for training the large model today. In some cases, setting adv_init_mag=0 seems to give better results. However, this is not always the case. Such random initializations are just meant for finding better solutions to the nonconvex inner max problem.
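That argument can be sketched in a few lines. The snippet below is a simplified, illustrative version of FreeLB's inner ascent loop (the function and parameter names here are my own, not the repo's exact code): the model always consumes `embeds_init + delta`, so with `adv_init_mag=0` the first of the K steps runs on exactly the clean embeddings, while with `adv_init_mag > 0` every forward pass sees a perturbed input.

```python
import torch

def freelb_inner_loop(forward_fn, embeds_init, adv_init_mag=0.0,
                      adv_lr=1e-1, adv_max_norm=0.0, adv_steps=3):
    # forward_fn: maps an embedding tensor (batch, seq, dim) to a scalar loss.
    # Hypothetical sketch of the FreeLB ascent loop, not the repo's exact API.
    batch, seq_len, dim = embeds_init.shape
    if adv_init_mag > 0:
        # Random init inside an L2 ball: every forward pass is perturbed.
        delta = torch.zeros_like(embeds_init).uniform_(-1, 1)
        delta = delta * adv_init_mag / (seq_len * dim) ** 0.5
    else:
        # adv_init_mag == 0: delta starts at zero, so the first forward
        # pass is exactly a pass on the clean embeddings.
        delta = torch.zeros_like(embeds_init)
    delta.requires_grad_()

    step_losses = []
    for _ in range(adv_steps):
        # The model always sees clean embeddings plus the perturbation.
        loss = forward_fn(embeds_init + delta) / adv_steps
        step_losses.append(loss.item())
        loss.backward()  # accumulates model grads and populates delta.grad
        # Gradient-ascent step on delta (L2 variant).
        g = delta.grad.detach()
        g_norm = g.view(batch, -1).norm(dim=1).clamp_min(1e-8).view(batch, 1, 1)
        delta = (delta + adv_lr * g / g_norm).detach()
        if adv_max_norm > 0:
            # Project delta back into the adv_max_norm ball.
            d_norm = delta.view(batch, -1).norm(dim=1).view(batch, 1, 1)
            delta = delta * (adv_max_norm / d_norm).clamp(max=1.0)
        delta.requires_grad_()
    return step_losses
```

With a toy quadratic loss and zero embeddings, `adv_init_mag=0` gives a first-step loss of exactly 0 (a clean pass), while any positive `adv_init_mag` makes even the first step adversarial.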

YawYoung commented 4 years ago

Thanks for your reply ~!