Closed shininessNY closed 6 months ago
I ran into the same problem: the generated noise was always several orders of magnitude larger than the gradients.
It turned out I had used the wrong call when loading the pretrained weights after model initialization.
```python
# The right one is the repo's own loader:
lm_net.load_weight(torch.load(args.init_checkpoint))

# The wrong one silently skips mismatched keys:
lm_net.load_state_dict(torch.load(args.init_checkpoint), strict=False)
```
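The failure mode is easy to reproduce: with `strict=False`, `load_state_dict` does not raise when the checkpoint's key names do not match the module's parameter names, so the model quietly keeps its random initialization. A minimal sketch (the `Tiny` module and the key prefix are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical tiny model; names are illustrative, not from the repo.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

model = Tiny()

# A checkpoint saved under a different key prefix: none of its keys
# match the module's parameter names.
ckpt = {"transformer.proj.weight": torch.zeros(4, 4),
        "transformer.proj.bias": torch.zeros(4)}

# With strict=False this does NOT raise; the mismatch is only visible
# in the returned _IncompatibleKeys object.
result = model.load_state_dict(ckpt, strict=False)
print(result.missing_keys)     # parameters left at random init
print(result.unexpected_keys)  # checkpoint keys that were never used
```

Checking `missing_keys` after loading (or simply using `strict=True`) would have surfaced the bug immediately.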
When I use DP to fine-tune GPT-2 on the E2E dataset, the noise I get is three to four orders of magnitude larger than the gradient with σ = 0.6, which results in a very large perplexity. What could be the reason behind this?
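For reference, the expected noise-to-signal scale in DP-SGD can be sanity-checked with a toy sketch. All numbers below (clipping norm, batch size, parameter count) are illustrative, not taken from the training script; the point is that the Gaussian noise added to the summed clipped gradients has per-coordinate standard deviation σ·C, so a ratio thousands of times larger than the gradient usually means the gradients themselves are abnormally small or the weights were not loaded:

```python
import torch

torch.manual_seed(0)

# Illustrative DP-SGD settings (made-up values).
C, sigma, batch = 1.0, 0.6, 64   # clipping norm, noise multiplier, batch size
d = 10_000                       # number of parameters

# Per-example gradients, clipped to L2 norm at most C, then summed.
grads = torch.randn(batch, d)
scale = (C / grads.norm(dim=1, keepdim=True)).clamp(max=1.0)
summed = (grads * scale).sum(dim=0)

# Gaussian mechanism: noise with std sigma * C added to the sum.
noise = torch.normal(0.0, sigma * C, size=(d,))

ratio = (noise.norm() / summed.norm()).item()
print(f"noise / signal norm ratio: {ratio:.2f}")
```

With healthy gradients this ratio stays within a modest factor of 1; a ratio of 1e3 to 1e4 points at a setup problem rather than at the mechanism itself.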