Closed shininessNY closed 6 months ago
I ran into the same problem: the generated noise was always several orders of magnitude larger than the gradients.
It turned out I had used the wrong call when loading the pretrained weights after model initialization.
```python
# The right one is the repo's own loader:
lm_net.load_weight(torch.load(args.init_checkpoint))

# The wrong one silently skips mismatched keys:
lm_net.load_state_dict(torch.load(args.init_checkpoint), strict=False)
```
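The failure mode is easy to reproduce: with `strict=False`, `load_state_dict` does not raise when the checkpoint's key names do not match the module's parameter names, so the model quietly keeps its random initialization. A minimal sketch (the `Tiny` module and the key prefix are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical tiny model; names are illustrative, not from the repo.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

model = Tiny()

# A checkpoint saved under a different key prefix: none of its keys
# match the module's parameter names.
ckpt = {"transformer.proj.weight": torch.zeros(4, 4),
        "transformer.proj.bias": torch.zeros(4)}

# With strict=False this does NOT raise; the mismatch is only visible
# in the returned _IncompatibleKeys object.
result = model.load_state_dict(ckpt, strict=False)
print(result.missing_keys)     # parameters left at random init
print(result.unexpected_keys)  # checkpoint keys that were never used
```

Checking `missing_keys` after loading (or simply using `strict=True`) would have surfaced the bug immediately.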
When I use DP to fine-tune GPT-2 on the E2E dataset, the noise I get is three to four orders of magnitude larger than the gradient with σ = 0.6, which results in a very large perplexity. What could be the reason behind this?
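For reference, the expected noise-to-signal scale in DP-SGD can be sanity-checked with a toy sketch. All numbers below (clipping norm, batch size, parameter count) are illustrative, not taken from the training script; the point is that the Gaussian noise added to the summed clipped gradients has per-coordinate standard deviation σ·C, so a ratio thousands of times larger than the gradient usually means the gradients themselves are abnormally small or the weights were not loaded:

```python
import torch

torch.manual_seed(0)

# Illustrative DP-SGD settings (made-up values).
C, sigma, batch = 1.0, 0.6, 64   # clipping norm, noise multiplier, batch size
d = 10_000                       # number of parameters

# Per-example gradients, clipped to L2 norm at most C, then summed.
grads = torch.randn(batch, d)
scale = (C / grads.norm(dim=1, keepdim=True)).clamp(max=1.0)
summed = (grads * scale).sum(dim=0)

# Gaussian mechanism: noise with std sigma * C added to the sum.
noise = torch.normal(0.0, sigma * C, size=(d,))

ratio = (noise.norm() / summed.norm()).item()
print(f"noise / signal norm ratio: {ratio:.2f}")
```

With healthy gradients this ratio stays within a modest factor of 1; a ratio of 1e3 to 1e4 points at a setup problem rather than at the mechanism itself.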