microsoft / dp-transformers

Differentially-private transformers using HuggingFace and Opacus
MIT License

Incompatible libraries #37

Closed: fangyiyu closed this issue 1 year ago

fangyiyu commented 1 year ago

When I fine-tune GPT-2 with DP using fine-tune-dp.py in dp-transformers/research/synthetic-text-generation-with-DP, an error occurs: TypeError: read_csv() got an unexpected keyword argument 'mangle_dupe_cols'

The error seems to be caused by the version of datasets: if I upgrade datasets to the latest version (2.14.6), that error disappears, but another error occurs: ValueError: math domain error
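For context, here is a minimal sketch of where the first error likely comes from; it assumes the older datasets release still forwards the mangle_dupe_cols keyword to pandas.read_csv, and that a pandas 2.x installation no longer accepts it:

    # Sketch only: mangle_dupe_cols was deprecated in pandas 1.5 and removed in 2.0,
    # so any library that still forwards it to read_csv fails on pandas >= 2.0.
    import io
    import pandas as pd

    buf = io.StringIO("a,b\n1,2\n")
    # Works on pandas < 2.0; on pandas >= 2.0 it raises:
    #   TypeError: read_csv() got an unexpected keyword argument 'mangle_dupe_cols'
    pd.read_csv(buf, mangle_dupe_cols=True)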

I checked the source code and believe this error comes from math.log(1 / q - 1) in the prv_accountant package, which is part of dp_transformers. I tried upgrading dp_transformers, but dp_transformers requires datasets<=2.6.1,>=2.0.0, which brings back the TypeError mentioned above.
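For reference, a minimal sketch of how that expression can produce the ValueError; treating q as the subsampling probability is my assumption about the accountant's internals:

    # Sketch only: math.log requires a strictly positive argument, so
    # math.log(1 / q - 1) fails whenever 1 / q - 1 <= 0, i.e. whenever q >= 1.
    # (Assumption: q is the Poisson subsampling probability used by the accountant.)
    import math

    def log_term(q: float) -> float:
        return math.log(1 / q - 1)

    print(log_term(0.08))  # fine: small sampling probability
    print(log_term(5.12))  # ValueError: math domain error, since 1/q - 1 < 0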

Any suggestion for resolving this library incompatibility would be appreciated. Thank you!

huseyinatahaninan commented 1 year ago

Hi @fangyiyu, we very recently updated the repo, which now uses roughly the latest version of everything, but the research project synthetic-text-generation-with-DP is indeed still based on the previous version. That said, I don't think the datasets package version is critical for fine-tune-dp.py, so it's okay to upgrade it.

In my opinion, the error you are getting from the prv_accountant package is entirely independent of datasets and most likely has to do with your privacy parameters. If you could let us know what you set as the target_epsilon (or noise multiplier), your dataset size, the effective batch size you'd like to use, and the number of epochs, we can see why the prv_accountant package is failing.

fangyiyu commented 1 year ago

Hi @huseyinatahaninan, thank you for the reply. I'm following the script for fine-tuning with DP on this page, though I only have one GPU instead of 8. Below is my fine-tuning script, where the target_epsilon is set to 4, the training batch size is 32, and the validation batch size is 64. I'm using a very small dataset for a preliminary experiment: the training set contains 100 instances with an average of 47 tokens per instance, and the validation set contains 10 instances with an average of 114 tokens per instance.

python3 fine-tune-dp.py \
    --data_dir data \
    --output_dir output \
    --model_name gpt2 \
    --per_device_train_batch_size 32 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy epoch \
    --save_strategy epoch \
    --log_level info \
    --per_device_eval_batch_size 64 \
    --eval_accumulation_steps 1 \
    --seed 42 \
    --target_epsilon 4.0 \
    --per_sample_max_grad_norm 1.0 \
    --weight_decay 0.01 \
    --remove_unused_columns False \
    --num_train_epochs 50 \
    --logging_steps 10 \
    --max_grad_norm 0 \
    --sequence_len 128 \
    --learning_rate 0.0001 \
    --lr_scheduler_type constant \
    --dataloader_num_workers 2 \
    --disable_tqdm True \
    --load_best_model_at_end True

Thank you for your time and hope to hear from you soon.

huseyinatahaninan commented 1 year ago

Hi @fangyiyu, please note that the effective batch size is (number of GPUs x per_device_train_batch_size x gradient_accumulation_steps). Since you set --per_device_train_batch_size 32 and --gradient_accumulation_steps 16, your effective batch size is 32 * 16 = 512, which is larger than your training dataset of 100 instances (see the quick check below). I'd suggest keeping the effective batch size at most 10% of the training data, so perhaps set something like --per_device_train_batch_size 8 and --gradient_accumulation_steps 1, and let me know if you still get an error.
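A quick check of the implied sampling probability (a sketch; it assumes the accountant uses q = effective batch size / dataset size, which is why math.log(1 / q - 1) blows up here):

    # Sketch only: sampling probability implied by the two settings on 1 GPU.
    dataset_size = 100

    # Original settings: 1 GPU x 32 per-device batch x 16 accumulation steps.
    effective_batch = 1 * 32 * 16           # = 512
    q = effective_batch / dataset_size      # = 5.12 > 1, so math.log(1 / q - 1) fails

    # Suggested settings: 1 GPU x 8 per-device batch x 1 accumulation step.
    effective_batch = 1 * 8 * 1             # = 8
    q = effective_batch / dataset_size      # = 0.08, well within range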

If the 100-instance training dataset is only for a preliminary experiment, that's okay, but note that for a better privacy-utility trade-off you would want a much larger training dataset.

fangyiyu commented 1 year ago

Thank you for the reply. I can now successfully fine-tune GPT-2 using the latest dp-transformers library, which supports PEFT.