princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

Is it possible to run pruning on multiple GPUs? #16

Closed · TonyNemo closed this issue 2 years ago

xiamengzhou commented 2 years ago

Unfortunately, the current version does not support training with multiple GPUs.

horizon86 commented 2 years ago

> Unfortunately, the current version does not support training with multiple GPUs.

Hi, I tried and found that it seemed to work:

python3 -m torch.distributed.launch --nproc_per_node 7 $code_dir/run_glue_prune.py ........

At least it runs successfully, and the results don't look unreasonable. Is this OK? @xiamengzhou Thanks!
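
For context, here is a minimal sketch of the distributed setup that `torch.distributed.launch` assumes from the launched script (generic PyTorch DDP boilerplate, not the actual `run_glue_prune.py` code): the launcher starts one process per GPU and passes a `--local_rank` argument to each process, which the script uses to pick its device and wrap the model.

```python
# Minimal DDP sketch (generic PyTorch boilerplate, not CoFiPruning code).
import argparse

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # injected by the launcher
args = parser.parse_args()

# Bind this process to its GPU and join the process group.
torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl")

model = torch.nn.Linear(768, 3).cuda()  # placeholder model
model = DDP(model, device_ids=[args.local_rank])

# Each process should then read a distinct shard of the data (typically via
# torch.utils.data.distributed.DistributedSampler) so that gradients are
# averaged across all processes at every optimizer step.
```

Since the repo's training script is built on the Hugging Face `Trainer`, most of this wiring is likely handled automatically once `--local_rank` is set, which may be why the launch command above works out of the box.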

xiamengzhou commented 2 years ago

Oh wow, thanks for making it work! If it runs through and the results make sense, it should be correct :) Could you share more details on which settings you ran and how long it took to run?

horizon86 commented 2 years ago

> Oh wow, thanks for making it work! If it runs through and the results make sense, it should be correct :) Could you share more details on which settings you ran and how long it took to run?

I ran on GLUE MNLI with batch_size_per_gpu 32 on 7 RTX 2080 Ti GPUs, so the total_batch_size is 32 * 7 = 224. Other settings are the defaults, e.g. 20 training epochs, 1 pre-finetuning epoch, and 2 sparsity warmup epochs. It took 11,660 seconds to run.

horizon86 commented 2 years ago

> Oh wow, thanks for making it work! If it runs through and the results make sense, it should be correct :) Could you share more details on which settings you ran and how long it took to run?

My fine-tuned BERT reaches about 82% accuracy, and CoFi pruning reaches 71% at 95% sparsity. Fine-tuning the pruned model afterwards brings the accuracy to 75%.

These accuracies are measured on the eval_dataset.

xiamengzhou commented 2 years ago

Thanks! It seems that running in the multi-GPU setting compromises the results a lot. Fine-tuning BERT on MNLI should reach >84% accuracy, and a low fine-tuning accuracy could further hurt the pruning accuracy. Do you have any idea why it achieved a lower accuracy even on standard fine-tuning?

xiamengzhou commented 2 years ago

Another possible cause of the performance decline is that the model went through an insufficient number of optimizer updates, now that the effective batch size is significantly larger.
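
A back-of-the-envelope calculation illustrates this point (assuming roughly 393k MNLI training examples, the standard train split size): with the epoch count held fixed, a 7x larger effective batch size means roughly 7x fewer optimizer updates, so the learning rate or number of epochs may need to be adjusted.

```python
# Rough arithmetic only; the MNLI train size is an assumption
# (the standard split has about 392,702 examples).
train_examples = 392_702
epochs = 20

for per_gpu_bs, n_gpus in [(32, 1), (32, 7)]:
    effective_bs = per_gpu_bs * n_gpus
    updates = (train_examples // effective_bs) * epochs
    print(f"{n_gpus} GPU(s): effective batch {effective_bs}, ~{updates} updates")

# Prints roughly 245,420 updates for 1 GPU vs 35,060 updates for 7 GPUs.
```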

xiamengzhou commented 2 years ago

Hi, I am closing this issue. Feel free to reopen it if you have more questions :)