Unfortunately, the current version does not support training with multiple GPUs.
Hi, I tried and found that it seemed to work:
python3 -m torch.distributed.launch --nproc_per_node 7 $code_dir/run_glue_prune.py ........
At least it runs successfully, and the results don't look unreasonable. Is this OK? @xiamengzhou Thanks!
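For anyone trying to reproduce this, here is a minimal sketch of the DDP wiring a script needs to run under torch.distributed.launch. This is standard PyTorch boilerplate, not the actual run_glue_prune.py code, and the Linear model is a hypothetical placeholder; scripts built on HuggingFace's Trainer typically handle all of this internally once --local_rank is set.

# Minimal one-process-per-GPU DDP setup (sketch, not repo code)
import argparse
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
# torch.distributed.launch passes --local_rank to each process;
# the newer torchrun exports the LOCAL_RANK env var instead.
parser.add_argument("--local_rank", type=int,
                    default=int(os.environ.get("LOCAL_RANK", 0)))
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl")

model = torch.nn.Linear(768, 3).cuda(args.local_rank)  # placeholder model
model = DDP(model, device_ids=[args.local_rank])
# Each process then trains on its own shard of the data (e.g. via a
# DistributedSampler), so with 7 processes the effective batch size
# is 7 * batch_size_per_gpu.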
Oh wow, thanks for making it work! If it runs through and the results make sense, it should be correct :) Could you share more details on which setting you ran and how long it took to run?
Run on GLUE MNLI with batch_size_per_gpu 32 on 7 RTX 2080 Ti GPUs, so the total batch size is 32 * 7 = 224. Other settings are the defaults, e.g. 20 training epochs, 1 pre-finetuning epoch, and 2 sparsity-warmup epochs. It took 11,660 seconds (about 3.2 hours) to run.
The accuracy of my fine-tuned BERT is about 82%, and the accuracy after CoFi pruning is 71% at 95% sparsity. I then fine-tuned the pruned model, which reached 75% accuracy.
All accuracies are on the eval dataset.
Thanks! It seems that running in the multi-GPU setting compromises the results a lot. Fine-tuning BERT on MNLI should achieve >84% accuracy, and a low fine-tuning accuracy could in turn hurt the pruning accuracy. Do you have any idea why it achieved a lower accuracy even on standard fine-tuning?
Another possible cause of the performance decline is that the model went through an insufficient number of updates now that the batch size has increased significantly: with the number of epochs held fixed, a 7x larger total batch means roughly 7x fewer gradient steps.
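To make that concrete, a back-of-the-envelope calculation (assuming MNLI's roughly 393k training examples; the batch sizes and epoch count come from the run described above):

# Optimizer updates at the two batch sizes, epochs held fixed
mnli_train_size = 392_702   # MNLI training-set size
epochs = 20

for total_batch in (32, 224):   # single GPU vs. 7 GPUs x 32
    updates = epochs * mnli_train_size // total_batch
    print(f"batch {total_batch:>3}: ~{updates:,} updates")

# batch  32: ~245,438 updates
# batch 224: ~35,062 updates  (about 7x fewer gradient steps)

A common mitigation would be to scale the learning rate with the batch size or to train for more steps, though whether either heuristic recovers the single-GPU accuracy here would need to be verified.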
Hi, I am closing this issue. Feel free to reopen it if you have more questions :)