princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License
192 stars, 31 forks

Training error on QNLI #27

Closed: iMountTai closed this issue 1 year ago

iMountTai commented 2 years ago

Great work! However, when training reaches the 3rd epoch on the QNLI task, I encounter the error below. The CoLA and SQuAD tasks do not run into this problem. Do you have any suggestions? I would be very grateful!

(Screenshot of the error message, not preserved.) The error may originate in the following code block in https://github.com/princeton-nlp/CoFiPruning/blob/main/trainer/trainer.py, lines 680-685:

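```python
lagrangian_loss = None
if self.start_prune:
    # Lagrangian sparsity penalty; the argument is the number of steps since pruning began
    lagrangian_loss, _, _ = self.l0_module.lagrangian_regularization(
        self.global_step - self.prepruning_finetune_steps
    )
    loss += lagrangian_loss
```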
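For context on what this call computes: the CoFi paper constrains the expected sparsity ŝ toward a target t with a Lagrangian term λ1(ŝ − t) + λ2(ŝ − t)², where t is ramped up during a warmup period. That is why the call receives the number of steps since pruning started. The sketch below is a minimal, framework-free illustration of that formula under these assumptions. The function and parameter names are hypothetical and are not the repository's API, and in actual training λ1 and λ2 are learned multipliers rather than fixed constants.

```python
def target_sparsity_at(pruned_steps: int, warmup_steps: int,
                       target: float, start: float = 0.0) -> float:
    """Hypothetical helper: linearly ramp the target sparsity from `start`
    to `target` over `warmup_steps` steps, then hold it at `target`."""
    if warmup_steps <= 0:
        return target
    frac = min(1.0, pruned_steps / warmup_steps)
    return start + (target - start) * frac


def lagrangian_penalty(expected_sparsity: float, pruned_steps: int,
                       warmup_steps: int, target: float,
                       lambda_1: float, lambda_2: float) -> float:
    """Hypothetical helper: lambda_1 * (s - t) + lambda_2 * (s - t)^2,
    where s is the expected sparsity and t the current (ramped) target."""
    t = target_sparsity_at(pruned_steps, warmup_steps, target)
    gap = expected_sparsity - t
    return lambda_1 * gap + lambda_2 * gap ** 2


# Halfway through warmup the target is 0.45, so the gap is 0.05.
penalty = lagrangian_penalty(expected_sparsity=0.5, pruned_steps=50,
                             warmup_steps=100, target=0.9,
                             lambda_1=1.0, lambda_2=2.0)
print(penalty)  # ≈ 0.055
```

Because the penalty depends only on scalar sparsity values, a crash reported on this line during QNLI training may really come from an earlier asynchronous GPU operation rather than from the penalty itself.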
horizon86 commented 2 years ago

Hello, I hit the same error in my own code (not CoFi). You can set CUDA_LAUNCH_BLOCKING=1 to get a more precise stack trace to help debug :smile:
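CUDA kernels are launched asynchronously, so a GPU error is often reported at a later, unrelated line. This may be why the traceback points at the Lagrangian loss. Setting CUDA_LAUNCH_BLOCKING=1 makes launches synchronous, so the traceback lands on the operation that actually failed. The helper below is a minimal sketch that assumes you can edit the training entry point. Its name is hypothetical. Setting the variable before importing torch is the safest option because the CUDA runtime reads it when it initializes.

```python
import os
import sys


def enable_sync_cuda_debugging() -> bool:
    """Hypothetical helper: make CUDA kernel launches synchronous.

    Returns False if torch was already imported. In that case the setting
    may come too late to take effect, so call this before importing torch.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return "torch" not in sys.modules


ok = enable_sync_cuda_debugging()
if not ok:
    print("warning: torch already imported; CUDA_LAUNCH_BLOCKING may be ignored")
# import torch  # import afterwards, then rerun the QNLI training
```

Equivalently, you can prefix the shell command that launches training with `CUDA_LAUNCH_BLOCKING=1`. Expect training to run noticeably slower while it is set, so only use it for debugging.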

iMountTai commented 2 years ago

With layer_distill_version=3, the error no longer occurs, though it may be that some layers are lost. However, another problem arises: when I fine-tune, it seems the following code needs to be changed (screenshot not preserved). Is this correct?

xiamengzhou commented 2 years ago

Which layer_distill_version did you use when you encountered the error? Sorry for the late reply, I am happy to help with debugging!

iMountTai commented 1 year ago

I was using layer_distill_version=4 when the error occurred. The problem is now solved, thank you!