training error about qnli

iMountTai commented 2 years ago

Great job. However, when training reaches the 3rd epoch on the QNLI task, I encounter the following problem. The CoLA and SQuAD tasks do not hit it. Do you have any suggestions? I would be very grateful!

The error may originate in the following code block in https://github.com/princeton-nlp/CoFiPruning/blob/main/trainer/trainer.py (lines 680-685):

lagrangian_loss = None
if self.start_prune:
    lagrangian_loss, _, _ = self.l0_module.lagrangian_regularization(
        self.global_step - self.prepruning_finetune_steps)
    loss += lagrangian_loss

horizon86 commented 2 years ago

Hello, I hit the same error in my own code (not in CoFi). You can set CUDA_LAUNCH_BLOCKING=1 to get a more precise stack trace, which helps with debugging :smile:
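For anyone following along, here is a minimal sketch of one way to apply that suggestion from inside a Python entry point. It assumes the variable is set before the process makes its first CUDA call, because it is read when CUDA is initialised. The helper name `cuda_launch_blocking_enabled` is hypothetical and only illustrates a sanity check.

```python
import os

# Assumption: this runs at the very top of the training script, before any
# CUDA work happens in the process. With the variable set to "1", kernel
# launches run synchronously, so an error is reported at the call that
# caused it rather than at some later, unrelated line.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_launch_blocking_enabled() -> bool:
    """Hypothetical helper: report whether the debug flag is set for this process."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


if __name__ == "__main__":
    print("CUDA_LAUNCH_BLOCKING enabled:", cuda_launch_blocking_enabled())
```

The same effect can usually be had from the shell by prefixing the launch command, e.g. `CUDA_LAUNCH_BLOCKING=1 python train.py`, where `train.py` is a placeholder for whichever script you run. Synchronous launches slow training down, so the flag is best removed once the failing line is found.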

iMountTai commented 2 years ago

With layer_distill_version=3 the error is no longer reported; it may be that some layers are lost. But another problem arises: when I fine-tune, does the following code need to be changed? Is this correct?

xiamengzhou commented 1 year ago

Which layer_distill_version did you use when you encountered the error? Sorry for the late reply, I am happy to help with debugging!

iMountTai commented 1 year ago

layer_distill_version=4. The problem is solved now, thank you!
