tonytan48 / KD-DocRE

Implementation of Document-level Relation Extraction with Knowledge Distillation and Adaptive Focal Loss

the program will always be killed. #12

Closed. WilliamAntoniocrayon closed this issue 2 years ago

WilliamAntoniocrayon commented 2 years ago

I'm puzzled: whenever I run step 3, the program is killed without any error message. [Screenshot attached: 1661068595(1)]

WilliamAntoniocrayon commented 2 years ago

The program was killed at epoch 39 when num_train_epochs was 50.0. Then, when I set num_train_epochs to 30, the program was killed at epoch 26.

tonytan48 commented 2 years ago

Hi there, thank you for your interest. I have encountered this bug elsewhere, and it may be due to a GPU memory leak in PyTorch. For this stage, I should have changed num_train_epochs to 20, because the best F1 score in this stage (the training corpus is over 100k) typically occurs in the earlier epochs, before epoch 20. You may check your best checkpoints from the earlier epochs. I should also add an early-stopping mechanism to save some computing time; I will try to update this part in a few days. Thanks again for bringing this up.
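For readers who want a stopgap before the repository is updated, the early-stopping idea mentioned above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the names `EarlyStopping`, `run_training`, `patience`, and `min_delta` are hypothetical, and the list of dev F1 scores stands in for a real train-and-evaluate loop.

```python
class EarlyStopping:
    """Track the best dev score and signal when training should stop.

    Hypothetical helper for illustration; not part of KD-DocRE.
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum gain that counts as improvement
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, score, epoch):
        """Record one epoch's dev score; return True if training should stop."""
        if score > self.best_score + self.min_delta:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_training(dev_f1_per_epoch, patience=3):
    """Simulate a training loop driven by precomputed dev F1 scores.

    In a real run, each iteration would train for one epoch, evaluate on
    the dev set, and save a checkpoint whenever the score improves.
    """
    stopper = EarlyStopping(patience=patience)
    last_epoch = -1
    for epoch, f1 in enumerate(dev_f1_per_epoch):
        last_epoch = epoch
        if stopper.step(f1, epoch):
            break  # no improvement for `patience` consecutive epochs
    return stopper.best_epoch, stopper.best_score, last_epoch


if __name__ == "__main__":
    scores = [0.50, 0.55, 0.60, 0.59, 0.58, 0.57, 0.61]
    print(run_training(scores, patience=3))
```

With a small `patience`, the run ends soon after the dev F1 stops improving, so a long job is less likely to reach the late epochs where the process was being killed. Note that a later improvement (the final 0.61 in the example) is never seen, which is the usual trade-off when choosing `patience`.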

WilliamAntoniocrayon commented 2 years ago

Thank you for your advice. I look forward to your revised code and your future work.