Closed shenyehui closed 12 months ago
How do I set it to a larger epoch when 60 epochs are not enough? Thank you for your answer.
I changed the line self.parser.add_argument('--nEpochs', type=int, default=100, help='number of epochs to train for'), but it still stops at 60 epochs.
Thanks.
I have been using your code, and I noticed that the best results are achieved around the third or fourth epoch during training. Is this because you have some special settings for training?
No, it depends on the dataset. So I use the val set to locate the best epoch.
> How do I set it to a larger epoch when 60 epochs are not enough? Thank you for your answer.
Use --nEpochs to pass a customized epoch.
> I changed the line self.parser.add_argument('--nEpochs', type=int, default=100, help='number of epochs to train for'), but it still stops at 60 epochs.
When the performance has not improved for --patience epochs, the training will terminate.
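The patience mechanism can be sketched as follows. This is a minimal illustration with hypothetical names, not the repository's actual code:

```python
# Minimal sketch of patience-based early stopping (hypothetical names,
# not the repository's actual implementation).
def train_with_patience(epoch_scores, patience=3):
    """Return the epoch at which training would stop.

    epoch_scores: validation scores per epoch (higher is better).
    Training terminates once the score has not improved for
    `patience` consecutive epochs, even if --nEpochs is larger.
    """
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(epoch_scores):
        if score > best_score:
            best_score = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # early stop, regardless of the epoch budget
    return len(epoch_scores) - 1  # ran through the full budget
```

So a large --nEpochs only raises the upper bound; --patience decides when training actually stops.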
Thank you. I changed the line self.parser.add_argument('--nEpochs', type=int, default=100, help='number of epochs to train for'), and performance improves at the 57th epoch, but it still stops at the 60th epoch.
What is your command?
My command: python main.py --phase=train_stu --resume=logs/tri_train_tea_0714_005535/ckpt_best.pth.tar
It is because the --nEpochs is overridden by the ckpt. You may want to do a hack by adding options.nEpochs=100 after https://github.com/ramdrop/stun/blob/bda3537fcd3562c8f9f3e9a8e104789d960b48bd/main.py#L13
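The interaction can be sketched like this. It is a minimal, self-contained illustration of why resuming can silently clobber a CLI flag; the checkpoint's dictionary layout and names here are assumptions, not the repository's actual structure:

```python
# Sketch: a checkpoint that stores the old run's options can override
# the current command line when it is restored (illustrative names).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--nEpochs', type=int, default=100)
options = parser.parse_args([])          # options.nEpochs == 100

# Resuming restores the options saved alongside the old checkpoint:
saved_ckpt = {'options': {'nEpochs': 60}}   # what the earlier run stored
vars(options).update(saved_ckpt['options'])  # now options.nEpochs == 60

# The suggested hack: reassert the desired value AFTER the restore.
options.nEpochs = 100
```

This is why editing the argparse default alone has no effect when --resume is used: the restored value wins unless it is reassigned afterwards.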
Thanks for your patience, I'll try the method you suggested later on.
I'm sorry, I've encountered another issue. While training the student network, I noticed that the loss is printed as tensor(nan, device='cuda:0', grad_fn=...).
Yes, self.whole_training_data_loader and self.training_data_loader cannot be used interchangeably.
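Independent of which loader is used, a cheap guard that fails fast when the loss turns NaN helps locate the offending batch. A minimal sketch, using a plain float in place of a loss tensor (with PyTorch you would check torch.isnan(loss) instead; the function name is hypothetical):

```python
# Minimal NaN guard for a training loop (sketch; `loss` is a plain float
# standing in for a loss tensor here).
import math

def check_loss(loss, epoch, step):
    """Raise immediately when the loss turns NaN so the bad batch
    can be identified, instead of training on silently."""
    if math.isnan(loss):
        raise ValueError(f"NaN loss at epoch {epoch}, step {step}")
    return loss
```

Calling this right after the loss is computed pinpoints the first step at which the mismatched data loader (or any other cause) produces a NaN.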
Thank you. I see that you used the output of the teacher model as the target when training the student model, but I think also using the data from training the teacher network could improve the results, which is why this question arose.