machine52vision opened this issue 2 years ago
Hi @machine52vision, have you solved this problem?
Hm. Correct me if I am wrong, but the net is not trained at all (just inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed. Given that, it doesn't matter if the loss is NaN.
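For reference, a minimal sketch of that inference-only setup (assuming a torchvision wide_resnet50_2 backbone; the names here are illustrative, not the repo's actual code):

```python
import torch
from torchvision.models import wide_resnet50_2

# Load the pretrained backbone and freeze it: we only want its embeddings.
model = wide_resnet50_2(pretrained=True)
model.eval()
for param in model.parameters():
    param.requires_grad = False

# Pure inference: no gradients, no weight updates, so there is no real training loss.
with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)   # dummy image batch
    features = model(x)               # output of the frozen net
```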
thanks a lot!
Hi @SDJustus, I want to train on my dataset with this code, not just run inference. So I think it does matter if the loss is NaN.
OK, so if you look at this code from train.py:
for param in self.model.parameters():
    param.requires_grad = False
you can see that the model parameters are intentionally not updated during training. As you can read in the paper, only the embeddings of a pretrained network are used for further computations on a new dataset (such as coreset selection via minimax facility location and kNN for testing). So again, no network weight updates happen during training, and a loss of NaN is totally fine here.
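To illustrate what happens with those embeddings instead of weight updates, here is a rough, hypothetical sketch (not the repo's code) of greedy coreset selection, i.e. a greedy approximation of the minimax facility location problem, plus kNN scoring at test time:

```python
import torch

def greedy_coreset(embeddings, m):
    # embeddings: (N, d). Greedily pick m points so that the maximum distance
    # of any point to its nearest selected centre stays small
    # (greedy approximation of minimax facility location / k-center).
    selected = [0]
    min_dists = torch.cdist(embeddings, embeddings[:1]).squeeze(1)       # (N,)
    for _ in range(m - 1):
        idx = int(min_dists.argmax())                                    # farthest remaining point
        selected.append(idx)
        new_d = torch.cdist(embeddings, embeddings[idx:idx + 1]).squeeze(1)
        min_dists = torch.minimum(min_dists, new_d)
    return embeddings[selected]

def knn_score(test_emb, memory_bank, k=9):
    # Anomaly score of one test embedding: mean distance to its k nearest
    # neighbours in the memory bank (higher = more anomalous).
    dists = torch.cdist(test_emb.unsqueeze(0), memory_bank)              # (1, M)
    knn, _ = dists.topk(k, largest=False)
    return knn.mean().item()

train_embs = torch.randn(1000, 512)           # dummy embeddings from the frozen net
memory_bank = greedy_coreset(train_embs, 100)
print(knn_score(torch.randn(512), memory_bank))
```

None of this needs backpropagation, which is why the trainer never computes a finite loss.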
OK, thanks, I see.
Digging into the PyTorch Lightning code (pytorch_lightning/core/lightning.py), when the progress-bar info is prepared for each batch, the loss value is assigned in get_progress_bar_dict with the following logic:
if running_train_loss is not None:
    avg_training_loss = running_train_loss.cpu().item()
elif self.automatic_optimization:
    avg_training_loss = float('NaN')
Check the definition of automatic_optimization:
def automatic_optimization(self) -> bool:
    """If False you are responsible for calling .backward, .step, zero_grad."""
    return self._automatic_optimization
Since there is no backward pass during training, automatic_optimization can be set to False so that NaN is never assigned to the loss.
I've modified configure_optimizers in train.py as follows, and loss=NaN is no longer printed:
def configure_optimizers(self):
    self.automatic_optimization = False
    return None
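If you prefer not to switch off automatic optimization, another option (a sketch, assuming the pytorch_lightning version used here still exposes get_progress_bar_dict on the LightningModule) is to drop the meaningless loss entry from the progress bar instead:

```python
def get_progress_bar_dict(self):
    # Keep Lightning's default progress-bar entries, but remove the loss,
    # which is never actually computed in this embedding-only training loop.
    items = super().get_progress_bar_dict()
    items.pop("loss", None)
    return items
```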
Hello, how do you solve loss=NaN?