hcw-00 / PatchCore_anomaly_detection

Unofficial implementation of PatchCore anomaly detection
Apache License 2.0

loss=nan #22

Open machine52vision opened 2 years ago

machine52vision commented 2 years ago

Hello, how can I solve loss=NaN?

XiaoPengZong commented 2 years ago

Hi, @machine52vision , have you solved this problem?

SDJustus commented 2 years ago

Hm. Correct me if I am wrong, but the net is not trained at all (just inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed. So it doesn't matter that the loss is NaN.
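
For reference, here is a minimal sketch (not the repository's actual code; the layer choice, hook setup, and dummy input are assumptions) of what "just inference on the pretrained backbone to get embeddings" looks like, with everything frozen and no backward pass:

```python
# Minimal sketch: pull intermediate feature maps from a frozen, pretrained
# WideResNet-50 with forward hooks. No backward pass is ever run, so no
# loss or gradient is needed.
import torch
import torchvision.models as models

backbone = models.wide_resnet50_2(pretrained=True)
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False          # freeze everything, inference only

features = []
def hook(_module, _input, output):   # capture intermediate activations
    features.append(output)

# layer choice is an assumption; PatchCore-style setups use mid-level blocks
backbone.layer2.register_forward_hook(hook)
backbone.layer3.register_forward_hook(hook)

with torch.no_grad():                # no gradients are computed at all
    x = torch.randn(1, 3, 224, 224)  # dummy image batch
    backbone(x)

print([f.shape for f in features])   # e.g. [1, 512, 28, 28] and [1, 1024, 14, 14]
```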

machine52vision commented 2 years ago

thanks a lot!

XiaoPengZong commented 2 years ago

> Hm. Correct me if I am wrong, but the net is not trained at all (just inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed. So it doesn't matter that the loss is NaN.

Hi @SDJustus, I want to train on my own dataset with this code, not just run inference. So I think it does matter that the loss is NaN.

SDJustus commented 2 years ago

OK, so if you look at this code from train.py:

```python
for param in self.model.parameters():
    param.requires_grad = False
```

you can see that it is intended not to update the model parameters during training. As you can read in the paper, only the embeddings of a pretrained network are used for further computations on a new dataset (such as minimax facility location and kNN at test time). So again, no network weight updates are done during training, and a NaN loss is totally fine here.
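
To make the "minimax facility location and kNN" part concrete, here is a simplified sketch of those two post-embedding steps (pure NumPy, brute-force distances, no reweighting term; function names and shapes are illustrative assumptions, not the repository's implementation):

```python
# Sketch of greedy coreset selection (a farthest-point approximation of
# minimax facility location) and nearest-neighbour scoring at test time.
import numpy as np

def greedy_coreset(embeddings: np.ndarray, n_select: int) -> np.ndarray:
    """Pick n_select rows that approximately minimise the maximum distance
    of any embedding to its closest selected centre."""
    selected = [0]                                   # arbitrary seed point
    min_dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(n_select - 1):
        idx = int(np.argmax(min_dist))               # farthest point so far
        selected.append(idx)
        dist_new = np.linalg.norm(embeddings - embeddings[idx], axis=1)
        min_dist = np.minimum(min_dist, dist_new)
    return embeddings[selected]

def anomaly_score(test_patches: np.ndarray, memory_bank: np.ndarray) -> float:
    """Image-level score = largest distance from any test patch to its
    nearest neighbour in the memory bank (simplified, no reweighting)."""
    d = np.linalg.norm(test_patches[:, None, :] - memory_bank[None, :, :], axis=2)
    return float(d.min(axis=1).max())

# toy usage with random data
train_emb = np.random.randn(1000, 64)
bank = greedy_coreset(train_emb, n_select=100)
print(anomaly_score(np.random.randn(50, 64), bank))
```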

XiaoPengZong commented 2 years ago

> OK, so if you look at this code from train.py: `for param in self.model.parameters(): param.requires_grad = False` you can see that it is intended not to update the model parameters during training. As you can read in the paper, only the embeddings of a pretrained network are used for further computations on a new dataset (such as minimax facility location and kNN at test time). So again, no network weight updates are done during training, and a NaN loss is totally fine here.

OK, thanks, I understand now.

zhangjunli177 commented 2 years ago

Digging into the PyTorch Lightning code (pytorch_lightning\core\lightning.py): when the progress info for each batch is prepared, get_progress_bar_dict contains this logic to assign the loss value:

```python
if running_train_loss is not None:
    avg_training_loss = running_train_loss.cpu().item()
elif self.automatic_optimization:
    avg_training_loss = float('NaN')
```

Check the definition of automatic_optimization:

```python
def automatic_optimization(self) -> bool:
    """
    If False you are responsible for calling .backward, .step, zero_grad.
    """
    return self._automatic_optimization
```

Since there is no backward logic during training, automatic_optimization can be set to False to avoid assigning NaN to the loss. I've modified configure_optimizers in train.py, and loss=NaN is no longer printed:

```python
def configure_optimizers(self):
    self.automatic_optimization = False
    return None
```
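
As a follow-up, here is a minimal, self-contained sketch of where that flag fits in a LightningModule. It assumes a more recent PyTorch Lightning version, where automatic_optimization is set in __init__ rather than in configure_optimizers; the class and its contents are stand-ins, not the repository's actual train.py module:

```python
# Sketch: a LightningModule that never calls backward/step, so Lightning's
# automatic optimization (and its NaN loss placeholder) is disabled up front.
import torch
import pytorch_lightning as pl

class FrozenBackboneModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False   # we never call backward/step
        self.backbone = torch.nn.Identity()   # placeholder for the frozen net

    def training_step(self, batch, batch_idx):
        # collect embeddings only; returning None means no loss is tracked
        _ = self.backbone(batch)
        return None

    def configure_optimizers(self):
        return None                           # no optimizer is needed
```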