Closed joeoct93 closed 7 months ago
Hi, thanks for your interest! I'm not sure what could be causing this, as we do log those metrics in the validation step: https://github.com/sp-uhh/sgmse/blob/main/sgmse/model.py#L131
My first suspect would be the PyTorch Lightning or PyTorch versions, which might have changed some behavior. Did you install the exact versions from our requirements.txt here https://github.com/sp-uhh/sgmse/blob/main/requirements.txt ?
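To compare your environment against requirements.txt quickly, a minimal sketch like the one below prints what is actually installed. The distribution names in `PACKAGES` are assumptions on my part and should be adjusted to whatever requirements.txt pins; `installed_versions` is just a hypothetical helper name.

```python
from importlib import metadata

# Assumed distribution names; adjust to match the pins in requirements.txt.
PACKAGES = ["torch", "pytorch-lightning", "torchaudio", "pesq", "pystoi"]


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so it can be compared with requirements.txt by eye.
for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run this inside the same conda environment you train in; any line that differs from the pinned version is a candidate cause.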
If so, the next thing to check could be (with a debugger) whether the checks here https://github.com/sp-uhh/sgmse/blob/main/sgmse/model.py#L127 succeed in your case or not, and if not, investigate why.
Finally, it could be that the logging method raises an error if you don't have working dependencies for estoi or pesq. This might happen in a different thread, so it wouldn't kill the training process, but the metrics would still not be logged.
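To rule out the estoi/pesq dependency problem, a rough check is to see whether the metric packages import at all in your environment. This is only a sketch: the module names `pesq` and `pystoi` are my assumption about which packages provide these metrics, and `check_imports` is a hypothetical helper, not part of the repository.

```python
import importlib

# Assumed module names: pesq for PESQ, pystoi for ESTOI.
METRIC_MODULES = ["pesq", "pystoi"]


def check_imports(module_names):
    """Return {module name: None if the import works, else an error description}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # a broken binary extension may not raise ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# Report each module as OK or show the error that prevented the import.
for name, error in check_imports(METRIC_MODULES).items():
    print(f"{name}: {'OK' if error is None else error}")
```

A successful import does not prove the metric computation itself works, but a failing one would explain metrics silently going missing.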
I tried to run the training, and I get the following warnings after each epoch. The validation epochs are skipped after these warnings:
/home/username/anaconda3/envs/sgmse3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:378: UserWarning: ModelCheckpoint(monitor='pesq') could not find the monitored key in the returned metrics: ['train_loss', 'train_loss_step', 'train_loss_epoch', 'epoch', 'step']. HINT: Did you call log('pesq', value) in the LightningModule?
  warning_cache.warn(m)
/home/username/anaconda3/envs/sgmse3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:378: UserWarning: ModelCheckpoint(monitor='si_sdr') could not find the monitored key in the returned metrics: ['train_loss', 'train_loss_step', 'train_loss_epoch', 'epoch', 'step']. HINT: Did you call log('si_sdr', value) in the LightningModule?
  warning_cache.warn(m)

I've seen someone else run this code before, and I know that these warnings weren't there, that there were validation epochs, and that wandb showed metrics such as si_sdr. Now none of that happens. I tried this after freshly downloading the code here, and I'm unsure what's wrong. I'm also worried that without the metrics, it will not train properly.