sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

could not find the monitored key in the returned metrics #32

Closed by joeoct93 7 months ago

joeoct93 commented 12 months ago

I tried to run the training, and I get the following warnings after each epoch. The validation epochs are skipped after these warnings:

/home/username/anaconda3/envs/sgmse3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:378: UserWarning: ModelCheckpoint(monitor='pesq') could not find the monitored key in the returned metrics: ['train_loss', 'train_loss_step', 'train_loss_epoch', 'epoch', 'step']. HINT: Did you call log('pesq', value) in the LightningModule?
  warning_cache.warn(m)
/home/username/anaconda3/envs/sgmse3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:378: UserWarning: ModelCheckpoint(monitor='si_sdr') could not find the monitored key in the returned metrics: ['train_loss', 'train_loss_step', 'train_loss_epoch', 'epoch', 'step']. HINT: Did you call log('si_sdr', value) in the LightningModule?
  warning_cache.warn(m)

I've seen someone else run this code before, and I know that in their case these warnings did not appear, the validation epochs ran, and wandb showed metrics such as si_sdr. None of that happens for me now. I tried this with a fresh download of the code from this repository, and I'm unsure what's wrong. I'm also worried that without the metrics, the model will not train properly.

cobalamin commented 12 months ago

Hi, thanks for your interest! I'm not sure what could be causing this, as we do log those metrics in the validation step: https://github.com/sp-uhh/sgmse/blob/main/sgmse/model.py#L131

My first suspect would be the PyTorch Lightning or PyTorch versions, which might have changed some behavior. Did you install the exact versions from our requirements.txt here https://github.com/sp-uhh/sgmse/blob/main/requirements.txt ?
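One way to make that comparison is a small script like the one below. This is only a sketch and not part of the repository: `parse_pins`, `installed_version` and `find_mismatches` are hypothetical helper names, and it assumes the requirements file pins packages as `name==version` and sits in the current working directory.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form name==version."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, installed)} for every pin that is not met."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    path = Path("requirements.txt")  # assumed location of the file
    if path.exists():
        pins = parse_pins(path.read_text())
        for name, (pinned, found) in find_mismatches(pins).items():
            print(f"{name}: pinned {pinned}, installed {found}")
```

Run it inside the same conda environment that is used for training. Any line it prints is a package whose installed version differs from the pinned one, or that is not installed at all.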

If so, the next thing to check could be (with a debugger) whether the checks here https://github.com/sp-uhh/sgmse/blob/main/sgmse/model.py#L127 succeed in your case or not, and if not, investigate why.
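To illustrate why a failing check would produce exactly this warning, here is a toy model in plain Python. It does not use PyTorch Lightning and is not the code in model.py: `run_validation_step`, `guard_passes` and `missing_monitor_keys` are hypothetical names, and the metric values are placeholders.

```python
def run_validation_step(guard_passes, logged):
    """Toy stand-in for a validation step that logs metrics only behind a guard."""
    if guard_passes:
        logged["pesq"] = 2.0     # placeholder value, not a real measurement
        logged["si_sdr"] = 10.0  # placeholder value, not a real measurement


def missing_monitor_keys(monitors, logged):
    """Return the monitored keys that were never logged."""
    return [key for key in monitors if key not in logged]


if __name__ == "__main__":
    for guard_passes in (True, False):
        logged = {"train_loss": 0.5}  # training metrics are always present
        run_validation_step(guard_passes, logged)
        print(guard_passes, missing_monitor_keys(["pesq", "si_sdr"], logged))
```

When the guard fails, only the training keys are present, which matches the list of returned metrics in the warning above. The same happens if the validation step is never called at all.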

Finally, it could be that the logging method raises an error if you don't have working dependencies for estoi or pesq. This might happen in a different thread, so it wouldn't kill the training process, but it would still prevent the metrics from being logged.
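A quick way to rule this out is to try importing the metric packages directly in the training environment. The snippet below is a sketch: `probe_imports` is a hypothetical helper, and the module names `pesq` and `pystoi` are an assumption about which packages provide the PESQ and ESTOI metrics, so check them against requirements.txt.

```python
import importlib


def probe_imports(module_names):
    """Try to import each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or a failure while loading a compiled extension
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Assumed module names for the PESQ and ESTOI dependencies.
    for name, error in probe_imports(["pesq", "pystoi"]).items():
        status = "ok" if error is None else error
        print(f"{name}: {status}")
```

If either import reports an error, reinstalling that package in the environment would be the first thing to try before debugging the validation step itself.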