graviraja / MLOps-Basics

MIT License

Is training happening? #14

Closed rohitgr7 closed 3 years ago

rohitgr7 commented 3 years ago

https://github.com/graviraja/MLOps-Basics/blob/403ce8d1ee77410e3baaa6e330b7cfde5dddf2c9/week_0_project_setup/model.py#L25-L28

here, the loss is not returned, is the model even training?

ravirajag commented 3 years ago

@rohitgr7 we are logging it to the logger. There is no need to return the loss unless you want to perform some operation on the overall loss in an epoch. I have done that in week 1 for the validation step. Refer here: https://github.com/graviraja/MLOps-Basics/blob/main/week_1_wandb_logging/model.py. If you return the loss, you can access it in the `training_epoch_end` method.
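To illustrate the point about `training_epoch_end`, here is a minimal plain-Python sketch (not the real Lightning internals; `training_step` and `run_epoch` here are toy stand-ins): the values returned from each step are collected and handed to the epoch-end hook for aggregation.

```python
# Toy sketch of how per-step returns become available at epoch end.
# NOT real Lightning code; the names mimic the LightningModule hooks.

def training_step(batch):
    loss = sum(batch) / len(batch)  # stand-in for the computed loss
    return loss                     # returned value is collected per step

def run_epoch(batches):
    # Lightning collects every training_step return into a list and
    # passes it to training_epoch_end, where you can aggregate it:
    outputs = [training_step(b) for b in batches]
    avg_loss = sum(outputs) / len(outputs)
    return avg_loss

print(run_epoch([[1, 3], [2, 4]]))  # -> 2.5
```

If `training_step` returned nothing, `outputs` would be a list of `None` and there would be nothing to aggregate at epoch end.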

rohitgr7 commented 3 years ago

@graviraja I checked; it doesn't look in `logged_metrics` to find the loss and perform backprop. I get this warning when nothing is returned from `training_step`: `training_step returned None. If this was on purpose, ignore this warning...`

Also, the docs mention that if nothing is returned, the corresponding `training_step` is skipped: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#training-step
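The skip behavior described above can be sketched in plain Python (a toy `ToyTrainer`, not the actual Lightning `Trainer`): the loop only runs backward/step on whatever `training_step` returns, so a `None` return means the batch is silently skipped.

```python
# Hypothetical sketch of why training_step must return the loss.
# ToyTrainer is a made-up stand-in for the real Lightning Trainer loop.

class ToyTrainer:
    def __init__(self):
        self.steps_optimized = 0
        self.steps_skipped = 0

    def fit_step(self, training_step_output):
        # Mirrors the documented behavior: a None return skips the
        # optimization for that batch entirely.
        if training_step_output is None:
            self.steps_skipped += 1
            return
        # In the real Trainer this is where loss.backward() and
        # optimizer.step() would run on the returned loss.
        self.steps_optimized += 1

def training_step_buggy(batch):
    loss = sum(batch)  # stand-in for the computed loss
    # loss is only logged, never returned -> nothing to backprop
    return None

def training_step_fixed(batch):
    loss = sum(batch)
    return loss        # returned loss drives backward() and step()

trainer = ToyTrainer()
for batch in ([1, 2], [3, 4]):
    trainer.fit_step(training_step_buggy(batch))
    trainer.fit_step(training_step_fixed(batch))

print(trainer.steps_optimized, trainer.steps_skipped)  # -> 2 2
```

With the buggy version every batch is skipped, so the model's weights never change even though the loss appears in the logs.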

A minimal example to reproduce: https://colab.research.google.com/drive/11qA_1RxcEcHkiY-Xn5EsOR8ZH0wG8O1j#scrollTo=AAtq1hwSmjKe

graviraja commented 3 years ago

Fixed it. Thank you @rohitgr7