Dividing by len(train_loader.dataset) or len(valid_loader.dataset) gives wrong values, because both return the size of the full 'train_data' (i.e. 60000 images). That is the reason for the large difference between train_loss and valid_loss.
'train_loader' actually covers only 0.8*train_data (i.e. 48000 images) and 'valid_loader' only 0.2*train_data (i.e. 12000 images).
Instead, take the average loss of each batch, sum these over the epoch, and divide by the number of batches (len(train_loader) and len(valid_loader)). This works because every batch in both DataLoaders has exactly 20 samples (48000 % 20 == 0 and 12000 % 20 == 0), so the mean of the batch means equals the mean over all samples.
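To make the arithmetic concrete, here is a minimal pure-Python sketch. It is not the original training loop: the per-sample loss of 0.5 is a made-up value, `epoch_losses` is a hypothetical helper, and it assumes the "wrong" version accumulated loss.item() * batch_size before dividing by len(loader.dataset).

```python
# Illustrative numbers only: every sample is given the same made-up loss.
FULL_DATASET_SIZE = 60000  # what len(loader.dataset) reports for BOTH loaders
BATCH_SIZE = 20


def epoch_losses(num_samples, per_sample_loss=0.5):
    """Return (wrong, right) epoch loss for a loader that sees num_samples samples."""
    num_batches = num_samples // BATCH_SIZE        # same value as len(loader)
    batch_means = [per_sample_loss] * num_batches  # stand-in for loss.item() per batch

    # Wrong (assumed original pattern): sum of loss.item() * batch_size,
    # divided by the size of the full dataset instead of the subset.
    wrong = sum(m * BATCH_SIZE for m in batch_means) / FULL_DATASET_SIZE

    # Right: sum of per-batch mean losses divided by the number of batches.
    right = sum(batch_means) / num_batches
    return wrong, right


train_wrong, train_right = epoch_losses(48000)  # 0.8 * 60000 samples
valid_wrong, valid_right = epoch_losses(12000)  # 0.2 * 60000 samples

print(train_wrong, valid_wrong)  # scaled by 0.8 and 0.2 respectively
print(train_right, valid_right)  # both equal the true mean loss
```

With identical per-sample losses, the wrong denominator alone makes the training loss look four times larger than the validation loss, while dividing by the number of batches reports the same value for both.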