rasbt / stat453-deep-learning-ss21

STAT 453: Intro to Deep Learning @ UW-Madison (Spring 2021)

Small error in bias-computation in L08/code/softmax-regression_scratch.ipynb #5

Open alanakbik opened 2 years ago

alanakbik commented 2 years ago

Hello @rasbt,

first of all, thanks for making all this material available online, as well as your video lectures! It's a really helpful resource!

A small issue and fix: the classic softmax regression implementation in L08/code/softmax-regression_scratch.ipynb has, I think, a small error in the bias computation. The training output (cell 8) shows the same value for all bias terms:

Epoch: 049 | Train ACC: 0.858 | Cost: 0.484
Epoch: 050 | Train ACC: 0.858 | Cost: 0.481

Model parameters:
  Weights: tensor([[ 0.5582, -1.0240],
        [-0.5462,  0.0258],
        [-0.0119,  0.9982]])
  Bias: tensor([-1.2020e-08, -1.2020e-08, -1.2020e-08])

whereas the second implementation, which uses the nn.Module API, gives different bias terms.

The problem lies in the torch.sum call in SoftmaxRegression1.backward: it computes a single scalar sum over all elements of y - probas, which is then broadcast across all bias terms. Since each row of the one-hot y and each row of probas sums to 1, this scalar is always numerically close to zero, which is why all bias terms stay (identically) near zero. You can fix this by changing

    def backward(self, x, y, probas):  
        grad_loss_wrt_w = -torch.mm(x.t(), y - probas).t()
        grad_loss_wrt_b = -torch.sum(y - probas)
        return grad_loss_wrt_w, grad_loss_wrt_b

to

    def backward(self, x, y, probas):  
        grad_loss_wrt_w = -torch.mm(x.t(), y - probas).t()
        grad_loss_wrt_b = -torch.sum(y - probas, dim=0)
        return grad_loss_wrt_w, grad_loss_wrt_b

With this change, it learns the toy problem very slightly better.
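
For anyone curious, here is a small standalone snippet (the residual values are made up, not taken from the notebook) showing why the original line produces a single scalar that gets broadcast to every bias, while the dim=0 version keeps one gradient per class:

    import torch

    # made-up residuals (y - probas) for a batch of 4 examples and 3 classes;
    # each row sums to 0, just like one-hot labels minus softmax probabilities
    residuals = torch.tensor([[ 0.2, -0.1, -0.1],
                              [-0.3,  0.2,  0.1],
                              [ 0.1,  0.1, -0.2],
                              [-0.1, -0.2,  0.3]])

    grad_b_broadcast = -torch.sum(residuals)         # scalar, ~0.0 -> same update for every bias
    grad_b_per_class = -torch.sum(residuals, dim=0)  # one entry per class, ~[0.1, 0.0, -0.1]

    print(grad_b_broadcast)
    print(grad_b_per_class)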

bharnoufi commented 3 days ago

Hi @rasbt,

I would also like to add that in logistic-regression.ipynb the gradients are not averaged over the batch size (i.e., divided by y.size(0)), as is done in the softmax-regression_scratch.ipynb example.
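
For illustration, a minimal sketch of what the averaged version could look like (the function name backward_averaged is made up; the gradient expressions mirror the backward method quoted above):

    import torch

    def backward_averaged(x, y, probas):
        # same gradients as in the quoted backward method, but averaged
        # over the batch by dividing by the number of examples, y.size(0)
        batch_size = y.size(0)
        grad_loss_wrt_w = -torch.mm(x.t(), y - probas).t() / batch_size
        grad_loss_wrt_b = -torch.sum(y - probas, dim=0) / batch_size
        return grad_loss_wrt_w, grad_loss_wrt_b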

Thank you !