Open alanakbik opened 2 years ago
Hi @rasbt,
I would also like to add that in "logistic-regression.ipynb" we are not averaging the computed gradient over the batch size (dividing by y.size(0)), as is done in the "softmax-regression_scratch.ipynb" example.
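For illustration, a minimal sketch of what averaging over the batch size looks like; the tensor values here are made up, and `grad_sum` stands in for whatever summed gradient the notebook's backward pass produces:

```python
import torch

# Hypothetical batch of 4 labels and a gradient that was summed
# over the batch (names are illustrative, not from the notebook).
y = torch.tensor([0, 1, 2, 1])
grad_sum = torch.tensor([4.0, -2.0, 8.0])

# Averaging divides the summed gradient by the batch size,
# which is what softmax-regression_scratch.ipynb does:
grad_avg = grad_sum / y.size(0)  # -> tensor([1.0, -0.5, 2.0])
```

Without this division, the effective learning rate scales with the batch size.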
Thank you !
Hello @rasbt,
First of all, thanks for making all this material available online, as well as your video lectures. A really helpful resource!
A small issue and fix: the classic softmax regression implementation in
L08/code/softmax-regression_scratch.ipynb
has a small error in the bias computation (I think). The training output (cell 8) shows the same value for every bias term, whereas the second implementation with the nn.Module API gives different bias terms.
The problem lies in the
torch.sum
call in SoftmaxRegression1.backward
: it computes a single sum over all entries, which is later broadcast across all bias terms. You can fix this by summing only over the batch dimension, so that each bias term gets its own gradient;
it then learns the toy problem a (very slight) bit better.
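A minimal sketch of the difference, assuming the error tensor (softmax probabilities minus one-hot targets) has shape (batch_size, num_classes); the values are made up for illustration:

```python
import torch

# Hypothetical per-example errors: batch of 4, 3 classes.
errors = torch.tensor([[ 0.2, -0.1, -0.1],
                       [-0.3,  0.2,  0.1],
                       [ 0.1,  0.1, -0.2],
                       [ 0.0, -0.4,  0.4]])

# Buggy version: a single scalar sum over ALL entries, which
# broadcasting then copies to every bias -> identical bias updates.
grad_bias_buggy = torch.sum(errors)          # 0-dim scalar

# Fixed version: sum only over the batch dimension (dim=0),
# giving one gradient entry per class/bias term.
grad_bias_fixed = torch.sum(errors, dim=0)   # shape (num_classes,)
```

With the fix, each bias term receives its own gradient component instead of a shared scalar, which is why the nn.Module version (whose autograd handles this correctly) learns different bias terms.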