mantasu / cs231n

Shortest solutions for CS231n 2021-2024

svm_loss_vectorized Derivative #9

Open mmuneeburahman opened 11 months ago

mmuneeburahman commented 11 months ago

https://github.com/naya0000/cs231n/blob/e1192dc8cbaf078c3cfb691e12b8d6d2ec40c8fa/assignment1/cs231n/classifiers/linear_svm.py#L110 Can someone explain why this subtraction is done? I'd appreciate an explanation of the derivative calculation.

nhattan417 commented 11 months ago

> https://github.com/naya0000/cs231n/blob/e1192dc8cbaf078c3cfb691e12b8d6d2ec40c8fa/assignment1/cs231n/classifiers/linear_svm.py#L110 Can someone explain why this subtraction is done? I'd appreciate an explanation of the derivative calculation.

Please see the figure below for the computational graph of the hinge loss, @mmuneeburahman.

[Figure: SVM_HingeLoss — computational graph of the hinge loss]

Performing backprop on this computational graph gives exactly the result the code computes. The subtraction term comes from the part I have circled in red.
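To spell that out (a sketch, assuming the standard per-example multiclass hinge loss over scores $s_j$ with true class $y_i$):

$$L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + 1)$$

Each wrong-class score with a positive margin contributes $+1$ to its own gradient:

$$\frac{\partial L_i}{\partial s_j} = \mathbb{1}\left[s_j - s_{y_i} + 1 > 0\right], \quad j \neq y_i$$

while the true-class score $s_{y_i}$ appears with a minus sign in every margin, so each positive margin contributes $-1$ to it:

$$\frac{\partial L_i}{\partial s_{y_i}} = -\sum_{j \neq y_i} \mathbb{1}\left[s_j - s_{y_i} + 1 > 0\right]$$

That last sum is just the count of positive margins, which is why the code subtracts this count for the true class.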

mantasu commented 5 months ago

Code-wise, since $W$ (`w`) is used to compute both $\hat{Y}$ (`Y_hat`) and $\mathbf{\hat{y}}$ (`y_hat_true`), they both contribute to the gradient $\frac{dL}{dW}$ (`dW`), as you can see from this line:

```python
margins = np.maximum(0, Y_hat - y_hat_true + 1)
```

By computing `(margins > 0).sum(axis=1)`, we count how many times `W` was used to calculate `y_hat_true`, i.e., how many positive margins it contributed to the loss through `y_hat_true`. We negate that count because `y_hat_true` enters `margins` with a negative sign.
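To make the bookkeeping concrete, here is a minimal self-contained sketch of the vectorized loss and gradient. Variable names like `num_train` and `coeffs` are my own, and the linked solution may organize things differently (e.g., in how it handles the true-class margin); this just illustrates where the negated count enters:

```python
import numpy as np

def svm_loss_vectorized(W, X, y, reg):
    """Multiclass SVM hinge loss and gradient, fully vectorized.

    W: (D, C) weights, X: (N, D) data, y: (N,) true class indices,
    reg: L2 regularization strength.
    """
    num_train = X.shape[0]

    Y_hat = X @ W                                          # (N, C) scores
    y_hat_true = Y_hat[np.arange(num_train), y][:, None]   # (N, 1) true-class scores
    margins = np.maximum(0, Y_hat - y_hat_true + 1)        # (N, C) hinge margins
    margins[np.arange(num_train), y] = 0                   # true class adds no loss

    loss = margins.sum() / num_train + reg * np.sum(W * W)

    # Backprop: each positive margin contributes +1 to its column's score
    # gradient, and -1 to the true-class column (the subtraction in question).
    coeffs = (margins > 0).astype(float)                   # (N, C) indicators
    coeffs[np.arange(num_train), y] = -coeffs.sum(axis=1)  # negated count of positive margins
    dW = X.T @ coeffs / num_train + 2 * reg * W            # (D, C)

    return loss, dW
```

The assignment `coeffs[np.arange(num_train), y] = -coeffs.sum(axis=1)` is exactly the negated `(margins > 0).sum(axis=1)` count discussed above, placed in the true-class column before the single matrix multiply that produces `dW`.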