mmuneeburahman opened this issue 11 months ago
https://github.com/naya0000/cs231n/blob/e1192dc8cbaf078c3cfb691e12b8d6d2ec40c8fa/assignment1/cs231n/classifiers/linear_svm.py#L110 Can someone explain why this subtraction is done? I'd appreciate an explanation of how the derivative is calculated.
@mmuneeburahman, please see the figure for the computational graph of the hinge loss.
Performing backprop through this computational graph gives exactly the result the code computes. The subtraction term comes from the part I have circled in red.
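In case the figure does not render, here is the same result written out as equations (a sketch for a single example with scores $\hat{y}_j$ and correct class $c$; this is the standard multiclass hinge loss, which is what the assignment uses):

$$L = \sum_{j \neq c} \max\left(0,\ \hat{y}_j - \hat{y}_c + 1\right)$$

$$\frac{\partial L}{\partial \hat{y}_j} = \mathbb{1}\left[\hat{y}_j - \hat{y}_c + 1 > 0\right] \quad (j \neq c), \qquad \frac{\partial L}{\partial \hat{y}_c} = -\sum_{j \neq c} \mathbb{1}\left[\hat{y}_j - \hat{y}_c + 1 > 0\right]$$

Because $\hat{y}_c$ appears with a minus sign inside every margin term, it picks up a $-1$ for each positive margin, and that accumulated count is exactly the circled subtraction.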
Code-wise, since $W$ (`w`) is used to calculate both $\hat{Y}$ (`Y_hat`) and $\mathbf{\hat{y}}$ (`y_hat_true`), both contribute to the derivative $\frac{dL}{dW}$ (`dW`), as you can see from this line:

`margins = np.maximum(0, Y_hat - y_hat_true + 1)`

By computing `(margins > 0).sum(axis=1)`, we count, for each example, how many times `W` was used to calculate `y_hat_true`, i.e., how many times it contributed to the loss through `y_hat_true`. We negate that count because `y_hat_true` enters `margins` with a minus sign.
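For concreteness, here is a minimal self-contained sketch of the vectorized loss and gradient. The names `Y_hat`, `y_hat_true`, and `margins` mirror the ones above; the function name and exact shapes are my assumptions, not necessarily the assignment's actual code:

```python
import numpy as np

def svm_loss_vectorized_sketch(W, X, y):
    """Sketch of the vectorized SVM loss/gradient (hypothetical helper).
    W: (D, C) weights, X: (N, D) inputs, y: (N,) correct class indices."""
    N = X.shape[0]
    Y_hat = X.dot(W)                                 # scores, shape (N, C)
    y_hat_true = Y_hat[np.arange(N), y][:, None]     # correct-class scores, shape (N, 1)
    margins = np.maximum(0, Y_hat - y_hat_true + 1)  # hinge margins, shape (N, C)
    margins[np.arange(N), y] = 0                     # the correct class adds no margin
    loss = margins.sum() / N

    # Backprop: each positive margin contributes +1 to dL/dY_hat[i, j] ...
    dY_hat = (margins > 0).astype(float)
    # ... and -1 to dL/dY_hat[i, y[i]]; this is the subtraction in question:
    dY_hat[np.arange(N), y] -= (margins > 0).sum(axis=1)
    dW = X.T.dot(dY_hat) / N                         # chain rule through Y_hat = X @ W
    return loss, dW
```

A quick way to convince yourself this is right is to compare `dW` against a numerical gradient on a small random `W`, `X`, and `y`.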