Closed: andgitchang closed this issue 4 years ago.
We define the L2-distance to further investigate different metrics in neural networks; we still use the L1 distance in AdderNets.
The partial derivative of the L2-distance uses its original derivative.
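Concretely, taking the squared (L2^2) form clarified later in this thread and writing the adder output as a negated summed distance, the relation can be sketched as follows; the symbols X (input patch), F (filter), and Y (output) are illustrative and not necessarily the paper's exact Equation (14):

```latex
% Sketch: squared-L2 variant of the adder output and its plain ("original") derivative.
\begin{align}
  Y(m,n,t) &= -\sum_{i,j,k} \bigl( X(m+i,\, n+j,\, k) - F(i,j,k,t) \bigr)^{2}, \\
  \frac{\partial Y(m,n,t)}{\partial F(i,j,k,t)}
           &= 2 \bigl( X(m+i,\, n+j,\, k) - F(i,j,k,t) \bigr).
\end{align}
% No sign function or clipping is involved: the squared distance is differentiated directly.
```

The constant factor of 2 can be absorbed into the learning rate, leaving a plain difference between input and filter.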
I know you use the L1 distance in the forward pass and the full-precision L2 derivative in backward optimization (see the sketch below for my understanding). But my question is whether the L2-AdderNet is only meant for this investigation, and why the L2-distance in Equation (14) of the appendix has no square root outside the summations. If I have misunderstood anything, please correct me. Thanks.
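Here is that understanding as a minimal PyTorch-style sketch; the class name, the single-patch setup, and the omission of details such as gradient clipping on the input side are my own simplifications, not the released code:

```python
import torch
from torch.autograd import Function

class AdderSimilarity(Function):
    """Toy single-patch adder op: L1 distance forward, full-precision backward."""

    @staticmethod
    def forward(ctx, x, f):
        ctx.save_for_backward(x, f)
        # Forward pass: negated L1 distance between input patch x and filter f.
        return -(x - f).abs().sum()

    @staticmethod
    def backward(ctx, grad_output):
        x, f = ctx.saved_tensors
        # Backward pass: instead of the sign(.) derivative of |x - f|, use the
        # full-precision (L2-style) derivative, i.e. a plain difference.
        grad_x = grad_output * (f - x)
        grad_f = grad_output * (x - f)
        return grad_x, grad_f

# Usage: one 3x3 patch against one 3x3 filter.
x = torch.randn(3, 3, requires_grad=True)
f = torch.randn(3, 3, requires_grad=True)
y = AdderSimilarity.apply(x, f)
y.backward()
print(y.item(), f.grad)
```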
Yes, so we finally use the L1-AdderNet in our main paper. The L2-AdderNet is proposed only for investigation.
In fact we use the squared L2 distance (L2^2), as defined in our supp.
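For completeness, a short side-by-side of the two forms; the symbols d_2 and d_2^2 below are illustrative, not the paper's notation:

```latex
% True L2 distance (with the square root) versus the squared form.
\begin{align}
  d_{2}(X,F)       &= \sqrt{\sum_{i,j,k} \bigl(X_{ijk} - F_{ijk}\bigr)^{2}}, &
  \frac{\partial d_{2}}{\partial F_{ijk}}       &= \frac{F_{ijk} - X_{ijk}}{d_{2}(X,F)}, \\
  d_{2}^{\,2}(X,F) &= \sum_{i,j,k} \bigl(X_{ijk} - F_{ijk}\bigr)^{2}, &
  \frac{\partial d_{2}^{\,2}}{\partial F_{ijk}} &= 2 \bigl(F_{ijk} - X_{ijk}\bigr).
\end{align}
```

Dropping the square root removes the 1/d_2 factor, so the derivative of the squared form is just a difference between filter and input, which matches the full-precision derivative used in backward optimization up to the constant factor of 2.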
Thanks for your detailed explanation.
Hi, I would like to know why you defined the L2-distance as in Equation (14) of the appendix. Doesn't the L2-distance need a square root outside the summations? I would also like to know how the corresponding partial derivative of the L2-distance in Equation (5) is derived. Thanks.