huawei-noah / AdderNet

Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
BSD 3-Clause "New" or "Revised" License

Equation (5) - partial derivative of the Euclidean norm #24

Closed andgitchang closed 4 years ago

andgitchang commented 4 years ago

Hi, I would like to know why you defined the L2-distance as in Equation (14) of the appendix. Doesn't the L2-distance need a square root outside the summations? I would also like to know how the corresponding partial derivative of the L2-distance in Equation (5) is derived; see the sketch below for the definition I have in mind. Thanks.
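For reference, the textbook Euclidean (L2) distance carries a square root over the summation. Schematically, with simplified indexing rather than the paper's exact notation:

```latex
% Standard Euclidean (L2) distance between an input patch x and a filter F:
\|x - F\|_2 = \Big( \sum_i (x_i - F_i)^2 \Big)^{1/2}
```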

HantingChen commented 4 years ago

We define the L2-distance to further investigate different metrics in neural networks; we still use the L1 distance in AdderNets.

For the partial derivative of the L2-distance, we simply use its original (exact) derivative.

andgitchang commented 4 years ago

I know you use the L1 distance in the forward pass and the full-precision L2 derivative in the backward pass, but my questions are:

  1. Considering the L2 distance (see the definition), don't we need an extra square root outside the summations of Eq. (14) in your CVPR 2020 supplementary material?
  2. Following the definition of the L2 distance, shouldn't its derivative in Eq. (5) of AdderNets be \partial ||x||_2 / \partial x = x / ||x||_2? (Please refer to the p-norm subsection under [Examples](https://en.wikipedia.org/wiki/Norm_(mathematics)#Examples); both derivatives are spelled out below.)
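To spell out point 2, here are the two candidate derivatives from standard calculus (my notation, not the paper's):

```latex
% Derivative of the true L2 norm:
\frac{\partial \|x\|_2}{\partial x} = \frac{x}{\|x\|_2}

% Derivative of the squared L2 norm, for comparison:
\frac{\partial \|x\|_2^2}{\partial x} = 2x
```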

If I have misunderstood anything, please correct me. Thanks.

HantingChen commented 4 years ago
  1. Yes, so we finally use the L1-AdderNet in our main paper. The L2-AdderNet is proposed only for investigation.

  2. In fact, we use the squared L2 distance (L2^2), as defined in our supplementary material, which is why no square root appears.
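As a toy check, separate from the repo code, here is a minimal PyTorch sketch showing that the squared L2 distance has a linear gradient, with no square root or division:

```python
# Minimal sketch (not code from this repo): verify with autograd that the
# squared L2 distance d(x, F) = sum_i (x_i - F_i)^2 has the linear gradient
# dd/dF = 2 * (F - x), i.e. no square root or division appears.
import torch

x = torch.randn(16)                      # hypothetical input patch
F = torch.randn(16, requires_grad=True)  # hypothetical filter weights

d = ((x - F) ** 2).sum()  # L2^2 distance (no sqrt), as in the supp's Eq. (14)
d.backward()

assert torch.allclose(F.grad, (2 * (F - x)).detach())
```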

andgitchang commented 4 years ago

Thanks for your detailed explanation!