shawnclq opened this issue 4 years ago
Hi @shawnclq,
The local Lipschitz constant for the p-norm can be computed by taking the max of the gradient's dual norm over the set S. However, for ReLU networks, strictly speaking, the gradient does not exist when the input to a ReLU neuron is exactly 0. Thus we must use the directional derivative instead of the regular gradient to complete the proof.
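In symbols, here is a quick sketch of the two quantities involved (my own notation, not copied verbatim from the paper; h is the function being bounded on the set S, and v is an arbitrary direction):

```latex
% Sketch, following the description above: the local Lipschitz constant is
% the max of the gradient's dual norm over S, with 1/p + 1/q = 1.
\[
  L_q = \max_{y \in S} \,\lVert \nabla h(y) \rVert_q ,
  \qquad \frac{1}{p} + \frac{1}{q} = 1 .
\]
% At points where a ReLU input is exactly 0 the gradient is undefined,
% so the proof replaces the gradient with the one-sided directional
% derivative along a direction v:
\[
  D_v h(x) = \lim_{t \to 0^{+}} \frac{h(x + t v) - h(x)}{t} .
\]
```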
In practice, people don't care about the singular points where a ReLU input is exactly 0. We obtain the gradients of a ReLU network using TensorFlow/PyTorch, just like everyone else does. The use of the directional derivative is mostly for theoretical completeness.
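For concreteness, here is a minimal PyTorch sketch of that sampling-based estimate. The function name, the model interface, the ball radius, and the sample count are all placeholders for illustration, not code from this repo:

```python
import torch

def estimate_local_lipschitz(model, x0, target_class, radius=0.5, p=2.0,
                             num_samples=1024):
    """Hedged sketch: estimate the local Lipschitz constant of the target-class
    output by taking the max gradient dual norm over random points sampled in
    an l_p ball around x0 (ReLU kinks are simply ignored, as discussed above).
    Assumes x0 is a single example without a batch dimension and that
    model(x) returns logits of shape [batch, num_classes]."""
    # Dual-norm exponent q with 1/p + 1/q = 1 (so q = 2 when p = 2).
    if p == float('inf'):
        q = 1.0
    elif p == 1.0:
        q = float('inf')
    else:
        q = p / (p - 1.0)

    max_dual_norm = 0.0
    for _ in range(num_samples):
        # Draw a random point inside the l_p ball (a rough sketch, not an
        # exactly-uniform sample).
        delta = torch.randn_like(x0)
        delta = delta / delta.norm(p=p) * (radius * torch.rand(1).item())
        x = (x0 + delta).detach().requires_grad_(True)

        # Gradient of the chosen output w.r.t. the input; autograd silently
        # picks a subgradient at ReLU kinks, which is fine in practice.
        out = model(x.unsqueeze(0))[0, target_class]
        grad, = torch.autograd.grad(out, x)

        max_dual_norm = max(max_dual_norm, grad.norm(p=q).item())
    return max_dual_norm
```

With p = 2 this just takes the maximum gradient 2-norm over the sampled points, which is the q = 2 case you asked about.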
Hope this answers your question and let me know if you have any additional questions.
Huan
@huanzhang12 Thanks for the prompt reply, I understand now!
Hi there!
While reading your paper, I noticed that Lemma 3.3 is for networks with ReLU activations, and it says that the Lipschitz constant used in Lemma 3.1 can be replaced by the maximum norm of the directional derivative, which I don't quite understand.
The Lipschitz constant used in Lemma 3.1 is L_q = max_{y in S} ||∇h(y)||_q, where 1/p + 1/q = 1.
So if p = 2 (p as used in the paper), q = 2. For ReLU networks, how should I proceed?