shawnclq opened this issue 4 years ago
Hi @shawnclq,
The local Lipschitz constant for the p-norm can be computed by taking the max of the gradient's dual norm over the set S. However, for ReLU networks, strictly speaking, the gradient does not exist when the input to a ReLU neuron is exactly 0. Thus we must use the directional derivative instead of the regular gradient to complete the proof.
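In symbols, here is a quick sketch of the two quantities involved (my own notation, not copied verbatim from the paper; h is the function being bounded on the set S, and v is an arbitrary direction):

```latex
% Sketch, following the description above: the local Lipschitz constant is
% the max of the gradient's dual norm over S, with 1/p + 1/q = 1.
\[
  L_q = \max_{y \in S} \,\lVert \nabla h(y) \rVert_q ,
  \qquad \frac{1}{p} + \frac{1}{q} = 1 .
\]
% At points where a ReLU input is exactly 0 the gradient is undefined,
% so the proof replaces the gradient with the one-sided directional
% derivative along a direction v:
\[
  D_v h(x) = \lim_{t \to 0^{+}} \frac{h(x + t v) - h(x)}{t} .
\]
```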
In practice, people don't care about the singular points where a ReLU input is exactly 0. We obtain the gradients of a ReLU network using TensorFlow/PyTorch, just like everyone else does. The use of the directional derivative is mostly for theoretical completeness.
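For concreteness, here is a minimal PyTorch sketch of that sampling-based estimate. The function name, the model interface, the ball radius, and the sample count are all placeholders for illustration, not code from this repo:

```python
import torch

def estimate_local_lipschitz(model, x0, target_class, radius=0.5, p=2.0,
                             num_samples=1024):
    """Hedged sketch: estimate the local Lipschitz constant of the target-class
    output by taking the max gradient dual norm over random points sampled in
    an l_p ball around x0 (ReLU kinks are simply ignored, as discussed above).
    Assumes x0 is a single example without a batch dimension and that
    model(x) returns logits of shape [batch, num_classes]."""
    # Dual-norm exponent q with 1/p + 1/q = 1 (so q = 2 when p = 2).
    if p == float('inf'):
        q = 1.0
    elif p == 1.0:
        q = float('inf')
    else:
        q = p / (p - 1.0)

    max_dual_norm = 0.0
    for _ in range(num_samples):
        # Draw a random point inside the l_p ball (a rough sketch, not an
        # exactly-uniform sample).
        delta = torch.randn_like(x0)
        delta = delta / delta.norm(p=p) * (radius * torch.rand(1).item())
        x = (x0 + delta).detach().requires_grad_(True)

        # Gradient of the chosen output w.r.t. the input; autograd silently
        # picks a subgradient at ReLU kinks, which is fine in practice.
        out = model(x.unsqueeze(0))[0, target_class]
        grad, = torch.autograd.grad(out, x)

        max_dual_norm = max(max_dual_norm, grad.norm(p=q).item())
    return max_dual_norm
```

With p = 2 this just takes the maximum gradient 2-norm over the sampled points, which is the q = 2 case you asked about.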
Hope this answers your question and let me know if you have any additional questions.
Huan
@huanzhang12 Thanks for the prompt reply, I understand now!
Hi there!
While reading your paper, I noticed that Lemma 3.3 is for networks with ReLU activations, and it says that the Lipschitz constant used in Lemma 3.1 can be replaced by the maximum norm of the directional derivative, which I don't quite understand.
The Lipschitz constant used in Lemma 3.1 is L_q = max_{y in S} ||∇h(y)||_q, where 1/p + 1/q = 1.
So if p = 2 (p as used in the paper), q = 2. For ReLU networks, how should I proceed?