Projected Gradient Ascent Impleentation

namisan / mt-dnn

Multi-Task Deep Neural Networks for Natural Language Understanding

MIT License

2.24k stars 412 forks source link

Cheers,

trying to follow the SMART implementation, I fail to comprehend how the projection step in PGS algorithm on the epsilon max-norm ball around the noise is implemented. Please correct me if I'm wrong, but it should be done in this step (in SmartPerturbation): noise, eff_noise = self._norm_grad(delta_grad, eff_grad=eff_delta_grad, sentence_level=self.norm_level)

where the norm_grad returns: direction = grad / (grad.abs().max(-1, keepdim=True)[0] + self.epsilon)

Question:

how is x/(MaxNorm(x)+eps.) a projection on the epsilon max-norm ball.

Thank you for your support!

Cheers Wassim

namisan / mt-dnn

Projected Gradient Ascent Impleentation #224