namisan / mt-dnn

Multi-Task Deep Neural Networks for Natural Language Understanding
MIT License

Projected Gradient Ascent Implementation #224

Closed boubakerwa closed 3 years ago

boubakerwa commented 3 years ago

Cheers,

While trying to follow the SMART implementation, I fail to see how the projection step of the projected gradient ascent algorithm, onto the epsilon max-norm ball around the noise, is implemented. Please correct me if I'm wrong, but it should be done in this step (in `SmartPerturbation`): `noise, eff_noise = self._norm_grad(delta_grad, eff_grad=eff_delta_grad, sentence_level=self.norm_level)`

where `_norm_grad` returns: `direction = grad / (grad.abs().max(-1, keepdim=True)[0] + self.epsilon)`
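For context, a minimal self-contained sketch of what that line computes (the function name `linf_normalize` is mine, not the repo's; in mt-dnn this lives inside `SmartPerturbation._norm_grad`, with `self.epsilon` acting as a small stabilizer in the denominator):

```python
import torch

def linf_normalize(grad: torch.Tensor, stab: float = 1e-6) -> torch.Tensor:
    # Rescale the gradient by its per-token max-norm (plus a small
    # stabilizer), producing a direction of (roughly) unit l_inf norm.
    return grad / (grad.abs().max(-1, keepdim=True)[0] + stab)
```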

Question:

How is `x / (MaxNorm(x) + eps)` a projection onto the epsilon max-norm ball?

Thank you for your support!

Cheers Wassim

namisan commented 3 years ago

You are correct.

Alg. line 8 should be: `direction = grad / (grad.abs().max(-1, keepdim=True)[0] + self.epsilon)`

After the one-step update, a projection step is required. In practice, however, there is no difference in performance, and the current implementation is faster. Thank you for pointing it out; we should add a disclaimer.
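To make the distinction concrete, here is a minimal sketch (not the repo's code; `project_linf`, `rescale_linf`, `eps_ball`, and `step_size` are illustrative names) contrasting an exact projection onto the epsilon max-norm ball with the rescaling the current implementation uses:

```python
import torch

def project_linf(delta: torch.Tensor, eps_ball: float) -> torch.Tensor:
    # Exact projection onto {x : ||x||_inf <= eps_ball}. Because the
    # max-norm ball is a coordinate-wise box, the projection is just a
    # clamp of each coordinate into [-eps_ball, eps_ball].
    return delta.clamp(min=-eps_ball, max=eps_ball)

def rescale_linf(delta: torch.Tensor, stab: float = 1e-6) -> torch.Tensor:
    # What the current implementation does instead: divide by the
    # per-token max-norm, mapping delta onto (roughly) the unit l_inf
    # sphere rather than clipping it into the ball.
    return delta / (delta.abs().max(-1, keepdim=True)[0] + stab)

# After the one-step ascent update, the algorithm as written would call
#   delta = project_linf(delta + step_size * grad, eps_ball)
# whereas the repo effectively uses the rescaled direction from
# rescale_linf; per the comment above, the two perform comparably.
```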