Closed: boubakerwa closed this issue 3 years ago
You are correct.
Alg line 8 should be: direction = grad / (grad.abs().max(-1, keepdim=True)[0] + self.epsilon)
After each one-step update, a projection step is required. In practice, however, it makes no difference in performance, and the current implementation is faster. Thank you for pointing this out. We should add a disclaimer.
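For readers following along, here is a minimal plain-Python sketch of what an explicit projection after an update step could look like. It is not the repository's code: `project_linf`, `ascent_step`, `step_size`, and `radius` are hypothetical names, and lists stand in for tensors. It assumes the projection is onto a max-norm ball centered at zero, which amounts to clamping each coordinate.

```python
def project_linf(delta, radius):
    """Project a perturbation onto the max-norm ball of the given radius.

    For the max-norm, projection is coordinate-wise clamping to
    [-radius, radius]. (Hypothetical helper, not part of the repository.)
    """
    return [max(-radius, min(radius, d)) for d in delta]


def ascent_step(delta, grad, step_size, radius):
    """One gradient-ascent update on the noise, followed by the projection."""
    updated = [d + step_size * g for d, g in zip(delta, grad)]
    return project_linf(updated, radius)


# Two coordinates leave the ball after the update and are clamped back.
stepped = ascent_step([0.5, -0.9, 0.0], [1.0, -1.0, 0.5], step_size=1.0, radius=1.0)
print(stepped)
```

In a tensor library the clamp would usually be a single element-wise call; the list version is only meant to make the operation explicit.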
Cheers,
Trying to follow the SMART implementation, I fail to understand how the projection step in the PGS algorithm onto the epsilon max-norm ball around the noise is implemented. Please correct me if I'm wrong, but it should be done in this step (in SmartPerturbation):
noise, eff_noise = self._norm_grad(delta_grad, eff_grad=eff_delta_grad, sentence_level=self.norm_level)
where _norm_grad returns:
direction = grad / (grad.abs().max(-1, keepdim=True)[0] + self.epsilon)
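To make the behavior of the line above concrete, here is a small plain-Python sketch for a single row of the gradient. `normalize_by_max` and `eps` are hypothetical names for illustration, and a list stands in for the tensor. It suggests that the operation rescales the gradient so that its largest absolute entry is just under 1, whatever the original magnitude, rather than clipping values that exceed a bound.

```python
def normalize_by_max(grad, eps=1e-6):
    """Divide every entry by the largest absolute entry (plus a small eps).

    Mirrors grad / (grad.abs().max(-1, keepdim=True)[0] + eps) for one row.
    (Hypothetical helper for illustration only.)
    """
    largest = max(abs(g) for g in grad)
    return [g / (largest + eps) for g in grad]


direction = normalize_by_max([0.2, -4.0, 1.0])
print(direction)

# A much smaller gradient ends up with the same max-norm after rescaling.
small_direction = normalize_by_max([0.002, -0.04, 0.01])
print(small_direction)
```

Both inputs end up with a max-norm close to 1, so the result depends only on the direction of the gradient and not on its scale.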
Question:
Thank you for your support!
Cheers, Wassim