kongds opened 2 years ago
Another problem is that the magnitude of `delta_grad * self.step_size` is too small to influence the noise during the noise update. For example, `delta_grad * self.step_size` is around 1e-13, whereas the magnitude of the noise can be 1e-5. So the code below seems to update the noise using only its norm, effectively ignoring `delta_grad`:
https://github.com/namisan/mt-dnn/blob/ca896ef1f9de561f1741221d2c98b4d989e3ed19/mt_dnn/perturbation.py#L131-L134
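To illustrate why an increment of ~1e-13 can vanish next to noise of ~1e-5, here is a minimal NumPy sketch. It assumes (this is not taken from `perturbation.py`) that the update adds the scaled gradient to the noise and then L2-normalizes the result. `perturbation_step`, `grad`, and the concrete values are hypothetical. In float32 the spacing between representable numbers near 1e-5 is roughly 1e-12, so a 1e-13 increment is rounded away entirely.

```python
import numpy as np


def perturbation_step(noise, grad, step_size):
    """Hypothetical update: add the scaled gradient, then L2-normalize.

    This only mirrors the general shape of the update discussed above;
    the real perturbation.py may normalize differently.
    """
    updated = noise + grad * step_size
    return updated / np.linalg.norm(updated)


# Hypothetical values matching the magnitudes in the issue:
# noise entries ~1e-5, grad * step_size ~1e-13.
noise = np.array([1e-5, -2e-5, 3e-5, -1.5e-5], dtype=np.float32)
grad = np.array([1.0, -1.0, 2.0, 1.0], dtype=np.float32)
step_size = np.float32(1e-13)

new_dir = perturbation_step(noise, grad, step_size)
old_dir = noise / np.linalg.norm(noise)

# Each increment is below half a float32 ulp of the matching noise entry,
# so the sum rounds back to `noise` and the normalized direction is identical.
print(np.array_equal(noise + grad * step_size, noise))  # True
print(np.array_equal(new_dir, old_dir))  # True
```

In float64 the increment would survive, but only as a relative change of about 1e-8, which still leaves the direction of the noise essentially unchanged.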
Thank you for providing the SMART code. SMART uses the following code to get the embeddings, which are then used to build the noisy embeddings that are fed to BERT as `inputs_embeds`:
https://github.com/namisan/mt-dnn/blob/ca896ef1f9de561f1741221d2c98b4d989e3ed19/mt_dnn/matcher.py#L124

However, for `inputs_embeds` in transformers, the input should be the output of `bert.embeddings.word_embeddings`, not `bert.embeddings`; otherwise the position and token-type embeddings, LayerNorm, and dropout are applied twice. Please refer to the following code for `BertEmbeddings.forward`: