Closed carabnuu closed 1 year ago
Hi carabnuu, thanks for the feedback! 1) The gradient norm can be approximated by a numerical (finite) difference, so we can use norm(f(x1) - f(x2)) / norm(x1 - x2) in place of the gradient norm. 2) You can actually use any norm function. In our code we use the L1-norm (abs) because it is faster to compute; feel free to use the L2-norm as in the paper, or any Lp-norm you prefer.
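A minimal numpy sketch of that finite-difference approximation (not the repo's actual code, which operates on PyTorch tensors): `finite_difference_grad_norm` and the toy linear map `f` below are illustrative names, and `ord=1` vs `ord=2` shows the L1/L2 choice mentioned above.

```python
import numpy as np

def finite_difference_grad_norm(f, x, eps=1e-2, ord=1):
    """Approximate the gradient norm of f at x via a random perturbation:
    norm(f(x2) - f(x1)) / norm(x2 - x1), where x2 = x + eps * noise.
    Any Lp norm can be used: ord=1 is cheapest, ord=2 matches the paper."""
    noise = np.random.default_rng(0).standard_normal(x.shape)
    x2 = x + eps * noise
    num = np.linalg.norm((f(x2) - f(x)).ravel(), ord=ord)
    den = np.linalg.norm((x2 - x).ravel(), ord=ord)
    return num / den

# Toy example: a random linear map, so the finite difference is exact
# up to the direction of the noise.
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))
f = lambda x: W @ x
x = rng.standard_normal(16)
print(finite_difference_grad_norm(f, x, ord=2))
```

For a linear map the L2 ratio equals ||W n|| / ||n|| for the sampled noise direction n, which always lies between the smallest and largest singular values of W.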
Thanks for your answer! That really helps a lot! But I still have a question about your answer 1, regarding step 4 in Algorithm 1. That step uses a numerical difference in place of the gradient with respect to the input x. But in both the expression in the paper and 'torch.abs(output - mixup_output)' in the code, I only see Lp-norm(f(x1) - f(x2)); the denominator norm(x1 - x2) is missing, and I don't understand why. I might be stuck on a simple point. Thanks again!
norm(x1 - x2) is nearly a constant because x2 = x1 + epsilon. In high-dimensional space, the norm of a random Gaussian vector concentrates around a constant, so dropping the denominator only rescales the score.
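This concentration effect is easy to check empirically: for g ~ N(0, I_d), ||eps * g||_2 clusters around eps * sqrt(d), and the relative fluctuation shrinks like 1/sqrt(d). A small numpy sketch (illustrative only; the dimensions and sample count are arbitrary):

```python
import numpy as np

# For each dimension d, draw 200 Gaussian perturbations and compare
# their L2 norms to the theoretical value eps * sqrt(d).
rng = np.random.default_rng(0)
eps = 0.01
for d in (10, 100, 10000):
    norms = np.linalg.norm(eps * rng.standard_normal((200, d)), axis=1)
    print(f"d={d:6d}  mean/theory={norms.mean() / (eps * np.sqrt(d)):.4f}"
          f"  relative std={norms.std() / norms.mean():.4f}")
```

As d grows, the relative std collapses toward zero, which is why the denominator norm(x1 - x2) can be treated as a constant in high dimension.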
Thank you for your detailed reply !
Hi, this work is excellent! I'm curious and have two questions about the computation of the Zen-score.