leleogere opened this issue 1 year ago
This method is not specific to multi-class problems. In the binary case there is only one adversarial class, which reduces this method to a form similar to IG. The difference is that IG requires a straight-line path and a starting reference point, whereas this AGI method can still find a path automatically.
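For contrast with AGI's automatically found path, here is a minimal sketch of classical IG with its explicit straight-line path and reference point (the `model` and the zero baseline are assumptions for illustration, not the repo's code):

```python
import torch

def integrated_gradients(model, x, baseline, target, steps=50):
    # IG integrates gradients along an explicit straight-line path
    # from a chosen reference point (baseline) to the input x;
    # AGI instead follows adversarial gradients, so no baseline is needed.
    alphas = torch.linspace(0.0, 1.0, steps)
    total = torch.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)   # point on the straight path
        point.requires_grad_(True)
        out = model(point.unsqueeze(0))[0, target]
        grad, = torch.autograd.grad(out, point)
        total += grad
    # Riemann-sum approximation of the path integral
    return (x - baseline) * total / steps
```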
Thank you for your answer, that is what I thought but I wasn't sure.
Anyway thank you for your work, I managed to get some pretty amazing results with this method that I could not get with classical IG.
One last question: is it normal that on the following lines the delta is not normalized?
https://github.com/pd90506/AGI/blob/4b28c8b2ade86f739b1f3768a46728d97ba432b9/AGI_main.py#L89-L90
According to the algorithm below, I would expect something like delta = epsilon * (data_grad_adv / grad_lab_norm).sign() (especially as the variable grad_lab_norm does not seem to be used anywhere).
There could be an issue when the gradient is zero, but it could be solved by clipping the norm to some small value:
delta = epsilon * (data_grad_adv / torch.clamp(grad_lab_norm, min=1e-8)).sign()
Yes! You're correct. But since we only take the sign, and the norm is a positive scalar, dividing by it shouldn't cause any difference. Your suggestion should also work.
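That equivalence is easy to check: dividing a tensor by a positive scalar never changes the sign of its entries, so the normalized and unnormalized deltas agree after .sign() (as long as the norm is nonzero):

```python
import torch

g = torch.tensor([0.3, -1.7, 2.5, 0.0])  # arbitrary example gradient
norm = g.norm()                           # positive scalar for nonzero g

# Same sign pattern with or without dividing by the norm
print(torch.equal(g.sign(), (g / norm).sign()))
```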
Those adversarial gradients seem quite promising to get rid of this arbitrary baseline.
However, in the case of binary classification, would this method be relevant? There would be only one adversarial example, not multiple ones like in a multi-class problem. What are your thoughts about that?