Closed anianruoss closed 6 years ago
The problem comes from the adversarial loss, which is 0 in your case, as you point out (given the output of your `sample_net`). So there is no trade-off between the adversarial loss and the flow loss, and the smoothest solution the optimizer finds is a constant flow. That produces NaN gradients because `flow_loss` takes the square root of the difference between a flow and its shifted version, and for a constant flow that difference is exactly zero. If you used `flow_loss` with the argument `padding_mode='CONSTANT'`, you would obtain a different behavior (vanishing flow).
In any case, I think it's just a toy example; you should solve your problem by changing the implementation of your `sample_net` (to have it depend on the perturbed image).
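To make the NaN mechanism concrete, here is a minimal sketch in plain Python (not TensorFlow, and not the library's code). It assumes the gradient is obtained by the chain rule through a square root; `sqrt_slope` is a hypothetical helper introduced only for illustration.

```python
import math


def sqrt_slope(x):
    """Derivative of sqrt(x), valid for x > 0."""
    return 1.0 / (2.0 * math.sqrt(x))


# The slope grows without bound as the argument approaches zero.
for x in (1e-2, 1e-8, 1e-16):
    print(x, sqrt_slope(x))

# At exactly zero the slope is infinite, while the derivative of the
# squared difference d**2 at d = 0 is zero. Their product is undefined.
slope_at_zero = math.inf        # limit of sqrt_slope(x) as x -> 0
inner_derivative = 2 * 0.0      # d/dd of d**2, evaluated at d = 0
chained = slope_at_zero * inner_derivative
print(chained)  # nan
```

Once a single NaN of this kind enters the gradient, a line search has nothing usable to work with, which is consistent with the abnormal termination reported in this issue.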
Thank you for your detailed answer!
Glad it solved your issue. Just a couple quick extra comments for completeness:
If you can think of a more user-friendly treatment of this case let me know.
Yes, it would be nice to be able to initialize the solver with zero flows. I was able to fix the problem by adding a small epsilon to the norm in losses.py:
```python
import sys

import tensorflow as tf


def _l2_diff_norm_squared(t1, t2, axis):
    """Shortcut for getting the squared L2 norm of the difference
    between two tensors when slicing on the second axis.
    """
    # Shift the difference by machine epsilon so the norm is never exactly zero.
    return tf.norm(
        t1[:, axis] - t2[:, axis] + sys.float_info.epsilon,
        ord='euclidean',
        axis=(1, 2)
    ) ** 2
```
I think that this is a more elegant solution than clipping the argument. Do you think you could include it in your pip package?
Actually, looking back at `_l2_diff_norm_squared` made me realize that `flow_loss` differs from Eq. (4) of arXiv:1801.02612: the summation over p (looping over all pixels) is currently done inside the square root, not outside. So the results will be numerically different. Although the idea of enforcing local smoothness is present in the current implementation, I have not actually implemented Eq. (4). Let me fix that ASAP and unit test it against a simple (non-vectorized) calculation.
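A toy calculation shows why the placement of the sum matters. The values below are hypothetical per-pixel squared differences, not output from the library.

```python
import math

# Hypothetical squared flow differences, one entry per pixel p.
squared_diffs = [9.0, 16.0]

# Sum taken inside the square root: sqrt(9 + 16).
sum_inside = math.sqrt(sum(squared_diffs))

# Sum taken outside the square root: sqrt(9) + sqrt(16).
sum_outside = sum(math.sqrt(s) for s in squared_diffs)

print(sum_inside, sum_outside)  # 5.0 7.0
```

The two orderings agree only in special cases (for example, when at most one pixel has a nonzero difference), so a loss computed one way cannot be compared number for number with one computed the other way.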
I have pushed modifications (see https://github.com/rakutentech/stAdv/commit/c7ebb7d39c3ae730b72d4e4c08a8c57d5666c1a3) and released them as version 0.2. You can upgrade with `pip install -U stadv`. With the correct implementation of the flow loss, the results (as found in the demo notebook) do not look very different. However, it exacerbates the problem of NaN gradients. Similar to the solution you suggested, I have introduced an epsilon parameter to `flow_loss` (with default value 1e-8) to prevent `tf.sqrt(0)`.
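One plausible reading of why the per-pixel form is more fragile: with a square root per pixel, a single pixel with a zero difference is enough to produce a NaN gradient entry, whereas a small constant inside the root keeps every entry finite. The sketch below is plain Python under that assumption; `per_pixel_grads` is a hypothetical helper, not the library's `flow_loss`.

```python
import math


def per_pixel_grads(diffs, eps):
    """Gradient of sum_p sqrt(d_p**2 + eps) with respect to each d_p."""
    grads = []
    for d in diffs:
        root = math.sqrt(d * d + eps)
        # 0/0 is NaN in floating-point arithmetic; Python raises instead,
        # so the NaN is emulated explicitly here.
        grads.append(math.nan if root == 0.0 else d / root)
    return grads


diffs = [0.0, 2.0]  # hypothetical differences; only the first pixel is zero
print(per_pixel_grads(diffs, eps=0.0))   # first entry is nan
print(per_pixel_grads(diffs, eps=1e-8))  # both entries are finite
```

The price of the epsilon is a small bias: the gradient for the nonzero pixel is slightly below 1 rather than exactly 1.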
I am closing this issue, thank you for pointing this out. Feel free to reopen if anything looks fishy!
Perfect, thank you for your help!
When running this simple example, gradient_val in lbfgs starts to contain only NaN values after a certain number of iterations. This causes the lbfgs solver to terminate with the message "ABNORMAL_TERMINATION_IN_LNSRCH" and to output a NaN loss.
Using the TensorFlow Debugger, I was able to pinpoint the problem to the `tf.sqrt` of the flow_loss. This can be verified by setting `tau_val = 0` (essentially disabling the flow_loss), which leads to convergence and a loss of 0. Do you know how to fix this problem?