hehefan / Recurrent-Attention-Model

Tensorflow implementation of paper "Recurrent Models of Visual Attention"

Zero gradients in LocationNetwork #2

Open XoriieInpottn opened 6 years ago

XoriieInpottn commented 6 years ago

Thanks for your code!

I think applying tf.stop_gradient() to both "mean_loc" and "sample_loc" causes the gradients of the location network to be None. Here is the gradient information:

GlimpseNetwork/Variable:0 Tensor("gradients/AddN_29:0", shape=(64, 128), dtype=float32)
GlimpseNetwork/Variable_1:0 Tensor("gradients/AddN_27:0", shape=(128,), dtype=float32)
GlimpseNetwork/Variable_2:0 Tensor("gradients/AddN_30:0", shape=(2, 128), dtype=float32)
GlimpseNetwork/Variable_3:0 Tensor("gradients/AddN_28:0", shape=(128,), dtype=float32)
GlimpseNetwork/Variable_4:0 Tensor("gradients/AddN_25:0", shape=(128, 256), dtype=float32)
GlimpseNetwork/Variable_5:0 Tensor("gradients/AddN_23:0", shape=(256,), dtype=float32)
GlimpseNetwork/Variable_6:0 Tensor("gradients/AddN_26:0", shape=(128, 256), dtype=float32)
GlimpseNetwork/Variable_7:0 Tensor("gradients/AddN_24:0", shape=(256,), dtype=float32)
LocationNetwork/Variable:0 None
LocationNetwork/Variable_1:0 None
rnn_decoder/basic_lstm_cell/kernel:0 Tensor("gradients/AddN_22:0", shape=(512, 1024), dtype=float32)
rnn_decoder/basic_lstm_cell/bias:0 Tensor("gradients/AddN_21:0", shape=(1024,), dtype=float32)
Baseline/Variable:0 Tensor("gradients/AddN_3:0", shape=(256, 1), dtype=float32)
Baseline/Variable_1:0 Tensor("gradients/AddN_1:0", shape=(1,), dtype=float32)
Classification/Variable:0 Tensor("gradients/xw_plus_b_10/MatMul_grad/MatMul_1:0", shape=(256, 10), dtype=float32)
Classification/Variable_1:0 Tensor("gradients/xw_plus_b_10_grad/BiasAddGrad:0", shape=(10,), dtype=float32)

So the tf.stop_gradient(mean_loc) call in "get_next_input()" should be removed. Please check.
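To illustrate why the mean has to stay differentiable, here is a minimal pure-Python sketch, not the repo's code. It assumes the location loss is a REINFORCE term built from a Gaussian log-likelihood of the sampled location around the predicted mean. The names `w`, `h`, `SIGMA`, `advantage` and the scalar setup are all hypothetical. Freezing the mean (which is what stop_gradient does for differentiation purposes) leaves the loss with no dependence on the location weight, so its gradient vanishes.

```python
import math

SIGMA = 0.2  # assumed fixed std-dev of the location policy (hypothetical value)


def log_prob(sample, mean, sigma=SIGMA):
    """Log-density of `sample` under a Gaussian N(mean, sigma^2)."""
    return -0.5 * ((sample - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def reinforce_loss(w, h, sample, advantage, frozen_mean=None):
    """Negative REINFORCE objective for a one-weight 'location network'.

    The mean is w * h unless `frozen_mean` is given; a frozen mean mimics
    wrapping the mean in stop_gradient, i.e. a constant with respect to w.
    The sample is always treated as a constant.
    """
    mean = w * h if frozen_mean is None else frozen_mean
    return -log_prob(sample, mean) * advantage


def numeric_grad(loss_fn, w, eps=1e-5):
    """Central finite-difference estimate of d(loss)/dw."""
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2.0 * eps)


# Hypothetical scalar values for the illustration.
w0, h, sample, advantage = 0.4, 0.5, 0.5, 0.6
mean0 = w0 * h

# Mean left differentiable: the loss depends on w through the mean.
grad_live = numeric_grad(lambda w: reinforce_loss(w, h, sample, advantage), w0)

# Mean frozen (stop_gradient analogue): the loss no longer depends on w.
grad_frozen = numeric_grad(
    lambda w: reinforce_loss(w, h, sample, advantage, frozen_mean=mean0), w0
)

# Closed-form gradient for the differentiable case, for comparison.
grad_analytic = -advantage * (sample - mean0) / SIGMA ** 2 * h

print("live:", grad_live, "analytic:", grad_analytic, "frozen:", grad_frozen)
```

In the sketch the sample stays a constant in both cases, which corresponds to keeping stop_gradient on "sample_loc" only.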

Many implementations on GitHub have the same problem, for example: https://github.com/zhongwen/RAM https://github.com/jlindsey15/RAM https://github.com/hehefan/Recurrent-Attention-Model

clvcooke commented 5 years ago

XoriieInpottn is correct. I would recommend that anyone looking to use this repo move to a different implementation. This bug won't be apparent until you try to visualize the glimpses and see that they are just random.