kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License

Detaching l_t #29

Open Pozimek opened 4 years ago

Pozimek commented 4 years ago

At the moment the location tensor l_t is never detached from the computational graph, despite being both produced and consumed by trainable modules. As far as I understand the code, this lets gradients backpropagate through time in a way the authors of RAM did not intend: gradients that originate in the action_network and reach the fc2 layer inside the glimpse network travel back into the previous timestep's location_network and alter its weights, only stopping once they reach the detached RNN memory vector h_t. As far as I understand, the authors intended the location_network to be trained with reinforcement learning only.

This could be a bug, or it could be an accidental improvement to the network; either way, please let me know if my understanding here is correct, as I am still learning PyTorch and my project relies heavily on your code :)
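For reference, here is a minimal sketch of the kind of location head I have in mind (not the repository's actual code; the hidden size and std are assumptions). Detaching h_t on the way in and l_t on the way out confines this head to the REINFORCE gradient and cuts the hybrid backprop path described above:

```python
import torch
import torch.nn as nn

class LocationNetwork(nn.Module):
    """Location head: emits the next glimpse location from the hidden state."""

    def __init__(self, hidden_size: int = 256, std: float = 0.17):
        super().__init__()
        self.fc = nn.Linear(hidden_size, 2)
        self.std = std

    def forward(self, h_t: torch.Tensor):
        # Detach h_t so the REINFORCE gradient for this head does not flow
        # back into the core network through this path.
        mu = torch.tanh(self.fc(h_t.detach()))

        # Sample a location and detach it: the next glimpse then treats l_t
        # as a constant, so classification gradients arriving at fc2 of the
        # glimpse network cannot travel back into this head.
        l_t = torch.distributions.Normal(mu, self.std).rsample()
        l_t = l_t.detach()

        # log_pi keeps a gradient w.r.t. mu, which is what REINFORCE needs.
        log_pi = torch.distributions.Normal(mu, self.std).log_prob(l_t).sum(dim=1)
        return log_pi, torch.clamp(l_t, -1.0, 1.0)
```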

yxiao54 commented 4 years ago

Yes, agreed. Same confusion here. The author says: "The location network is always trained with REINFORCE." So should we build a separate loss function for it?
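For what it's worth, a hedged sketch of what such a hybrid objective could look like, following the paper's description (classification loss + baseline regression + a REINFORCE term for the location network); the variable names are assumptions, not the repository's:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, labels, log_pi, baselines, R):
    """logits: [B, C] class scores; log_pi, baselines: [B, T] per-timestep
    log-probs of the sampled locations and baseline predictions; R: [B] reward
    (e.g. 1.0 if the final prediction is correct, else 0.0)."""
    loss_action = F.cross_entropy(logits, labels)       # supervised classification path
    R = R.unsqueeze(1).expand_as(baselines)              # broadcast reward over timesteps
    loss_baseline = F.mse_loss(baselines, R)             # train the variance-reducing baseline
    adjusted_reward = R - baselines.detach()             # advantage used by REINFORCE
    loss_reinforce = -(log_pi * adjusted_reward).sum(dim=1).mean()
    return loss_action + loss_baseline + loss_reinforce
```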

lijiangguo commented 4 years ago

Note that, aside from stopping at h_t, the gradient originating from action_network will continue recursively through g_t in core_network and modify the l_t of all previous timesteps. Meanwhile, I wonder why location_network and baseline_network have to be detached from h_t. Does anywhere in the paper suggest that core_network should only be trained via the classification loss? @Pozimek @yxiao54
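One way to settle where the gradients actually go is to inspect them directly. A small debugging sketch (assuming some `model` with the usual sub-modules and a scalar `loss`; this is not part of the repository):

```python
import torch

def report_gradient_flow(model: torch.nn.Module, loss: torch.Tensor) -> None:
    """Print which parameters receive a nonzero gradient from `loss`."""
    model.zero_grad()
    loss.backward(retain_graph=True)
    for name, p in model.named_parameters():
        reached = p.grad is not None and p.grad.abs().sum().item() > 0
        print(f"{name}: {'gradient' if reached else 'no gradient'}")
```

Running it with only the classification loss makes it easy to check whether the location_network parameters still receive a gradient, i.e. whether a detach is missing on that path.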

litingfeng commented 3 years ago

@Pozimek it seems that l_t is detached in the location network.

lizhenstat commented 3 years ago

@Pozimek Hi, your explanation helped me understand why the authors use l_t.detach() in the code, thanks!