Pozimek opened this issue 4 years ago
Yes, agreed. Same confusion here. The paper says: "The location network is always trained with REINFORCE." So should we build a separate loss function for it?
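For what it's worth, here is a minimal sketch of what such a hybrid objective could look like: a supervised classification loss for the action network, a baseline regression loss, and a REINFORCE term for the location network. The function and variable names (`hybrid_loss`, `log_probs`, `baselines`, `R`) are illustrative, not the repo's actual API.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, labels, log_probs, baselines, R):
    # logits: [B, num_classes] from the action network at the last timestep
    # log_probs: [B, T] log pi(l_t | h_t) from the location network
    # baselines: [B, T] baseline network outputs
    # R: [B] reward, e.g. 1.0 if the final classification is correct, else 0.0

    # Supervised loss for the action (classification) network.
    loss_action = F.nll_loss(F.log_softmax(logits, dim=1), labels)

    # Baseline regressed towards the observed reward.
    loss_baseline = F.mse_loss(baselines, R.unsqueeze(1).expand_as(baselines))

    # REINFORCE term for the location network, with a detached advantage.
    advantage = (R.unsqueeze(1) - baselines).detach()
    loss_reinforce = torch.mean(torch.sum(-log_probs * advantage, dim=1))

    return loss_action + loss_baseline + loss_reinforce
```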
Note that, aside from stopping at h_t, the gradient originating from the action_network will continue recursively through g_t in the core_network and modify the l_t of all previous timesteps. Meanwhile, I wonder why the location_network and baseline_network have to be detached from h_t? Does the paper suggest anywhere that the core_network should only be trained via the classification loss? @Pozimek @yxiao54
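For reference, a runnable toy demonstration of what detaching h_t does (module sizes are arbitrary stand-ins, not the repo's actual classes): losses computed on the location and baseline outputs then cannot reach the core network, which is left to the classification loss.

```python
import torch
import torch.nn as nn

core_network = nn.Linear(4, 8)        # stand-in: produces h_t from a glimpse feature
location_network = nn.Linear(8, 2)    # stand-in: emits location means from h_t
baseline_network = nn.Linear(8, 1)    # stand-in: emits a baseline from h_t

g_t = torch.randn(1, 4)
h_t = core_network(g_t)

# Feeding a detached h_t means the REINFORCE / baseline losses cannot push
# gradients back into core_network.
l_t = location_network(h_t.detach())
b_t = baseline_network(h_t.detach())

(l_t.sum() + b_t.sum()).backward()
print(core_network.weight.grad)       # None: core_network untouched by these losses
print(location_network.weight.grad)   # populated
```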
@Pozimek it seems that l_t is detached in the location network.
@Pozimek Hi, your explanation helps me understand why the authors use l_t.detach() in the code, thanks!
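To make the two detaches concrete, here is a minimal sketch of a location head with both of them, loosely following the shape discussed in this thread; the exact layer sizes, the Gaussian policy, and the clamp are assumptions for illustration, not a verbatim copy of the repo.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class LocationNetwork(nn.Module):
    """Illustrative location head with the two detaches discussed above."""

    def __init__(self, hidden_size=256, std=0.17):
        super().__init__()
        self.fc = nn.Linear(hidden_size, 2)
        self.std = std

    def forward(self, h_t):
        # h_t.detach(): the REINFORCE loss on log_pi must not update the core network.
        mu = torch.tanh(self.fc(h_t.detach()))
        dist = D.Normal(mu, self.std)
        l_t = dist.rsample()
        log_pi = dist.log_prob(l_t).sum(dim=1)
        # l_t.detach(): the next glimpse consumes l_t, so without this detach the
        # classification gradient would flow back into this module via fc2.
        return log_pi, torch.clamp(l_t.detach(), -1.0, 1.0)
```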
At the moment the location tensor l_t is never detached from the computational graph, despite being both produced and consumed by trainable modules. As far as I understand the code, this lets the gradients "backpropagate through time" in a way that the authors of RAM did not intend: gradients originating in the action_network reach the fc2 layer inside the glimpse network, travel back into the previous timestep's location_network, alter its weights, and only stop once they hit the detached RNN memory vector h_t. As far as I understand, the authors intended the location_network to be trained using reinforcement learning only.
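A runnable toy demonstration of the leak I am describing (the modules are arbitrary stand-ins, not the repo's classes): without a detach on l_t, a loss computed downstream of the next glimpse reaches the previous location network; with the detach, it does not.

```python
import torch
import torch.nn as nn

def run(detach_l_t: bool) -> bool:
    location_network = nn.Linear(8, 2)   # stand-in for location_network
    fc2 = nn.Linear(2, 8)                # stand-in for fc2 inside the glimpse network
    h_prev = torch.randn(1, 8)

    l_t = location_network(h_prev)
    if detach_l_t:
        l_t = l_t.detach()               # cut the path back into location_network

    # Stands in for everything downstream: next glimpse -> core -> action_network loss.
    fc2(l_t).sum().backward()
    return location_network.weight.grad is not None

print(run(detach_l_t=False))  # True: classification gradients reach the location network
print(run(detach_l_t=True))   # False: the location network is left to REINFORCE alone
```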
This could be a bug, or it could be an accidental improvement to the network; either way, please let me know if my understanding is correct, as I am still learning PyTorch and my project relies heavily on your code :)