kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License

convert torch.tensor to numpy #4

Closed ghost closed 4 years ago

ghost commented 6 years ago

Nice code. But:

  1. I think where you convert torch.tensor to numpy, gradient backpropagation is broken.
  2. use_gpu does not work.
kevinzakka commented 6 years ago

Hey @0-oo-0, I'm aware of number 2 and haven't been able to fix it yet because I don't currently own a GPU and have used up my AWS credits. It's just a matter of applying `.cpu()` before `.numpy()`.

As for number 1, the glimpse extraction operation is non-differentiable anyway, so there shouldn't be a need to back propagate the gradients to it. We want to back propagate the gradients up to the glimpse network which transforms the glimpse and the location vector into the glimpse feature vector g_t.

Correct me if I'm wrong though.
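For reference, a minimal sketch of the conversion pattern discussed above (the helper name `to_numpy` is mine, not from the repo): detach first, since the glimpse extraction is non-differentiable anyway, then move to host memory before converting, because calling `.numpy()` directly on a CUDA tensor raises an error.

```python
import torch

def to_numpy(x):
    # Detach from the autograd graph (no gradients flow through the
    # extraction step), move to CPU if the tensor lives on the GPU,
    # then convert to a NumPy array.
    return x.detach().cpu().numpy()

# Works the same whether x is on CPU or GPU.
x = torch.randn(2, 3, requires_grad=True)
arr = to_numpy(x)
```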

ghost commented 6 years ago

Yeah, I think it is fine. I have two suggestions and a question.

Suggestion 1. The experimental results in the final version of the paper differ from the arXiv version: https://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf

Suggestion 2. By replacing `F.tanh()` at https://github.com/kevinzakka/recurrent-visual-attention/blob/5a73041b18a2dc07b71193cfb55ae40be0eda42b/modules.py#L344 and https://github.com/kevinzakka/recurrent-visual-attention/blob/5a73041b18a2dc07b71193cfb55ae40be0eda42b/modules.py#L354 with `F.hardtanh()`, I get 1.12% test error.

Question 1. At test time, setting l_t = mu, as suggested by the Torch Blog Post on RAM, gives a worse result. I do not know why.
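To illustrate the difference Suggestion 2 relies on (a sketch, with `h_t` standing in for whatever pre-activation feeds the location head): `hardtanh` is the identity inside [-1, 1] and clips hard outside it, whereas `tanh` squashes smoothly everywhere, so the location mean saturates more gradually.

```python
import torch
import torch.nn.functional as F

h_t = torch.tensor([[-2.0, -0.5], [0.5, 2.0]])  # hypothetical pre-activation

mu_tanh = torch.tanh(h_t)   # original: smooth squashing everywhere
mu_hard = F.hardtanh(h_t)   # suggestion: identity in [-1, 1], hard clip outside

# Both keep the location mean inside the valid [-1, 1] coordinate range,
# but hardtanh leaves in-range values untouched.
```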

kevinzakka commented 6 years ago

@0-oo-0 hey, not sure I understand. What's the issue exactly?

Your suggestion is that using `F.hardtanh` reduces the test error further, is that correct?

ghost commented 6 years ago

Yes, I've updated the comment.