kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License

How negative numbers affect gradient descent #31

Open yxiao54 opened 4 years ago

yxiao54 commented 4 years ago

The loss can be a negative number in this model. The reason is that the REINFORCE loss is often negative, since the reward is larger-is-better. But I am confused about how negative numbers affect gradient descent.

I also notice that the hybrid loss tends toward zero eventually. How can the loss increase under gradient descent?

malashinroman commented 4 years ago

Using negative loss values is a standard approach in reinforcement learning: it turns gradient descent into gradient ascent. Minimizing a negated objective is the same as maximizing the objective itself. To my knowledge, PyTorch has no issues with this.
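A minimal sketch of why the sign is harmless to the optimizer (the tensors here are random stand-ins, not the repo's actual variables):

```python
import torch

# Toy stand-ins for one batch; values are random and purely illustrative.
log_probs = torch.randn(32, requires_grad=True)  # log pi(a|s) of sampled actions
reward = torch.rand(32)                          # larger is better

# REINFORCE maximizes expected reward, so we minimize its negative.
# Autograd treats a negative loss like any other scalar: the minus sign
# simply flips the gradient, turning descent on `loss` into ascent on reward.
loss = -(log_probs * reward).mean()
loss.backward()
```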

In the A3C algorithm (used in this project) the loss can increase during training. The reason is that the REINFORCE loss is measured as the advantage over a baseline prediction. The baseline is a network that is learned during training; at the start of training its predictions are poor, so it is very easy to have an advantage over it. At least that is how I see what is going on here.
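A minimal sketch of that baseline/advantage interaction, with hypothetical variable names (the actual computation is in the repo's training loop):

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for one training batch.
log_pi   = torch.randn(32, requires_grad=True)  # log-prob of sampled glimpse locations
baseline = torch.rand(32, requires_grad=True)   # learned baseline prediction b(s)
reward   = (torch.rand(32) > 0.5).float()       # e.g. 1 if the classification was correct

# Advantage over the baseline. detach() keeps the REINFORCE term from
# training the baseline; the baseline gets its own regression loss below.
advantage = reward - baseline.detach()

loss_reinforce = -(log_pi * advantage).mean()  # large (and negative) early on
loss_baseline  = F.mse_loss(baseline, reward)  # pulls the baseline toward the reward

# As the baseline catches up to the observed reward, the advantage -- and
# with it the REINFORCE term -- shrinks toward zero, which is why the total
# loss can drift back toward zero even while training is working.
```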

litingfeng commented 3 years ago

@malashinroman Hi, may I ask why this is an A3C algorithm?

To me, all the images in a batch share the same agent, and the update is not asynchronous. In A3C, by contrast, the agents are different in different processes, and they update the central network asynchronously. Please let me know if I'm wrong; I'm new to RL. Thanks!

malashinroman commented 3 years ago

I think you're right. I was thinking of the different environments, but there are no asynchronous agents here.
