ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License
1.23k stars 279 forks source link

GAE parameter name should be lambda not tau. And why is default 1.0? #60

Closed beduffy closed 5 years ago

beduffy commented 5 years ago

From the end of section 3 in the GAE paper: High-Dimensional Continuous Control Using Generalized Advantage Estimation https://arxiv.org/pdf/1506.02438.pdf

Taking γ < 1 introduces bias into the policy gradient estimate, regardless of the value function’s accuracy. 
On the other hand, λ < 1 introduces bias only when the value function is inaccurate.
Empirically, we find that the best value of λ is much lower than the best value of γ, 
likely because λ introduces far less bias than γ for a reasonably accurate value function

So my two questions are:

ikostrikov commented 5 years ago

Yes, that was a typo. I will accept a pull request that renames it into gae-lambda.

I took the hyper param from the universe starter repo from OpenAI. I’m not sure how well it was tuned.

On Mar 18, 2019, at 11:12 AM, Ben Duffy notifications@github.com wrote:

From the end of section 3 in the GAE paper: High-Dimensional Continuous Control Using Generalized Advantage Estimation https://arxiv.org/pdf/1506.02438.pdf

Taking γ < 1 introduces bias into the policy gradient estimate, regardless of the value function’s accuracy. On the other hand, λ < 1 introduces bias only when the value function is inaccurate. Empirically, we find that the best value of λ is much lower than the best value of γ, likely because λ introduces far less bias than γ for a reasonably accurate value function So my two questions are:

Why is lambda called "tau" in this repo? If they empirically find that that lambda is better if it is "much lower" than gamma discount, why is the default for lambda (tau) and gamma set to 1.0 and 0.99 respectively? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

beduffy commented 5 years ago

Hey, that's good to hear. I can create this PR. I've been adapting your great code in another project and have been adding docstrings to some functions and classes e.g. this ASCII art architecture to the ActorCritic class:

Implementation of A3C (https://arxiv.org/abs/1602.01783).

____________________________________________________________________________________________________

                            A3C policy model architecture

   Image Processing module -> Flattened output ->     Policy Learning Module    --->  Final output
           ______________________                ___________________________________
          |     _______    ____ |     __       |      ___________                 |
image ->  | 4x |conv2d| + |ELU| |    |__|      |     |   LSTM   | --> Critic FC-> | -> value
          |    |______|   |___| | -> |__| -->  | --> |__________| --> Actor FC -> | -> policy logits
          |                     |    |__|      |        ^   ^                     | -> (hx, cx)
          |                     |    |__|      |        |   |                     |
          |_____________________|              |  prev cx  hx                     |
                                               |__________________________________|
____________________________________________________________________________________________________

    Processes an input image (with num_input_channels) with 4 conv layers,
    interspersed with 4 elu activation functions. The output of the final layer is then flattened
    and passed to an LSTM (with previous or initial hidden and cell states (hx and cx)).
    The new hidden state is used as an input to the critic and value nn.Linear layer heads,
    The final output is then the predicted value, action logits, hx and cx.
    """

I can add a few things like this if you want in the PR? But otherwise if you just want the name change, I can do that as well

beduffy commented 5 years ago

I created the PR here with the typo fix: https://github.com/ikostrikov/pytorch-a3c/pull/61

If you would like the ASCII art and some other documentation, happy to create another PR.

beduffy commented 5 years ago

Thanks for merging the PR!

I assume I'll sell my ascii art on the black market to the next highest bidder? :stuck_out_tongue: I'm happy to close this issue.