Yes, that was a typo. I will accept a pull request that renames it to gae-lambda.
I took the hyperparameter from OpenAI's universe starter repo. I'm not sure how well it was tuned.
On Mar 18, 2019, at 11:12 AM, Ben Duffy notifications@github.com wrote:
From the end of section 3 of the GAE paper, High-Dimensional Continuous Control Using Generalized Advantage Estimation (https://arxiv.org/pdf/1506.02438.pdf):

Taking γ < 1 introduces bias into the policy gradient estimate, regardless of the value function's accuracy. On the other hand, λ < 1 introduces bias only when the value function is inaccurate. Empirically, we find that the best value of λ is much lower than the best value of γ, likely because λ introduces far less bias than γ for a reasonably accurate value function.

So my two questions are:

1. Why is lambda called "tau" in this repo?
2. If they empirically find that lambda is best when it is "much lower" than the gamma discount, why are the defaults for lambda (tau) and gamma set to 1.0 and 0.99 respectively?
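For reference, GAE forms each advantage as an exponentially weighted sum of TD residuals, with lambda (called tau in this repo) controlling the weighting and gamma the discount. Here is a minimal sketch in plain Python; the function name and signature are illustrative, not taken from the repo:

# Minimal sketch of Generalized Advantage Estimation (GAE).
# "tau" plays the role of the paper's lambda; this is not the repo's code.
def gae_advantages(rewards, values, gamma=0.99, tau=1.0):
    """Compute GAE advantages for a finite rollout.

    rewards: list of length T
    values:  list of length T + 1 (bootstrap value appended at the end)
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk backwards through the rollout, accumulating the
    # exponentially weighted sum of TD residuals (deltas).
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * tau * gae
        advantages[t] = gae
    return advantages

With tau = 1.0 this reduces to the discounted return minus the value baseline (unbiased given gamma, higher variance); tau < 1 shifts weight toward the one-step TD residual, lowering variance at the cost of bias from value function error.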
Hey, that's good to hear. I can create this PR. I've been adapting your great code in another project and have been adding docstrings to some functions and classes, e.g. this ASCII-art architecture diagram for the ActorCritic class:
"""
Implementation of A3C (https://arxiv.org/abs/1602.01783).
____________________________________________________________________________________________________
A3C policy model architecture
Image Processing module -> Flattened output -> Policy Learning Module ---> Final output
______________________ ___________________________________
| _______ ____ | __ | ___________ |
image -> | 4x |conv2d| + |ELU| | |__| | | LSTM | --> Critic FC-> | -> value
| |______| |___| | -> |__| --> | --> |__________| --> Actor FC -> | -> policy logits
| | |__| | ^ ^ | -> (hx, cx)
| | |__| | | | |
|_____________________| | prev cx hx |
|__________________________________|
____________________________________________________________________________________________________
Processes an input image (with num_input_channels) through 4 conv layers,
each followed by an ELU activation. The output of the final layer is flattened
and passed to an LSTM along with the previous (or initial) hidden and cell states (hx and cx).
The new hidden state is fed to the actor and critic nn.Linear heads.
The final output is the predicted value, the action logits, and the new hx and cx.
"""
I can add a few things like this in the PR if you want? Otherwise, if you just want the name change, I can do that as well.
I created the PR here with the typo fix: https://github.com/ikostrikov/pytorch-a3c/pull/61
If you would like the ASCII art and some other documentation, happy to create another PR.
Thanks for merging the PR!
I assume I'll sell my ASCII art on the black market to the next highest bidder? :stuck_out_tongue: I'm happy to close this issue.