A non-technical question, I hope it's OK to ask here on GitHub...
I am working on continuous robot control problems and was wondering which approach you are following in the continuous branch. My guess is that it is the Asynchronous Advantage Actor-Critic (A3C) approach from the 2016 Mnih et al. paper linked here. That method, however, is not Q-learning but a variant of a policy gradient method. Yet many variable names in your controller code suggest that deep Q-learning is being applied, so I am a bit confused. Could you confirm whether the code tries to reproduce the A3C method from that paper?