I've got nearly working code in this thread.
Hi, any chance you could give me some advice? I'm still stuck trying to get this to work. Here's a gist of my code:
https://gist.github.com/AjayTalati/184fec867380f6fa22b9aa0951143dec
I keep getting this error:

```
File "main_single.py", line 174, in <module>
    value_loss = value_loss + advantage.pow(2)
AttributeError: 'numpy.ndarray' object has no attribute 'pow'
```
I don't understand why `advantage` has become a numpy array instead of a torch tensor; this never happened with the discrete-action implementation.
Any ideas what I've got wrong?
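For what it's worth, here's a minimal reproduction of what I think is going on, assuming the bootstrapped return R gets built from numpy values somewhere (I haven't pinned down the exact line in my gist, so the names below are illustrative):

```python
import numpy as np
import torch

# gym's env.step() hands back the reward as a plain Python/numpy float
reward = np.float32(1.0)
value = torch.zeros(1, 1)              # critic output, a torch tensor

R = reward + 0.99 * value.numpy()      # numpy wins: R is now an ndarray
advantage = R - value.numpy()          # still an ndarray -> no .pow() method

# Keeping everything in torch avoids the AttributeError:
R = torch.full((1, 1), float(reward)) + 0.99 * value
advantage = R - value
value_loss = advantage.pow(2)          # works: advantage is a torch tensor
```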
Thanks a lot for your help,
Best,
Ajay
Closing this, as continuous functions are just a pain to approximate.
I will add continuous control later. I don't have time at the moment.
OK - cool - take your time :+1: I don't mean this as an A3C-specific comment, or anything specific about your implementation.
It's just a general observation (and perhaps a provable fact) that I've found discrete functions easier to approximate than continuous ones.
In terms of simple MLP theory, this one by Mhaskar and Poggio is nice:
Learning Real and Boolean Functions: When Is Deep Better Than Shallow
Hello :)
I was wondering how to modify the code for continuous actions, so that, for example, it could be compared with your NAF implementation on the OpenAI Gym pendulum:
```python
env = gym.envs.make("Pendulum-v0")
```
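From Section 9 of the A3C paper, I think the policy head needs to output the mean and variance of a Gaussian instead of a softmax over discrete actions. Here's a rough sketch of what I mean (the class name, layer sizes, and wiring are my own guesses, not your repo's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianPolicyHead(nn.Module):
    """Linear output for the mean, SoftPlus output for the variance."""
    def __init__(self, hidden_size, num_actions):
        super().__init__()
        self.mu = nn.Linear(hidden_size, num_actions)
        self.sigma_sq = nn.Linear(hidden_size, num_actions)

    def forward(self, x):
        mu = self.mu(x)
        sigma_sq = F.softplus(self.sigma_sq(x))  # keeps the variance positive
        return mu, sigma_sq

# Pendulum-v0 has a single continuous action (torque in [-2, 2])
head = GaussianPolicyHead(hidden_size=128, num_actions=1)
mu, sigma_sq = head(torch.zeros(1, 128))
dist = torch.distributions.Normal(mu, sigma_sq.sqrt())
action = dist.sample()   # replaces the multinomial sampling of the discrete case
```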
Here's how far I got, and what the code in main now looks like. It breaks with the following changes in train.py, though. Any idea how to get it working?
Thanks a lot for your help,
Best regards,
Ajay
Reference - DeepMind's A3C paper, https://arxiv.org/pdf/1602.01783.pdf, Section 9 - Continuous Action Control Using the MuJoCo Physics Simulator
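If I'm reading that section right, the change in the training loop is mostly in the policy loss: the categorical log-probability becomes a Gaussian log-density, and the entropy bonus becomes the differential entropy of the Gaussian. A sketch of how I understand it (my own function, not code from this repo; the beta entropy weight is just a placeholder):

```python
import math
import torch

def continuous_policy_loss(mu, sigma_sq, action, advantage, beta=1e-4):
    # log N(action; mu, sigma_sq)
    log_prob = (-0.5 * (action - mu).pow(2) / sigma_sq
                - 0.5 * math.log(2 * math.pi)
                - 0.5 * sigma_sq.log())
    # differential entropy of a Gaussian: 0.5 * log(2*pi*e*sigma^2)
    entropy = 0.5 * (2 * math.pi * math.e * sigma_sq).log()
    # gradients should not flow through the advantage estimate
    return -(log_prob * advantage.detach() + beta * entropy).sum()
```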
Picture from https://github.com/deeplearninc/relaax#distributed-a3c-architecture-with-continuous-actions