Hi, any chance you could give me some advice? I'm still stuck trying to get this to work. Here's a gist of my code:
https://gist.github.com/AjayTalati/184fec867380f6fa22b9aa0951143dec
I keep getting this error,
File "main_single.py", line 174, in <module>
value_loss = value_loss + advantage.pow(2)
AttributeError: 'numpy.ndarray' object has no attribute 'pow'
I don't understand why advantage has become a numpy array instead of a torch.Tensor; this never happened with the discrete-action implementation. Any ideas what I've got wrong?
Thanks a lot for your help,
Best,
Ajay
@AjayTalati I don't know why this error occurs, but I was able to fix it by changing line 137 to rewards.append(float(max(min(reward, 1), -1))) (i.e. adding a float() cast around the clipped reward).
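For context, here is a minimal sketch of why the float() cast helps (variable names are illustrative, not taken from the gist): gym environments can return numpy scalars, and casting to a plain Python float keeps the discounted-return and advantage arithmetic inside torch.

```python
# Minimal sketch, assuming the usual A3C-style discounted-return loop;
# variable names are illustrative, not taken from the gist.
import numpy as np
import torch

def clip_reward(reward):
    # env.step() may hand back a numpy scalar; float() drops the numpy type
    return float(max(min(reward, 1), -1))

rewards = [clip_reward(np.float64(2.7)), clip_reward(-0.3)]  # -> [1.0, -0.3]

gamma = 0.99
R = torch.zeros(1)
for r in reversed(rewards):
    R = gamma * R + r           # stays a torch.Tensor because r is a float

advantage = R - torch.zeros(1)  # placeholder value estimate
value_loss = advantage.pow(2)   # .pow() works; advantage is still a Tensor
```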
I also found another error in the backpropagation through the stochastic function. I suggest using the REINFORCE method instead.
Hi @rarilurelo, thank you very much for your help :+1:
I will do as you suggest, and try to modify the code from the .reinforce example in the PyTorch examples:
https://github.com/pytorch/examples/blob/master/reinforcement_learning/reinforce.py
I wonder if you know of any examples of how to use .reinforce
on batch problems? Perhaps something very simple/synthetic, that does not use a gym environment?
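For reference, here is the kind of minimal batch REINFORCE sketch I have in mind: a Gaussian policy trained on a synthetic batch, with no gym environment. Everything below (names, shapes, the toy "returns") is an illustrative assumption, not code from the linked example, and the Gaussian log-probability is written out by hand rather than using .reinforce.

```python
# Rough sketch of a REINFORCE-style update for a batch of synthetic data
# with a Gaussian policy; names, shapes and the toy returns are assumptions.
import math
import torch
import torch.nn as nn

policy = nn.Linear(4, 2)                  # toy policy: state -> (mean, log_std)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

states = torch.randn(8, 4)                # synthetic batch, no gym needed
returns = torch.randn(8)                  # pretend returns-to-go

mean, log_std = policy(states).chunk(2, dim=-1)
std = log_std.exp()
actions = (mean + std * torch.randn_like(std)).detach()   # sampled actions

# Gaussian log-probability of the sampled actions
log_prob = (-((actions - mean) ** 2) / (2 * std ** 2)
            - log_std - 0.5 * math.log(2 * math.pi)).sum(-1)

loss = -(log_prob * returns).mean()       # score-function / REINFORCE loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```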
Hi @rarilurelo,
can I ask if you have been able to modify your code to work with continuous actions, e.g. Pendulum or MountainCar? I tried to modify @ikostrikov's implementation, see here:
https://discuss.pytorch.org/t/continuous-action-a3c/1033
but could not get it to work. I think @pfre00 has tried too, but he said training was not stable, see here:
https://github.com/pfre00/a3c/issues/1
Have you got any advice?
Kind regards,
Ajay