rarilurelo / pytorch_a3c


How to modify code for continuous actions? #5

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi @rarilurelo,

can I ask if you have been able to modify your code to work with continuous actions, e.g. Pendulum or MountainCar? I tried to modify @ikostrikov's implementation, see here:

https://discuss.pytorch.org/t/continuous-action-a3c/1033

but could not get it to work. I think @pfre00 has tried too, but he said training was not stable; see here:

https://github.com/pfre00/a3c/issues/1

Have you got any advice?

Kind regards,

Ajay
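
A common way to make this adaptation is to keep the critic head but swap the softmax policy head for a Gaussian whose mean (and log standard deviation) the network outputs, then sample actions from it. Below is a minimal sketch using the `torch.distributions` API, which postdates this thread; all names and sizes are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

class ContinuousPolicy(nn.Module):
    """Actor-critic with a Gaussian action head (illustrative sketch)."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, act_dim)                # action mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))   # state-independent log-std
        self.v = nn.Linear(hidden, 1)                       # critic head

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Normal(self.mu(h), self.log_std.exp())
        return dist, self.v(h)

policy = ContinuousPolicy(obs_dim=3, act_dim=1)   # Pendulum-sized obs/action
dist, value = policy(torch.randn(1, 3))
action = dist.sample()                            # replaces the discrete multinomial draw
log_prob = dist.log_prob(action).sum(-1)          # plugs into the policy-gradient loss
entropy = dist.entropy().sum(-1)                  # entropy bonus, as in discrete A3C
```

Squashing or clamping the sampled action to the environment's bounds (e.g. with a tanh) is a common extra step for Pendulum-style action ranges.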

ghost commented 7 years ago

Hi, any chance you could give me some advice? I'm still stuck trying to get this to work. Here's a gist with my code:

https://gist.github.com/AjayTalati/184fec867380f6fa22b9aa0951143dec

I keep getting this error:

File "main_single.py", line 174, in <module>
value_loss = value_loss + advantage.pow(2)
AttributeError: 'numpy.ndarray' object has no attribute 'pow'

I don't understand why `advantage` has become a numpy array instead of a torch tensor; it never occurred with the discrete-action implementation.

Any ideas what I've got wrong?

Thanks a lot for your help,

Best,

Ajay

rarilurelo commented 7 years ago

@AjayTalati I don't know why this error occurs, but I can solve this problem by replacing L137 with `rewards.append(float(max(min(reward, 1), -1)))` (i.e. adding a `float()` cast).

I also found an error in backpropagation through the stochastic function. I suggest that you use the reinforce method.
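
For anyone hitting the same AttributeError: a likely cause is that gym hands back the reward as a numpy type, which then propagates through the discounted-return computation until `advantage` itself is a numpy object. A small sketch of the failure mode and of the suggested cast; variable names are illustrative, not taken from the gist:

```python
import numpy as np

reward = np.float64(-1.7)   # continuous-control envs often return numpy scalars/arrays
R = 0.0
R = reward + 0.99 * R       # the discounted return silently becomes a numpy value here,
print(type(R))              # so a later `advantage = R - value` is numpy too, and
                            # numpy objects have no .pow() method -> the AttributeError

# the suggested fix: clip to [-1, 1] and cast to a plain Python float when storing
rewards = []
rewards.append(float(max(min(reward, 1), -1)))
print(type(rewards[0]))     # <class 'float'>
```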

ghost commented 7 years ago

Hi @rarilurelo, thank you very much for your help :+1:

I will do as you suggest and try to modify the code from the `.reinforce` example in the PyTorch examples:

https://github.com/pytorch/examples/blob/master/reinforcement_learning/reinforce.py

I wonder if you know of any examples of how to use `.reinforce` on batch problems? Perhaps something very simple/synthetic that does not use a gym environment?
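
For later readers: the `.reinforce` API was subsequently deprecated and removed in favor of `torch.distributions`, so a simple synthetic batch example in the modern style might look like the sketch below; the bandit setup and all names are invented for illustration:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical 3-armed bandit: arm 2 pays 1.0, the other arms pay nothing.
true_rewards = torch.tensor([0.0, 0.0, 1.0])

logits = nn.Parameter(torch.zeros(3))    # the whole "policy" is just these logits
optimizer = optim.Adam([logits], lr=0.1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample((64,))         # a batch of 64 one-step "episodes"
    rewards = true_rewards[actions]      # reward for each sampled arm
    # REINFORCE: minimize -E[log pi(a) * R] (no baseline, for brevity)
    loss = -(dist.log_prob(actions) * rewards).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))      # mass should concentrate on arm 2
```

The same pattern (sample, take `log_prob`, scale by the return) is what the reinforce.py example above does per time step; batching just means sampling a vector of actions and averaging the weighted log-probabilities.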