yanpanlau / DDPG-Keras-Torcs

Using Keras and Deep Deterministic Policy Gradient to play TORCS

Unlearning After Completing One Lap #15

Open EndlessHuygen opened 7 years ago

EndlessHuygen commented 7 years ago

Hello, I have a problem and I don't know where to start. Early training goes fine: the reward mostly increases with every episode, and the car gets further and further along the track before leaving the lane. However, once the car gets good enough to complete a lap (on a simple track), it unlearns what it has learned. Training becomes very unstable, the steering output saturates, and the resulting oscillation makes the car spin out of control very quickly. Does anyone have any idea what could be the cause? I am using TensorFlow directly rather than Keras, with Xavier initialization on my network weights. The network structures and activation functions are the same as in the DDPG paper.
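For context, here is a minimal sketch of the actor setup being described (written in Keras for brevity, although the commenter uses raw TensorFlow; the TORCS-like dimensions and layer sizes are assumptions, not anyone's exact code). The tanh output layer is the part that saturates when its pre-activations grow large:

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

state_dim, action_dim = 29, 3  # TORCS-like sizes, assumed for illustration

s_in = Input(shape=(state_dim,))
h = Dense(400, activation='relu', kernel_initializer='glorot_uniform')(s_in)
h = Dense(300, activation='relu', kernel_initializer='glorot_uniform')(h)
# tanh bounds each action in [-1, 1]; once its pre-activation grows large,
# the output pins at +/-1, matching the saturated-steering oscillation
# described above. Note: the DDPG paper initializes this final layer
# uniformly in [-3e-3, 3e-3] (not Xavier) precisely to keep the initial
# outputs near zero and out of the saturated region.
a_out = Dense(action_dim, activation='tanh',
              kernel_initializer='glorot_uniform')(h)

actor = Model(inputs=s_in, outputs=a_out)
```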

EndlessHuygen commented 7 years ago

I think this might be due to the differences between my critic implementation and yours: when I substitute my actor into your code, it works. In your implementation the critic outputs three Q values, whereas mine outputs only one. Why is this done? As far as I am aware, a state-action pair (both of which are vectors) corresponds to a single scalar Q output. Thank you in advance!
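For reference, a minimal sketch of the standard critic from the DDPG paper, where a (state, action) pair maps to exactly one scalar Q value (Keras here for brevity; dimensions and layer sizes are assumptions for illustration, not this repo's exact code):

```python
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

state_dim, action_dim = 29, 3  # TORCS-like sizes, assumed for illustration

state_in = Input(shape=(state_dim,))
action_in = Input(shape=(action_dim,))

h = Dense(400, activation='relu')(state_in)
# As in the DDPG paper, the action enters at the second hidden layer.
h = Concatenate()([h, action_in])
h = Dense(300, activation='relu')(h)
# A single linear unit: Q(s, a) is a scalar, not one value per action dim.
q_out = Dense(1, activation='linear')(h)

critic = Model(inputs=[state_in, action_in], outputs=q_out)
```

With a scalar head, the TD targets and the MSE loss have shape (batch, 1); an output of size `action_dim` changes what the loss averages over, which may explain the differing behavior.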

Sophistt commented 6 years ago

@EndlessHuygen Hi, I've run into the same problem: the agent stops learning after it finishes a lap. Have you solved it?