pemami4911 / deep-rl

Collection of Deep Reinforcement Learning algorithms

Actor network output increases to 1, TORCS, TF 1.0.0 #11

Open Amir-Ramezani opened 7 years ago

Amir-Ramezani commented 7 years ago

Hi,

Thanks for your code.

I tried to use it to train on TORCS; however, my results are not good. To be specific, after a few steps the actions generated by the Actor network increase to 1.0 and stay there, similar to the following (the top 10 rows of a batch, for example):

```
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
```

The gradients for that set:

```
[[  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]]
```

I suspect the problem is somewhere around the following line:

```python
# Combine the gradients here
self.actor_gradients = tf.gradients(self.scaled_out, self.network_params, -self.action_gradient)
```
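For reference, here is a minimal, self-contained sketch of how I understand the actor update (the two-layer network and all variable names below are illustrative, not taken from this repo). One detail I have seen in other DDPG implementations is dividing the combined gradients by the batch size before applying them; without that normalization the update magnitude scales with the batch, which could help push a tanh output into saturation:

```python
import tensorflow as tf

# Illustrative sizes only (TORCS-like: 3 actions)
state_dim, action_dim, batch_size, actor_lr = 29, 3, 64, 1e-4

# A tiny stand-in actor network
states = tf.placeholder(tf.float32, [None, state_dim])
w1 = tf.Variable(tf.random_normal([state_dim, 300], stddev=0.01))
b1 = tf.Variable(tf.zeros([300]))
w2 = tf.Variable(tf.random_normal([300, action_dim], stddev=0.01))
b2 = tf.Variable(tf.zeros([action_dim]))
hidden = tf.nn.relu(tf.matmul(states, w1) + b1)
# tanh bounds each action to [-1, 1]; outputs pinned at 1.0 mean saturation
scaled_out = tf.tanh(tf.matmul(hidden, w2) + b2)
network_params = [w1, b1, w2, b2]

# dQ/da, computed by the critic and fed in at training time
action_gradient = tf.placeholder(tf.float32, [None, action_dim])

# tf.gradients(ys, xs, grad_ys) back-propagates -dQ/da through the actor,
# so a minimizing optimizer performs gradient *ascent* on Q
unnormalized = tf.gradients(scaled_out, network_params, -action_gradient)

# Average over the minibatch so the effective step size does not scale
# with batch size; skipping this makes each update batch_size times larger
actor_gradients = [tf.div(g, float(batch_size)) for g in unnormalized]

optimize = tf.train.AdamOptimizer(actor_lr).apply_gradients(
    zip(actor_gradients, network_params))
```

If the outputs still saturate even with normalized gradients, the dQ/da values being fed through `action_gradient` would be the next thing I would inspect.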

Could you tell me what you think the problem is?

I am using TF 1.0.0 (CPU version).

Thanks

RICEVAGUE commented 5 years ago

Hi! I am very interested in this issue. Could you tell me the details of your solution?