pemami4911 / deep-rl

Collection of Deep Reinforcement Learning algorithms

Actor network output increases to 1, TORCS, TF 1.0.0 #11

Open Amir-Ramezani opened 7 years ago

Amir-Ramezani commented 7 years ago

Hi,

Thanks for your code.

I tried to use it to train on TORCS; however, my results are not good. To be specific, after a few steps the actions generated by the Actor network increase to 1.0 and stay there, similar to the following (the top 10 rows of a batch, for example):

```
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
```

The gradients for that set:

```
[[  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]
 [  4.80426752e-05   1.51122265e-04  -1.96302353e-05]]
```

I suspect the problem is somewhere around the following line:

```python
# Combine the gradients here
self.actor_gradients = tf.gradients(self.scaled_out, self.network_params, -self.action_gradient)
```
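For reference, here is a minimal, self-contained sketch of how I understand the actor update (the two-layer network and all variable names below are illustrative, not taken from this repo). One detail I have seen in other DDPG implementations is dividing the combined gradients by the batch size before applying them; without that normalization the update magnitude scales with the batch, which could help push a tanh output into saturation:

```python
import tensorflow as tf

# Illustrative sizes only (TORCS-like: 3 actions)
state_dim, action_dim, batch_size, actor_lr = 29, 3, 64, 1e-4

# A tiny stand-in actor network
states = tf.placeholder(tf.float32, [None, state_dim])
w1 = tf.Variable(tf.random_normal([state_dim, 300], stddev=0.01))
b1 = tf.Variable(tf.zeros([300]))
w2 = tf.Variable(tf.random_normal([300, action_dim], stddev=0.01))
b2 = tf.Variable(tf.zeros([action_dim]))
hidden = tf.nn.relu(tf.matmul(states, w1) + b1)
# tanh bounds each action to [-1, 1]; outputs pinned at 1.0 mean saturation
scaled_out = tf.tanh(tf.matmul(hidden, w2) + b2)
network_params = [w1, b1, w2, b2]

# dQ/da, computed by the critic and fed in at training time
action_gradient = tf.placeholder(tf.float32, [None, action_dim])

# tf.gradients(ys, xs, grad_ys) back-propagates -dQ/da through the actor,
# so a minimizing optimizer performs gradient *ascent* on Q
unnormalized = tf.gradients(scaled_out, network_params, -action_gradient)

# Average over the minibatch so the effective step size does not scale
# with batch size; skipping this makes each update batch_size times larger
actor_gradients = [tf.div(g, float(batch_size)) for g in unnormalized]

optimize = tf.train.AdamOptimizer(actor_lr).apply_gradients(
    zip(actor_gradients, network_params))
```

If the outputs still saturate even with normalized gradients, the dQ/da values being fed through `action_gradient` would be the next thing I would inspect.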

Could you tell me what you think the problem is?

I am using TF 1.0.0 (CPU version).

Thanks

RICEVAGUE commented 5 years ago

Hi! I am very interested in this issue. Could you tell me the details of your solution?