sarvghotra closed 8 years ago
The tanh function at the output layer returns a value between -1 and +1. The action_space of the environment used for testing (inverted pendulum) ranges from -3 to +3, so the output of the final tanh layer is multiplied by 3 to cover that range.
I have now generalized this with an action_bound variable.
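For anyone reading later, here is a minimal sketch of that scaling in plain Python. It is not the code from actor_net.py: the helper name `scale_action` and the sample inputs are hypothetical, and the bound of 3.0 is simply the inverted pendulum value mentioned above.

```python
import math


def scale_action(pre_activation, action_bound):
    """Squash a raw actor output with tanh, then stretch it to the action range.

    tanh maps any real number into [-1, +1], so multiplying by action_bound
    maps it into [-action_bound, +action_bound].
    """
    return action_bound * math.tanh(pre_activation)


# Assumed bound for illustration: the inverted pendulum range of -3 to +3.
action_bound = 3.0

# Hypothetical raw outputs of the last layer before the tanh.
for raw in (-10.0, -0.5, 0.0, 0.5, 10.0):
    print(raw, scale_action(raw, action_bound))
```

With a different environment, only `action_bound` needs to change; the network output itself stays in the tanh range. This assumes the action range is symmetric around zero, as it is here.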
Could you please explain this 3 in this line https://github.com/stevenpjg/ddpg-aigym/blob/master/actor_net.py#L62 ?