stevenpjg / ddpg-aigym

Continuous control with deep reinforcement learning - Deep Deterministic Policy Gradient (DDPG) algorithm implemented in OpenAI Gym environments
MIT License

Need help to understand a step #2

Closed sarvghotra closed 8 years ago

sarvghotra commented 8 years ago

Could you please explain the 3 in this line: https://github.com/stevenpjg/ddpg-aigym/blob/master/actor_net.py#L62 ?

stevenpjg commented 8 years ago

The tanh function at the output layer returns a value between -1 and +1. The action_space of the specific environment used for testing (inverted pendulum) ranges from -3 to +3. So the final tanh output layer is multiplied by 3 to cover that range.

I have now generalized this with an action_bound variable.
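
For readers following along, here is a minimal sketch of the scaling described above. It is not the code from actor_net.py: the function name `scale_action` and the sample inputs are hypothetical, and plain `math.tanh` stands in for the network's output layer.

```python
import math


def scale_action(pre_activation, action_bound):
    """Map a raw output value into [-action_bound, +action_bound].

    Hypothetical helper for illustration; not taken from the repository.
    """
    # tanh squashes any real number into the range -1 to +1
    squashed = math.tanh(pre_activation)
    # multiplying by the bound stretches that range to the action range
    return action_bound * squashed


if __name__ == "__main__":
    # 3.0 matches the -3 to +3 range mentioned for the inverted pendulum
    action_bound = 3.0
    for raw in (-10.0, -0.5, 0.0, 0.5, 10.0):
        print(raw, scale_action(raw, action_bound))
```

With `action_bound = 3.0` every result stays within -3 and +3, and a raw value of 0 maps to an action of 0. In Gym, a bound like this could be read from `env.action_space.high` for a Box action space; whether the repository obtains it that way is an assumption here.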