Closed matthew-hsr closed 4 years ago
You can easily change the default activation function by passing this argument, for example:
policy_kwargs = dict(act_fun=tf.nn.tanh, net_arch=[32, 32])
Some papers mention that relu causes more issues than tanh when used outside of simulation:
"We implemented the policy with an MLP with two hidden layers, with 256 and 128 units each and tanh nonlinearity (Fig. 5). We found that the nonlinearity has a strong effect on performance on the physical system. Performance of two trained policies with different activation functions can be very different in the real world even when they perform similarly in simulation. Our explanation is that unbounded activation functions, such as ReLU, can degrade performance on the real robot, since actions can have very high magnitude when the robot reaches states that were not visited during training. Bounded activation functions, such as tanh, yield less aggressive trajectories when subjected to disturbances"
Source: Learning Agile and Dynamic Motor Skills for Legged Robots, Hwangbo et al., about training their four-legged robot ANYmal.
Personally, I tried both activation functions in simulation, and I did not notice any practical difference in training time or performance. It might be because the networks used for robotics are small compared to those used for audio or text processing.
Is there any particular situation in which tanh is superior to, say, relu?
This comes from hyperparameter optimization; you have a comparison here. As @charles-blouin mentioned, you can easily try changing the activation function.
Btw, tanh is the default for A2C, ACER, PPO, TRPO, but relu is the default for SAC, DDPG and TD3.
deep networks?
Most networks in RL are shallow (e.g. two fully connected layers in the continuous-action setting), so it does not make much difference.
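The depth point can be made concrete. The backprop factor through a tanh unit is its derivative, 1 - tanh(x)^2, which is always in (0, 1], so the gradient shrinks multiplicatively with depth. A toy sketch (assuming, arbitrarily, that every layer sees the same pre-activation of 1.0) shows why two layers are harmless while twenty are not:

```python
import math

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, always in (0, 1].
    return 1.0 - math.tanh(x) ** 2

def gradient_attenuation(depth, pre_activation=1.0):
    # Multiplicative factor the gradient picks up when backpropagated
    # through `depth` tanh layers, under the toy assumption that each
    # layer sees the same pre-activation.
    factor = 1.0
    for _ in range(depth):
        factor *= tanh_grad(pre_activation)
    return factor

# tanh_grad(1.0) is about 0.42, so:
print(gradient_attenuation(2))   # mild attenuation -- fine for shallow RL nets
print(gradient_attenuation(20))  # vanishingly small -- the deep-network problem
```

With only two hidden layers the gradient keeps a usable magnitude, which is why the vanishing-gradient argument against tanh carries little weight for typical RL policies.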
Thanks a lot!
It seems that the default activation function for the mlp policy is set to be tf.tanh (e.g. in class FeedForwardPolicy and class LstmPolicy in policies.py). Correct me if I'm wrong, but isn't tanh well known for being expensive to compute and for suffering from the vanishing gradient problem in deep networks? Is this default an informed choice for reinforcement learning algorithms, or was it just picked arbitrarily? Is there any particular situation in which tanh is superior to, say, relu? Thanks in advance!
(If you have time, can you answer this quick question too?)