sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
MIT License
3.88k stars 844 forks

Confused about the different action sampling methods in SAC #11

Open ocean1211 opened 5 years ago

ocean1211 commented 5 years ago

I noticed that in SAC, select_action() simply calls sample() to draw a random action, but evaluate() computes the action as batch_mu + batch_sigma*z.

Why not just use sample() in evaluate() as well? Is there an important difference between the two?
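A likely explanation, offered as an assumption since the thread does not confirm it: batch_mu + batch_sigma*z is the reparameterization trick. With z drawn from a standard normal, the action has the same distribution as a direct draw from N(mu, sigma), but it is written as a differentiable function of mu and sigma, so a loss computed on the action can push gradients back into the policy network. In PyTorch, Normal.sample() runs without gradient tracking, which is enough for picking an action in select_action() but not for a policy update; Normal.rsample() is the built-in reparameterized equivalent. The sketch below uses only the Python standard library and a toy objective f(a) = a^2 (both choices are illustrative, not taken from the repository) to show that differentiating through mu + sigma*z recovers the correct gradients with respect to mu and sigma.

```python
import random


def reparameterized_action(mu, sigma, z):
    """Action written as a deterministic function of (mu, sigma) and noise z."""
    return mu + sigma * z


def pathwise_gradients(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[a^2], with a = mu + sigma*z.

    The toy objective f(a) = a^2 is a hypothetical stand-in for a policy loss.
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)      # noise drawn independently of mu and sigma
        a = reparameterized_action(mu, sigma, z)
        df_da = 2.0 * a              # derivative of f(a) = a^2 with respect to a
        grad_mu += df_da * 1.0       # chain rule: da/dmu = 1
        grad_sigma += df_da * z      # chain rule: da/dsigma = z
    return grad_mu / n_samples, grad_sigma / n_samples


mu, sigma = 0.5, 2.0
g_mu, g_sigma = pathwise_gradients(mu, sigma)

# Analytically E[a^2] = mu^2 + sigma^2, so the exact gradients are 2*mu and 2*sigma.
print("d/dmu    estimate:", g_mu, "exact:", 2 * mu)
print("d/dsigma estimate:", g_sigma, "exact:", 2 * sigma)
```

If the action were treated as an opaque random draw, as with a non-reparameterized sample(), there would be no da/dmu or da/dsigma term to chain through, and the loss would carry no gradient signal for the parameters that produced mu and sigma.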