hongzimao / decima-sim

Learning Scheduling Algorithms for Data Processing Clusters
https://web.mit.edu/decima/

A question about the loss function #13

Open CookieYo opened 4 years ago

CookieYo commented 4 years ago

Hi, there are two loss terms in the actor agent: the advantage (adv) loss and the entropy loss. Can you tell me why you add the entropy loss? I know the entropy weight decays from 1 to 0.0001, but I don't understand why the entropy loss is needed.

Thank you! Liu

hongzimao commented 4 years ago

The entropy loss promotes exploration in RL. Large entropy means the action probability distribution is more spread out, so the agent tries different trajectories (hence more exploration). Decaying the entropy factor during training lets the policy converge (i.e., the agent becomes more and more certain about its action choices). You can refer to https://arxiv.org/pdf/1602.01783.pdf (Section 4, Asynchronous advantage actor-critic, entropy paragraph) for more on the principles behind the entropy loss. Hope this helps!
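
For concreteness, here is a minimal NumPy sketch of how the two terms typically combine. The function names and the exponential decay schedule below are illustrative assumptions for a discrete action distribution, not the exact code in this repo:

```python
import numpy as np

def policy_entropy(action_probs, eps=1e-8):
    # Entropy of a discrete action distribution: H(p) = -sum_i p_i * log(p_i).
    # A uniform distribution maximizes entropy; a one-hot distribution gives 0.
    return -np.sum(action_probs * np.log(action_probs + eps))

def total_loss(log_prob_taken_action, advantage, action_probs, entropy_weight):
    # Policy-gradient ("adv") loss: increases the log-probability of actions
    # that led to a positive advantage.
    adv_loss = -log_prob_taken_action * advantage
    # Adding -H(p) penalizes overly peaked distributions, which keeps the
    # policy exploratory early in training.
    entropy_loss = -policy_entropy(action_probs)
    return adv_loss + entropy_weight * entropy_loss

def entropy_weight_at(step, total_steps, w_init=1.0, w_final=1e-4):
    # Illustrative exponential decay of the entropy weight from 1 to 1e-4,
    # mirroring the schedule mentioned in the question.
    frac = min(step / total_steps, 1.0)
    return w_init * (w_final / w_init) ** frac
```

With a large `entropy_weight`, minimizing the total loss favors spread-out action distributions (exploration); as the weight decays toward 1e-4, the advantage term dominates and the policy is free to become near-deterministic.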