hongzimao / decima-sim

Learning Scheduling Algorithms for Data Processing Clusters
https://web.mit.edu/decima/

A question about the loss function #13

Open CookieYo opened 4 years ago

CookieYo commented 4 years ago

Hi, there are two loss terms in the actor agent: the advantage (adv) loss and the entropy loss. Can you tell me why you add the entropy loss? I know the entropy weight decays from 1 to 0.0001, but I don't understand why the entropy loss is needed.

Thank you! Liu

hongzimao commented 4 years ago

The entropy loss promotes exploration in RL. Large entropy means the action probability distribution is more spread out, so the agent tries different trajectories (hence more exploration). Decaying the entropy factor during training lets the policy converge (i.e., the agent becomes more and more certain about its action choices). You can refer to https://arxiv.org/pdf/1602.01783.pdf (Section 4, Asynchronous advantage actor-critic, entropy paragraph) for more on the principles behind the entropy loss. Hope this helps!
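
For concreteness, here is a minimal NumPy sketch of how the two terms typically combine. The function names and the exponential decay schedule below are illustrative assumptions for a discrete action distribution, not the exact code in this repo:

```python
import numpy as np

def policy_entropy(action_probs, eps=1e-8):
    # Entropy of a discrete action distribution: H(p) = -sum_i p_i * log(p_i).
    # A uniform distribution maximizes entropy; a one-hot distribution gives 0.
    return -np.sum(action_probs * np.log(action_probs + eps))

def total_loss(log_prob_taken_action, advantage, action_probs, entropy_weight):
    # Policy-gradient ("adv") loss: increases the log-probability of actions
    # that led to a positive advantage.
    adv_loss = -log_prob_taken_action * advantage
    # Adding -H(p) penalizes overly peaked distributions, which keeps the
    # policy exploratory early in training.
    entropy_loss = -policy_entropy(action_probs)
    return adv_loss + entropy_weight * entropy_loss

def entropy_weight_at(step, total_steps, w_init=1.0, w_final=1e-4):
    # Illustrative exponential decay of the entropy weight from 1 to 1e-4,
    # mirroring the schedule mentioned in the question.
    frac = min(step / total_steps, 1.0)
    return w_init * (w_final / w_init) ** frac
```

With a large `entropy_weight`, minimizing the total loss favors spread-out action distributions (exploration); as the weight decays toward 1e-4, the advantage term dominates and the policy is free to become near-deterministic.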