sunghoonhong / L2RPN-WCCI-2020-Winner

Mozilla Public License 2.0

sample and mean func in actor object #1

Closed 4thfever closed 3 years ago

4thfever commented 3 years ago

Hi,

Thanks for the fantastic work and for sharing it.

I am wondering what the actor's sample() and mean() functions do. I read your paper but didn't find an explanation.

Thanks in advance.

sunghoonhong commented 3 years ago

Hi,

Thanks for your interest.

As in many RL algorithms (especially Soft Actor-Critic), our agent learns a stochastic policy in the training phase (i.e., a Gaussian policy whose actions are drawn with sample()), whereas in the evaluation phase it uses a deterministic policy (i.e., actions are chosen from the Gaussian mean alone, using mean()).

This strategy is widely used in RL because acting on the mean removes the sampling noise, which makes evaluation performance more stable.
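
To make the distinction concrete, here is a minimal NumPy sketch of a Gaussian policy with both functions. It is not the code from this repository: the class name `GaussianActor`, the linear weights `w_mu` and `w_log_std`, and the log-std clipping range are all assumptions chosen for illustration, and the tanh squashing that Soft Actor-Critic usually applies is left out for brevity.

```python
import numpy as np


class GaussianActor:
    """Toy diagonal-Gaussian policy (hypothetical; not the repository's actor)."""

    def __init__(self, state_dim, action_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        # Hypothetical linear maps standing in for a trained network.
        self.w_mu = self.rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.w_log_std = self.rng.normal(scale=0.1, size=(state_dim, action_dim))

    def _params(self, state):
        """Return the Gaussian mean and standard deviation for a state."""
        mu = state @ self.w_mu
        # Clipping log-std is a common stabilizing choice; the range is assumed.
        log_std = np.clip(state @ self.w_log_std, -20.0, 2.0)
        return mu, np.exp(log_std)

    def mean(self, state):
        """Deterministic action: the Gaussian mean (evaluation phase)."""
        mu, _ = self._params(state)
        return mu

    def sample(self, state):
        """Stochastic action: mean plus scaled Gaussian noise (training phase)."""
        mu, std = self._params(state)
        return mu + std * self.rng.standard_normal(mu.shape)


actor = GaussianActor(state_dim=4, action_dim=2)
state = np.ones(4)

train_action = actor.sample(state)  # changes from call to call
eval_action = actor.mean(state)     # identical on every call
```

Calling `sample()` twice on the same state gives two different actions, which provides exploration during training, while `mean()` always returns the same action for a given state.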

Best,

4thfever commented 3 years ago

Thanks!