zhixy closed this issue 7 years ago.
In this repo, a deterministic policy is used; however, the A3C paper uses a stochastic policy (I hope I'm not misunderstanding). Is there any reason for that?
This is not deterministic: the action is sampled from the probability distribution returned by the network.
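To make the distinction concrete, here is a minimal sketch, assuming a discrete action space and a policy head that outputs one logit per action. The function names and logits are hypothetical and not taken from the repo. Sampling from the softmax distribution gives a stochastic policy π(a|s); always taking the argmax would be deterministic.

```python
import math
import random

def softmax(logits):
    """Convert raw policy-head outputs (logits) into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, rng=random):
    """Stochastic policy: draw an action index according to pi(a|s)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def greedy_action(logits):
    """Deterministic policy, for contrast: always pick the most likely action."""
    return max(range(len(logits)), key=lambda i: logits[i])

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical policy-head output for one state
    rng = random.Random(0)
    samples = [sample_action(logits, rng) for _ in range(10_000)]
    freqs = [samples.count(a) / len(samples) for a in range(len(logits))]
    print("probs:    ", [round(p, 3) for p in softmax(logits)])
    print("empirical:", [round(f, 3) for f in freqs])
    print("greedy:   ", greedy_action(logits))
```

Because each call draws a fresh sample, the same state can produce different actions, with empirical frequencies close to the softmax probabilities. Swapping `sample_action` for `greedy_action` is what would make the policy deterministic.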
Yes, I just noticed that and was coming to close this issue. Anyway, thanks!