Closed by fanshi14 4 years ago
Hi Fanshi. This is actually arguable. For publication purposes, you should evaluate stochastic policies. In our applications of interest, you may want to evaluate deterministic policies instead. I'll leave evaluation deterministic for now. You can modify your runner file to get the behavior you want.
In test mode, the action should be deterministic; otherwise it is sampled from the policy distribution and is therefore random.
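For readers following this thread, here is a minimal sketch of the two evaluation modes being discussed. It assumes a diagonal Gaussian policy that outputs a mean and a log standard deviation per action dimension; the thread does not show the repository's actual policy or runner code, so `select_action` and its parameters are hypothetical names used only for illustration.

```python
import math
import random


def select_action(mean, log_std, deterministic, rng=None):
    """Pick an action from a diagonal Gaussian policy (hypothetical helper).

    mean, log_std: per-dimension policy outputs (assumed, not from the repo).
    deterministic: True for test-mode behavior, False to sample.
    """
    if deterministic:
        # Test mode: return the distribution mean, so repeated calls
        # with the same policy output give the same action.
        return list(mean)

    # Stochastic mode: add Gaussian noise scaled by the policy's std.
    rng = rng or random.Random()
    return [m + math.exp(s) * rng.gauss(0.0, 1.0)
            for m, s in zip(mean, log_std)]


if __name__ == "__main__":
    mean, log_std = [0.5, -1.0], [0.0, 0.0]
    print("deterministic:", select_action(mean, log_std, True))
    print("stochastic:   ", select_action(mean, log_std, False))
```

A runner could expose `deterministic` as a flag, so the same code path serves both stochastic evaluation for publication and deterministic evaluation for deployment-style testing.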