will-maclean closed 3 weeks ago
evaluate_policy shouldn't use stochastic action selection, e.g. action sampling or epsilon-greedy. Instead, it should be deterministic/greedy.
evaluate_policy already uses deterministic actions
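For illustration, here is a minimal sketch of the behaviour described in this issue: exploration may use epsilon-greedy, while evaluation always takes the argmax action. This is not the project's actual code. The `evaluate_policy` signature, `ToyEnv`, `toy_q`, and the helper functions are all hypothetical names assumed for the example.

```python
import random
from typing import Callable, List, Sequence, Tuple


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the highest Q-value (deterministic)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(
    q_values: Sequence[float], epsilon: float, rng: random.Random
) -> int:
    """Training-time selection: random action with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


class ToyEnv:
    """Hypothetical 1-D chain: start at 0, the episode ends at position 3."""

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves right, action 0 stays in place.
        self.pos += action
        done = self.pos >= 3
        reward = 1.0 if done else -0.1
        return self.pos, reward, done


def evaluate_policy(
    q_fn: Callable[[int], List[float]],
    env: ToyEnv,
    n_episodes: int = 3,
    max_steps: int = 20,
) -> float:
    """Assumed signature: mean episode return using greedy actions only."""
    returns = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            # No sampling and no epsilon here: evaluation is deterministic.
            action = greedy_action(q_fn(state))
            state, reward, done = env.step(action)
            total += reward
            if done:
                break
        returns.append(total)
    return sum(returns) / len(returns)


def toy_q(state: int) -> List[float]:
    """Hypothetical Q-function that always prefers moving right."""
    return [0.0, 1.0]


mean_return = evaluate_policy(toy_q, ToyEnv())
```

Because no random number generator is involved in `evaluate_policy`, repeated calls with the same Q-function give the same mean return, which is the property the issue asks for.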