Open huangjiancong1 opened 5 years ago
The test() function disables exploration in order to test the performance of deterministic policies. 0.0.monitor.csv contains training performance (with exploration); 0.1.monitor.csv contains testing performance (without exploration).
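To compare the two files described above, a minimal sketch follows. It assumes both files use the OpenAI Baselines Monitor layout (a `#`-prefixed JSON metadata line, then a header `r,l,t` with episode reward, length, and time); that layout is an assumption about this repository, not something confirmed here. The helper names `read_episode_rewards` and `mean` are hypothetical, and the sample rows are made-up stand-ins for the real file contents.

```python
import csv
import io


def read_episode_rewards(lines):
    """Return episode rewards (column `r`) from monitor-style CSV lines.

    Assumes the Baselines Monitor layout: optional '#' metadata lines,
    then a header row containing the column `r`.
    """
    # Skip blank lines and the '#'-prefixed metadata line(s).
    data_lines = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(data_lines)
    return [float(row["r"]) for row in reader]


def mean(values):
    """Arithmetic mean; NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")


# Made-up sample data standing in for 0.0.monitor.csv (training, with
# exploration) and 0.1.monitor.csv (testing, without exploration).
train_csv = io.StringIO(
    '#{"t_start": 0.0, "env_id": "Example-v0"}\n'
    "r,l,t\n"
    "1.0,10,0.5\n"
    "3.0,12,1.1\n"
)
test_csv = io.StringIO(
    '#{"t_start": 0.0, "env_id": "Example-v0"}\n'
    "r,l,t\n"
    "4.0,10,0.6\n"
    "6.0,11,1.3\n"
)

train_rewards = read_episode_rewards(train_csv)
test_rewards = read_episode_rewards(test_csv)

print("train mean reward:", mean(train_rewards))
print("test mean reward:", mean(test_rewards))
```

With real data, the `io.StringIO` objects could be replaced by `open("0.0.monitor.csv")` and `open("0.1.monitor.csv")`, since the helper accepts any iterable of lines.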
Why do we use test() to see the reward and testing_sample_step? Can I use train() to see how the reward changes during training? It seems that the last perf is the reward. Because we want to compare with the openai,