miyosuda / unreal

Reinforcement learning with unsupervised auxiliary tasks

the score is not reset to 0 when episode terminates #9

Closed: NoobFang closed this issue 7 years ago

NoobFang commented 7 years ago

In the 'nav_maze_static_01' environment, each apple is worth 1 point and the final target is worth 10 points, so a score of 80+ does not seem reasonable. Also, when running the display process, I observe that the score is not reset to 0 when some episodes terminate.

Is this normal, or is it designed this way for some reason?

miyosuda commented 7 years ago

The episode does not end when the agent reaches the +10 warp point. It ends only when the maximum time (60 seconds = 3600 frames) has passed.
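To illustrate why an 80+ score is possible, here is a minimal sketch, assuming the per-event rewards stated in the question (apple = 1, goal = 10) and assuming the goal can be reached several times before the time limit. `episode_score` is a hypothetical helper, not part of the unreal code base.

```python
# Per-event rewards as described in the question (assumed fixed values).
APPLE_REWARD = 1
GOAL_REWARD = 10


def episode_score(apples_collected: int, goal_reaches: int) -> int:
    """Hypothetical helper: total score for one episode in which the agent
    may reach the goal repeatedly, since reaching it does not end the episode."""
    return apples_collected * APPLE_REWARD + goal_reaches * GOAL_REWARD


# Reaching the goal 8 times within one 60-second episode already gives 80.
print(episode_score(apples_collected=0, goal_reaches=8))  # 80
```

In other words, the 10-point goal is not an upper bound on the episode score; it is a reward that can be collected more than once.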

https://github.com/deepmind/lab/blob/master/assets/game_scripts/nav_maze_static_01.lua#L5

The agent advances 4 frames per step (each action is repeated for 4 frames), so one episode lasts 3600 / 4 = 900 steps. That is, env.is_running() becomes False only after 900 steps have passed.
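The frame and step arithmetic, and why the displayed score only resets when a new episode starts, can be sketched as follows. `StubLab` is a hypothetical stand-in that only mimics the `is_running()`/`step()` loop pattern mentioned above; it is not the DeepMind Lab API itself. The reward schedule (10 points every 450 frames) is made up purely for illustration.

```python
FPS = 60               # implied by "60 seconds = 3600 frames"
EPISODE_SECONDS = 60   # time limit of nav_maze_static_01, per the linked script
ACTION_REPEAT = 4      # frames advanced per agent step

EPISODE_FRAMES = EPISODE_SECONDS * FPS                # 3600
STEPS_PER_EPISODE = EPISODE_FRAMES // ACTION_REPEAT   # 900


class StubLab:
    """Hypothetical environment: tracks elapsed frames and returns rewards
    from a caller-supplied schedule (frame index -> reward)."""

    def __init__(self, episode_frames, reward_fn):
        self.episode_frames = episode_frames
        self.reward_fn = reward_fn
        self.frame = 0

    def reset(self):
        self.frame = 0

    def is_running(self):
        # The episode ends only on the time limit, never on reaching the goal.
        return self.frame < self.episode_frames

    def step(self, action, num_steps=1):
        # Repeat the action for num_steps frames and sum the rewards.
        reward = 0
        for _ in range(num_steps):
            reward += self.reward_fn(self.frame)
            self.frame += 1
        return reward


def run_episode(env):
    env.reset()
    score = 0  # the score is reset only here, at the start of a new episode
    steps = 0
    while env.is_running():
        score += env.step(action=None, num_steps=ACTION_REPEAT)
        steps += 1
    return steps, score


# Hypothetical schedule: a +10 goal reward every 450 frames.
env = StubLab(EPISODE_FRAMES, lambda f: 10 if f % 450 == 449 else 0)
print(run_episode(env))  # (900, 80)
```

Because `is_running()` depends only on elapsed frames, the score keeps accumulating across goal reaches and drops back to 0 only when the 900-step episode ends and a new one begins.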

NoobFang commented 7 years ago

OK, I got it. Thanks for your reply! I'll close this issue.