nottombrown / rl-teacher

Code for Deep RL from Human Preferences [Christiano et al]. Plus a webapp for collecting human feedback
MIT License

Allow running of unmodified envs with original `done` signals #27

Closed (garymcintire closed this 7 years ago)

garymcintire commented 7 years ago

I tried this and watched the movies:

`python -u rl_teacher/teach.py -p rl -e Humanoid-v1 -n base-rl -w 12`

Every episode runs for the full 1000 steps. Adding a print statement in `rollouts.py` shows that `env.step` never returns `done=True`.

Is it supposed to be like this? If so, why?

nottombrown commented 7 years ago

Hey Gary, as in Deep RL from Human Preferences, we remove the `done` signals.

See `envs.py` for details.

I'd be interested in accepting PRs that make it easy to run the unmodified environments as well as the modified ones.
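For anyone considering such a PR, here is a minimal sketch of one way a single wrapper could support both modes. It is not the code in `envs.py`. `ToyFallingEnv`, `FixedHorizon`, `keep_original_done` and `episode_length` are hypothetical names invented for this illustration. It assumes the classic Gym-style 4-tuple `step` API `(obs, reward, done, info)`, and it uses a toy environment in place of a real MuJoCo task. With the flag off, the environment's own termination is hidden and every episode lasts the full horizon, which matches the 1000-step behaviour described above. With the flag on, the original `done` signal passes through.

```python
from typing import Any, Dict, Tuple


class ToyFallingEnv:
    """Hypothetical stand-in for a task like Humanoid-v1.

    Signals done=True once `fall_step` steps have elapsed, mimicking the agent
    falling over. Uses the classic Gym-style (obs, reward, done, info) step API.
    """

    def __init__(self, fall_step: int = 50) -> None:
        self.fall_step = fall_step
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        done = self.t >= self.fall_step
        return self.t, 1.0, done, {}


class FixedHorizon:
    """Hypothetical wrapper that optionally hides the env's own `done` signal.

    keep_original_done=False: episodes end only after `horizon` steps.
    keep_original_done=True: the env's termination is respected (still capped
    at `horizon`).
    """

    def __init__(self, env, horizon: int = 1000, keep_original_done: bool = False) -> None:
        self.env = env
        self.horizon = horizon
        self.keep_original_done = keep_original_done
        self.t = 0

    def reset(self):
        self.t = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        # Record the underlying signal so it can still be logged or inspected.
        info = dict(info, original_done=done)
        if not self.keep_original_done:
            done = False  # mask early termination
        done = done or self.t >= self.horizon  # always stop at the horizon
        return obs, reward, done, info


def episode_length(env) -> int:
    """Run one episode with a dummy action and return how many steps it lasted."""
    env.reset()
    steps, done = 0, False
    while not done:
        _, _, done, _ = env.step(None)
        steps += 1
    return steps


if __name__ == "__main__":
    print(episode_length(FixedHorizon(ToyFallingEnv(fall_step=50))))  # 1000
    print(episode_length(FixedHorizon(ToyFallingEnv(fall_step=50), keep_original_done=True)))  # 50
```

Keeping the masking behind one flag leaves the default path (fixed-length episodes) unchanged while making the unmodified behaviour one argument away. Storing the original signal in `info` also makes it visible without the print statement in `rollouts.py`.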

See the following issue: https://github.com/nottombrown/rl-teacher/issues/5

garymcintire commented 7 years ago

Thanks for clarifying.

nottombrown commented 7 years ago

I'm leaving this open because it's a separate issue from #5.

nottombrown commented 7 years ago

Ah, actually this is already an open issue. Closing in favor of #12