mrahtz / learning-from-human-preferences

Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
MIT License

The output is always waiting for preferences, 0 so far. #3

Open ZhanPython opened 5 years ago

ZhanPython commented 5 years ago

Hi there. I tried to run it in train_policy_with_preferences mode with this command:

python3 run.py train_policy_with_preferences EnduroNoFrameskip-v4 --n_envs 16 --render_episodes

After I run it, the two windows never appear, and the output is always "Preference interface waiting for segments" followed by "Waiting for preferences; 0 so far". What is the problem?

mrahtz commented 5 years ago

Hey, I don't think I'll have time to debug this at the moment, but if you're willing to look into this yourself I'd be very happy to accept a pull request.

variablman commented 1 year ago

Have you solved this yet? I'm running into the same problem.

niuett commented 1 year ago

Hello, has this problem been solved? I've also run into this issue. How should I fix it?