JawwadF closed 6 years ago
Also, I can confirm that training with the original reward function from the environment works fine.
Command used:
python3 run.py train_policy_with_original_rewards MovingDotNoFrameskip-v0 --million_timesteps 0.25
Ah, I'd made a change to easy-tf-log that had broken its fork-safety, so it was hanging when trying to flush events. Fixed now - either fetch an updated copy of Pipfile and do pipenv install, or just do pip uninstall easy-tf-log; pip install easy-tf-log==1.1 inside the virtualenv.
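To confirm which easy-tf-log version the virtualenv ended up with after reinstalling, something like the following sketch might help. It assumes Python 3.8 or newer (for importlib.metadata), and the helper name installed_version is made up for illustration; it is not part of this project or of easy-tf-log.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("easy-tf-log")
    if version is None:
        print("easy-tf-log is not installed in this environment")
    else:
        # The fix mentioned above is in 1.1, so that is the value to look for.
        print("easy-tf-log version:", version)
```

Run it with the virtualenv's interpreter so that it inspects the same environment the training command uses.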
It seems to be collecting synthetic preferences now! Thank you, I'll have to check whether it's learning - I'll get back to you to confirm.
Thanks for fixing the issue!
Assuming this is fixed.
I tried to train using synthetic preferences in the MovingDot environment using the command:
However, it doesn't appear to be learning, and I continuously get the output
Do you know what might be going on here?
(Also great job on this project!)