ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

How to use pixel training instead of low-dimensional state? #69

Closed knn1989 closed 6 years ago

ikostrikov commented 6 years ago

You need to use a wrapper.

For example see: https://github.com/emansim/acktr/blob/master/rgb_env.py

or

https://github.com/deepmind/dm_control/blob/a8112e730ed109c7b21a296f5cb1402bfb0bbcee/dm_control/suite/wrappers/pixels.py#L32

for dm_control suite.
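The idea behind both linked wrappers can be sketched without depending on either library. Below is a minimal, hypothetical pixel wrapper, assuming the wrapped env exposes `reset()`, `step(action)`, and `render(mode='rgb_array')` returning an HxWx3 uint8 array; the linked `rgb_env.py` and dm_control `pixels.py` do the same job against the real gym / dm_control APIs:

```python
import numpy as np

class PixelWrapper:
    """Sketch of a pixel-observation wrapper: discards the
    low-dimensional state and returns rendered RGB frames instead.
    (Hypothetical class; interface assumptions noted above.)"""

    def __init__(self, env, height=84, width=84):
        self.env = env
        self.height, self.width = height, width

    def _pixels(self):
        frame = self.env.render(mode='rgb_array')
        # Naive nearest-neighbour resize to a fixed (height, width);
        # a real wrapper would use a proper image-resize routine.
        h, w = frame.shape[:2]
        rows = np.linspace(0, h - 1, self.height).astype(int)
        cols = np.linspace(0, w - 1, self.width).astype(int)
        return frame[np.ix_(rows, cols)]

    def reset(self):
        self.env.reset()  # discard the low-dimensional state
        return self._pixels()

    def step(self, action):
        _, reward, done, info = self.env.step(action)
        return self._pixels(), reward, done, info
```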

knn1989 commented 6 years ago

Hi Ilya,

Thank you very much for answering my question.

That is what I tried (using rgb_env.py from emansim's GitHub). However, I got the following error:

```
####### WARNING: All rewards are clipped or normalized so you need to use a monitor (see envs.py) or visdom plot to get true rewards #######
[2018-04-26 18:10:26,435] Making new env: Reacher-v1
Traceback (most recent call last):
  File "main.py", line 205, in <module>
    main()
  File "main.py", line 111, in main
    obs = envs.reset()
  File "/home/knn1989/PycharmProjects/Mar_9_2018/baselines-master/baselines/common/vec_env/dummy_vec_env.py", line 21, in reset
    results = [env.reset() for env in self.envs]
  File "/home/knn1989/PycharmProjects/Mar_9_2018/baselines-master/baselines/common/vec_env/dummy_vec_env.py", line 21, in <listcomp>
    results = [env.reset() for env in self.envs]
  File "/home/knn1989/PycharmProjects/Mar_9_2018/gym/gym/core.py", line 104, in reset
    return self._reset()
  File "/home/knn1989/PycharmProjects/Mar_9_2018/gym/gym/core.py", line 312, in _reset
    return self._observation(observation)
  File "/home/knn1989/PycharmProjects/Mar_9_2018/gym/gym/core.py", line 322, in _observation
    raise NotImplementedError
NotImplementedError
```

ikostrikov commented 6 years ago

It's probably because they changed something in gym. See how the interface for envs is implemented now.
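The `NotImplementedError` in the traceback is consistent with that: older gym wrappers overrode underscore-prefixed hooks like `_observation`, while newer gym dispatches to the public `observation` method, so an old-style override is never called and the base class's default raises. A toy reproduction of that dispatch change (the classes below are illustrative stand-ins, not gym's actual code):

```python
class ObservationWrapperSketch:
    """Toy stand-in for gym.ObservationWrapper after the hook rename."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        # Newer gym dispatches to the public hook name...
        return self.observation(self.env.reset())

    def observation(self, obs):
        # ...whose default raises if a subclass hasn't overridden it.
        raise NotImplementedError

class OldStyleWrapper(ObservationWrapperSketch):
    def _observation(self, obs):  # old hook name: never called anymore
        return obs

class NewStyleWrapper(ObservationWrapperSketch):
    def observation(self, obs):   # new hook name: called on reset()
        return obs
```

Porting a wrapper like emansim's `rgb_env.py` forward would then mostly mean renaming the `_observation`/`_reset`-style overrides to their public counterparts.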

knn1989 commented 6 years ago

Just updated to your latest code. It still doesn't work, though it works fine with the low-dimensional state. Very nice implementation, BTW.

knn1989 commented 6 years ago

I fixed it. Turns out I was using an older version of gym, so it didn't work. I had to update to the newer version and then make some modifications to the CNNPolicy model.
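One modification that switching to pixel input usually forces is recomputing the flattened feature size feeding the policy's first linear layer. A sketch of that arithmetic for an Atari-style conv stack (the 8x8/4, 4x4/2, 3x3/1 layers and 32 output channels are assumptions based on the common architecture, not necessarily this repo's exact CNNPolicy):

```python
def conv_out(size, kernel, stride, padding=0):
    """Output spatial size of one conv layer:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def flat_features(h, w, channels_out=32):
    """Flattened feature count after an assumed Atari-style stack
    of 8x8/4 -> 4x4/2 -> 3x3/1 convolutions."""
    for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
    return h * w * channels_out
```

For an 84x84 pixel input this gives 7 * 7 * 32 = 1568, which is the number the first fully connected layer's input dimension must match; a different render resolution changes it.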

ikostrikov commented 6 years ago

Great! In this case I'm closing the issue.