openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

No enjoy.py for acktr #115

Open Bleyddyn opened 7 years ago

Bleyddyn commented 7 years ago

I added logger.configure() to run_atari.py, then trained a model.

I copied make_model.pkl and one of the checkpoint files to the acktr directory, then tried to write something like one of the deepq enjoy scripts so I could see how my model was playing.

Below is my code and the error messages I'm seeing.

Any suggestions would be appreciated.

import gym
import cloudpickle
import os.path as osp
import numpy as np

import baselines.acktr.acktr_disc

def update_obs(state, obs):
    state = np.roll(state, shift=-1, axis=3)
    state[:, :, :, -1] = obs
    return state

def main():
    nstack = 4

    env = gym.make("BreakoutNoFrameskip-v4")

    nh, nw, nc = env.observation_space.shape
    state = np.zeros((nh, nw, nc, nstack), dtype=np.uint8)

    #obs = env.reset()
    #test = update_obs(state,obs)

    with open('make_model.pkl', 'rb') as fh:
        make_model = cloudpickle.load(fh)
    model = make_model()
    act = model.load("checkpoint01200")

    if not act:
        print("Failed to load model checkpoint")
        return

    while True:
        obs, done = env.reset(), False
        episode_rew = 0
        while not done:
            env.render()
            state = update_obs(state,obs)
            obs, rew, done, _ = env.step(act(state)[0])
            episode_rew += rew
        print("Episode reward", episode_rew)

if __name__ == '__main__':
    main()
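The frame-stacking step in `update_obs` can be checked in isolation. The following is a minimal sketch using tiny made-up frames (the 2x2x1 frame size and the stack depth of 3 are assumptions chosen only to keep the arrays readable, not values from the script above). It shows that each call shifts the stack one slot toward index 0 and writes the newest frame into the last slot.

```python
import numpy as np


def update_obs(state, obs):
    # Shift every stacked frame one slot toward index 0 along the last axis,
    # then overwrite the last slot with the newest observation.
    state = np.roll(state, shift=-1, axis=3)
    state[:, :, :, -1] = obs
    return state


# Assumed toy dimensions: 2x2 frames, 1 channel, a stack of 3 frames.
nh, nw, nc, nstack = 2, 2, 1, 3
state = np.zeros((nh, nw, nc, nstack), dtype=np.uint8)

# Feed four frames whose pixels all equal the time step t = 1..4.
for t in range(1, 5):
    obs = np.full((nh, nw, nc), t, dtype=np.uint8)
    state = update_obs(state, obs)

# The stack now holds the three most recent frames, oldest first.
latest_frames = state[0, 0, 0].tolist()
```

Because `np.roll` returns a new array, the result has to be reassigned on every step, as the script does with `state = update_obs(state, obs)`.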

$ python3 ./enjoy_breakout.py
[2017-08-24 14:23:27,095] Making new env: BreakoutNoFrameskip-v4
2017-08-24 14:23:27.282739: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-24 14:23:27.282776: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Use async eigen decomp
updating 34 eigenvalue/vectors
projecting 12 gradient matrices
2017-08-24 14:23:31.711684: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /job:localhost/replica:0/task:0/device:CPU:0 for node 'fifo_queue_Close_1' because the input edge from 'fifo_queue' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:GPU:0
2017-08-24 14:23:31.711727: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /job:localhost/replica:0/task:0/device:CPU:0 for node 'fifo_queue_Close' because the input edge from 'fifo_queue' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:GPU:0
2017-08-24 14:23:31.711745: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /job:localhost/replica:0/task:0/device:CPU:0 for node 'cond_1/fifo_queue_enqueue' because the input edge from 'cond_1/fifo_queue_enqueue/Switch' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:GPU:0
2017-08-24 14:23:31.711778: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /job:localhost/replica:0/task:0/device:CPU:0 for node 'cond_2/fifo_queue_Size' because the input edge from 'cond_2/fifo_queue_Size/Switch' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:GPU:0
2017-08-24 14:23:31.711792: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /job:localhost/replica:0/task:0/device:CPU:0 for node 'cond_2/cond/fifo_queue_Dequeue' because the input edge from 'cond_2/cond/fifo_queue_Dequeue/Switch' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:GPU:0
Failed to load model checkpoint

Mac OS X 10.9.5
Python 3.6.2
baselines is up-to-date
TensorFlow 1.3.0
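The final message, "Failed to load model checkpoint", is printed by the script's own `if not act` check, not by TensorFlow. One possible explanation, offered as an assumption and not confirmed anywhere in this thread, is that `model.load` restores the weights in place and returns nothing, so `act` is `None` even when loading succeeds. The sketch below illustrates that pitfall with a hypothetical `DummyModel` class. Its `load` and `step` methods are stand-ins invented for illustration and are not the baselines API.

```python
class DummyModel:
    """Hypothetical stand-in for a model whose load() restores weights in place."""

    def __init__(self):
        self.weights = None

    def load(self, path):
        # Restores state as a side effect and has no return statement,
        # so the caller receives None.
        self.weights = "weights from " + path

    def step(self, state):
        # Hypothetical action function; a real policy would use the weights.
        if self.weights is None:
            raise RuntimeError("model not loaded")
        return [0]


model = DummyModel()
act = model.load("checkpoint01200")

# The load worked, yet its return value is None, so `if not act`
# would report a failure that did not happen.
load_returned_none = act is None
weights_restored = model.weights is not None

# Asking the model object itself for an action works after loading.
action = model.step(state=None)[0]
```

If the real `load` method behaves this way, the check would need to test the model's state (or catch an exception from `load`) instead of the return value, and actions would come from a method on the model rather than from `act`.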

Bleyddyn commented 7 years ago

I added a pull request with partially working code: https://github.com/openai/baselines/pull/122

phuongho43 commented 6 years ago

Were you or anyone able to create a fully working enjoy.py for acktr?

Bleyddyn commented 6 years ago

@phuongho43 I haven't looked at this in a long time. You can have a look at the latest version of my script at: https://github.com/Bleyddyn/baselines/blob/master/baselines/acktr/enjoy_acktr.py

I just tried it and it does look like it's running. However, the render option doesn't work, so I can't watch what it's doing; I can only see the average scores.