openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

Custom environment #628

Open Nicolas99-9 opened 5 years ago

Nicolas99-9 commented 5 years ago

I tried to run ppo with my custom environment. I created the environment using the rendering engine from OpenAI.

When I launch the model, I get this error (after the environment is created):

python3: ../../src/xcb_in.c:671: xcb_request_check: Assertion `!reply' failed.
[DL-Box:08007] Process received signal
[DL-Box:08007] Signal: Aborted (6)
[DL-Box:08007] Signal code: (-6)
[DL-Box:08007] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7f437f892330]
[DL-Box:08007] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f437ede2c37]
[DL-Box:08007] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f437ede6028]
[DL-Box:08007] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2fbf6) [0x7f437eddbbf6]
[DL-Box:08007] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2fca2) [0x7f437eddbca2]
[DL-Box:08007] [ 5] /usr/lib/x86_64-linux-gnu/libxcb.so.1(+0xb60c) [0x7f42ec1ae60c]
[DL-Box:08007] [ 6] /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1(+0x199d2) [0x7f42ebd509d2]
[DL-Box:08007] [ 7] /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f43685b0c7c]
[DL-Box:08007] [ 8] /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc) [0x7f43685b05ac]
[DL-Box:08007] [ 9] /usr/local/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x293) [0x7f43687e88b3]
[DL-Box:08007] [10] /usr/local/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x8e0f) [0x7f43687dfe0f]
[DL-Box:08007] [11] python3(_PyObject_FastCallDict+0xa2) [0x450b62]
[DL-Box:08007] [12] python3() [0x540445]
[DL-Box:08007] [13] python3(_PyEval_EvalFrameDefault+0x3e6c) [0x54552c]
[DL-Box:08007] [14] python3() [0x53f681]
[DL-Box:08007] [15] python3() [0x540817]
[DL-Box:08007] [16] python3(_PyEval_EvalFrameDefault+0x3e6c) [0x54552c]
[DL-Box:08007] [17] python3() [0x5402f1]
[DL-Box:08007] [18] python3() [0x5405ef]
[DL-Box:08007] [19] python3(_PyEval_EvalFrameDefault+0x3e6c) [0x54552c]
[DL-Box:08007] [20] python3() [0x5402f1]
[DL-Box:08007] [21] python3(_PyFunction_FastCallDict+0x156) [0x5494f6]
[DL-Box:08007] [22] python3(_PyObject_FastCallDict+0x1ef) [0x450caf]
[DL-Box:08007] [23] python3(_PyObject_Call_Prepend+0xcb) [0x450dab]
[DL-Box:08007] [24] python3(PyObject_Call+0x60) [0x450980]
[DL-Box:08007] [25] python3() [0x4c750b]
[DL-Box:08007] [26] python3() [0x4bec6a]
[DL-Box:08007] [27] python3(_PyObject_FastCallDict+0xa2) [0x450b62]
[DL-Box:08007] [28] python3() [0x540445]
[DL-Box:08007] [29] python3(_PyEval_EvalFrameDefault+0x3e6c) [0x54552c]
[DL-Box:08007] End of error message

pzhokhov commented 5 years ago

Hi @Nicolas99-9! Could you provide more info, please? Maybe a snippet of the rendering code, and system info? On the surface, that looks like an X11 problem, kinda like if you were to try rendering on a headless machine (i.e. without a graphics driver running, maybe a cloud instance with ssh access only). If that's the case, you'll need to set up a mock display (here's an example of how we do that in gym to test the environments that rely on OpenGL for rendering): https://github.com/openai/gym/blob/a9d7fc7dd1b08e0145b9465c947daf76a4f2c411/bin/docker_entrypoint#L13
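One common way to get such a display on a headless Linux box is Xvfb, a virtual X server. The following is a minimal sketch of that idea in Python, assuming Xvfb is installed and on the PATH. The helper names and the `:99` display number are illustrative and are not part of gym or baselines. The sketch does not wait for the server to finish starting, so a real script may need a short delay or retry before creating the first viewer.

```python
import os
import shutil
import subprocess


def needs_virtual_display(environ=None):
    """Return True when no X display is configured (typical on headless boxes)."""
    environ = os.environ if environ is None else environ
    return not environ.get("DISPLAY")


def xvfb_command(display=":99", width=1024, height=768, depth=24):
    """Build the argument list for an Xvfb server on the given display."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def start_virtual_display(display=":99"):
    """Start Xvfb if no display exists and point DISPLAY at it.

    Returns the Popen handle, or None if a display is already set
    or Xvfb is not installed.
    """
    if not needs_virtual_display():
        return None
    if shutil.which("Xvfb") is None:
        return None
    proc = subprocess.Popen(xvfb_command(display))
    os.environ["DISPLAY"] = display
    return proc
```

Call `start_virtual_display()` before anything creates a `rendering.Viewer`, and terminate the returned process when training ends.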

Nicolas99-9 commented 5 years ago

Here is the env I'm using. It's quite simple: a gridworld environment with some objects/shapes.

import time
import collections
import operator
import random
import unittest
from enum import Enum

import numpy as np

import gym
from gym import spaces
from gym.envs.classic_control import rendering

class Dir(Enum):
    NORTH = (0, 1)
    EAST = (1, 0)
    SOUTH = (0, -1)
    WEST = (-1, 0)

class Event(Enum):
    DEAD = 0
    EAT = 1
    NEW_FOOD = 2
    ADD = 3
    REMOVE = 4
    WIN = 5
    DEAD_INTERNAL = 6

class SnakeEnv(gym.Env):
    """An environment for snake

    Arguments:
        game (SnakeGame, optional): If not supplied, a SnakeGame with some
            sane defaults is used.
    """
    metadata = {
        'render.modes': ['human', 'rgb_array'],
        'video.frames_per_second' : 40
    }
    def __init__(self, game=None):
        self.game = game
        if self.game is None:
            self.game = SnakeGame()
        self.action_space = spaces.Discrete(4)
        self.observation_space = spaces.Box(
            low=0,
            high=2,
            shape=(self.game.height, self.game.width, 1),
            dtype=np.uint8
        )
        self.viewer = None

    def step(self, action):
        """Steps the snake game.

        Arguments:
            action (int, 0-3 inclusive): whether to move the snake north,
                east, south, or west (respectively).
        """
        if self.game.game_over:
            return (None, 0, True, {})
        assert self.action_space.contains(action), "invalid action"
        if action == 0:
            direction = Dir.NORTH
        elif action == 1:
            direction = Dir.EAST
        elif action == 2:
            direction = Dir.SOUTH
        elif action == 3:
            direction = Dir.WEST
        events = self.game.step(direction)
        reward = 0
        for event in events:
            if self.viewer is not None:
                self.viewer.process_event(event)
            if event == Event.DEAD:
                reward = -5
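            # Guard with isinstance: bare Event members (e.g. Event.WIN) are not subscriptable.
            elif isinstance(event, tuple) and event[0] == Event.DEAD_INTERNAL: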
                reward = -1
            elif event == Event.WIN:
                reward = 5
                self.game.game_over = True
            elif isinstance(event, tuple) and event[0] == Event.EAT:
                reward = 1
        return (self.render(), reward, self.game.game_over, {})

    def render(self, mode='rgb_array'):
        if self.viewer is None:
            self.viewer = SnakeViewer(self.game)
        #print("render in model " , mode)
        return self.viewer.render(mode)

    def reset(self,test=False):
        self.game.reset(test=test)
        if self.viewer is not None:
            self.viewer.reset()

        #return self.render()
        return self.game.board

    def close(self):
        if self.viewer is not None:
            self.viewer.close()
            self.viewer = None

class SnakeViewer(object):
    """A renderer for the snake game.
    Arguments:
        game (SnakeGame): the game to be rendered
        screen_width (int, default 200): the width of the display in pixels
        screen_height (int, default 200): the height of the display in pixels
        food_color (3-float tuple, default (0.4, 0.6, 0.8)): the color of food
            squares.
        snake_color (3-float tuple, default (0, 0, 0)): the color of snake
            squares.
        maxDistance (int, default 5): number of squares around the agent that
            it can see.
    """
    def __init__(self, game, screen_width=200, screen_height=200,
            food_color=(0.4, 0.6, 0.8), snake_color=(0, 0, 0),maxDistance=5):
        self.game = game
        self.food_color = food_color
        self.snake_color = snake_color
        self.square_width = screen_width / self.game.width
        self.square_height = screen_height / self.game.height
        self.viewer = rendering.Viewer(screen_width, screen_height)
        self.wall_color = (0.29,1.0,0.40)
        self.cross_color = (0.81,0.18,0.18)
        facteur = 0.20
        facteur2 = 0.10
        self.shift_w = self.square_width*facteur
        self.shift_h = self.square_height*facteur
        self.shift_w2 = self.square_width*facteur2
        self.shift_h2 = self.square_height*facteur2
        self.reset()

    def reset(self):
        self.viewer.geoms = []
        self.snake_geoms = {}
        _, food_squares, wall_squares,self.crosses = self.game.get_state()
        self.food_geoms = {food_square:self.get_food(food_square, self.food_color) for food_square in food_squares}
        self.wall_geoms = {wall_square:self.get_wall(wall_square, self.wall_color) for wall_square in wall_squares}
        #self.crosses_geoms = {cross:self.get_cross(cross, self.cross_color) for cross in crosses}
        for food_geom in self.food_geoms:
            self.viewer.add_geom(self.food_geoms[food_geom])
        for wall_geom in self.wall_geoms:
            self.viewer.add_geom(self.wall_geoms[wall_geom])
        '''for crosses_geom in self.crosses_geoms:
            self.viewer.add_geom(self.crosses_geoms[crosses_geom])'''
        for snake_square in self.game.get_snake_squares():
            snake_geom = self.get_square(snake_square, self.snake_color)
            self.snake_geoms[snake_square] = snake_geom
            self.viewer.add_geom(snake_geom)
        self.draw_crosses()

    def get_square(self, square, color):
        square_x, square_y = square
        geom = rendering.FilledPolygon([
            (square_x*self.square_width+self.shift_w2, square_y*self.square_height+self.shift_h2),
            (square_x*self.square_width+self.shift_w2, (square_y+1)*self.square_height-self.shift_h2),
            ((square_x+1)*self.square_width-self.shift_w2, (square_y+1)*self.square_height-self.shift_h2),
            ((square_x+1)*self.square_width-self.shift_w2, square_y*self.square_height+self.shift_h2)
        ])
        #geom.add_attr(rendering.Transform(rotation=45))
        geom.set_color(*color)

        """square_x, square_y = square
        maxDIstance = 40
        geom = rendering.make_circle(radius=maxDIstance)
        poletrans = rendering.Transform(translation=((square_x+0.5)*self.square_width, (square_y+0.5)*self.square_height))
        geom.add_attr(poletrans)
        geom.set_color(*color)"""

        return geom

    def get_wall(self, square, color):
        square_x, square_y = square
        geom = rendering.FilledPolygon([
            (square_x*self.square_width+self.shift_w2, square_y*self.square_height+self.shift_h2),
            (square_x*self.square_width+self.shift_w2, (square_y+1)*self.square_height-self.shift_h2),
            ((square_x+1)*self.square_width-self.shift_w2, (square_y+1)*self.square_height-self.shift_h2),
            ((square_x+1)*self.square_width-self.shift_w2, square_y*self.square_height+self.shift_h2)
        ])
        geom.set_color(*color)
        return geom

    def get_cross(self, square, color):

        square_x, square_y = square
        geom = rendering.Line((square_x*self.square_width,square_y*self.square_height+self.square_height),(square_x*self.square_width+self.square_width,square_y*self.square_height))
        geom2 = rendering.Line((square_x*self.square_width,square_y*self.square_height),(square_x*self.square_width+self.square_width,square_y*self.square_height+self.square_height))
        shape = rendering.Compound([geom,geom2])
        shape.set_color(*color)
        #shape.LineWidth(5)
        #test = rendering.draw_polyline((square_x*self.square_width,square_y*self.square_height+self.square_height),(square_x*self.square_width+self.square_width,square_y*self.square_height))
        return shape

    def get_food(self, square, color):
        square_x, square_y = square
        geom = rendering.make_circle(radius=self.square_width/2.0-self.shift_w2)
        poletrans = rendering.Transform(translation=((square_x+0.5)*self.square_width, (square_y+0.5)*self.square_height))
        geom.add_attr(poletrans)
        geom.set_color(*color)
        return geom

    def process_event(self, event):
        if isinstance(event, tuple):
            if event[0] == Event.EAT:
                #print("EAAAAAAAAAAAAAAAA")
                self.viewer.geoms.remove(self.food_geoms[event[1]])
                self.food_geom = None
            elif event[0] == Event.NEW_FOOD:
                self.food_geom = self.get_square(event[1], self.food_color)
                self.viewer.add_geom(self.food_geom)
            elif event[0] == Event.ADD:
                square = self.get_square(event[1], self.snake_color)
                self.snake_geoms[event[1]] = square
                self.viewer.add_geom(square)
            elif event[0] == Event.DEAD_INTERNAL:
                self.viewer.geoms.remove(self.wall_geoms[event[1]])
                self.wall_geom = None
            elif event[0] == Event.REMOVE:
                if event[1] in self.snake_geoms:
                    square = self.snake_geoms[event[1]]
                    del self.snake_geoms[event[1]]
                    self.viewer.geoms.remove(square)
        self.draw_crosses()

    def draw_crosses(self):
        facteur= 0.20
        for square_x, square_y in self.crosses:
            self.viewer.draw_polyline([(square_x*self.square_width+self.shift_w,square_y*self.square_height+self.square_height-self.shift_h),(square_x*self.square_width+self.square_width-self.shift_w,square_y*self.square_height+self.shift_h)],linewidth=3,color=self.wall_color)
            self.viewer.draw_polyline([(square_x*self.square_width+self.shift_w,square_y*self.square_height+self.shift_h),(square_x*self.square_width+self.square_width-self.shift_w,square_y*self.square_height+self.square_height-self.shift_h)],linewidth=3,color=self.wall_color)
    def close(self):
        self.viewer.close()

    def render(self, mode='human'):
        return self.viewer.render(return_rgb_array = mode=='rgb_array')

class SnakeGame(object):
    """Class to capture state of snake game, as well as perform state
    transitions.

    Arguments:
        width (int, default 12): number of squares wide the board should be
        height (int, default 12): number of squares tall the board should be
    """
    def __init__(self, width=12, height=12, number_food=10, number_walls=10,maxDistance=5):
        assert width > 1 and height > 1
        self.width = width
        self.height = height
        self.number_food = number_food
        self.number_walls = number_walls
        self.reset()

    def reset(self, head=None, test=False):
        self.food_squares= []
        self.wall_squares= []
        self.crosses = []
        self.game_over = False
        self.current_dir = Dir.NORTH
        self.board = np.zeros((self.width, self.height), dtype=np.uint8)
        self.nb_food= self.number_food
        if not test:
            if head is None:
                head = (
                    random.randint(0, self.width-1),
                    random.randint(0, self.height-1)
                )
            self.board[head] = 1
            self.snake_squares = collections.deque([head])
            for _ in range(self.number_walls):
                self.wall_square = tuple(random.choice(np.argwhere(self.board == 0)))
                self.board[self.wall_square] = 99
                self.wall_squares.append(self.wall_square)
            for _ in range(self.number_food):
                self.food_square = tuple(random.choice(np.argwhere(self.board == 0)))
                self.board[self.food_square] = 2
                self.food_squares.append(self.food_square)
        else:
            head = (7,7)
            self.board[head] = 1
            self.snake_squares = collections.deque([head])
            k = np.array([tuple(c) for c in np.argwhere(self.board == 0)])
            np.random.seed(99)
            choices1 = (np.random.choice(len(k),size=10,))
            for indice in choices1:
                tmp = tuple(k[indice])
                self.board[tmp] = 99
                self.wall_squares.append(tmp)
            k = np.array([tuple(c) for c in np.argwhere(self.board == 0)])
            np.random.seed(999)
            choices2 = (np.random.choice(len(k),size=10,))
            for indice in choices2:
                tmp = tuple(k[indice])
                self.board[tmp] = 2
                self.food_squares.append(tmp)

    def get_state(self):
        head = self.snake_squares.pop()
        self.snake_squares.append(head)
        return (head, self.food_squares,self.wall_squares,self.crosses)

    def get_snake_squares(self):
        return map(tuple, np.argwhere(self.board == 1))

    def step(self, direction):
        if self.game_over:
            return []
        events = []
        # Check if going directly backward. If so, just step forward.
        old_head = self.snake_squares.pop()
        second_oldest_head = None
        if len(self.snake_squares) > 0:
            second_oldest_head = self.snake_squares.pop()
            self.snake_squares.append(second_oldest_head)
        self.snake_squares.append(old_head)
        new_head = tuple(map(operator.add, old_head, direction.value))
        if second_oldest_head == new_head:
            direction = self.current_dir
            new_head = tuple(map(operator.add, old_head, direction.value))

        if not (0 <= new_head[0] < self.width) or not (0 <= new_head[1] < self.height):
            self.game_over = True
            return [Event.DEAD]
        elif self.board[new_head] == 99:
            #self.game_over = True
            #return [Event.DEAD_INTERNAL]
            events.append((Event.DEAD_INTERNAL, new_head))
            self.wall_squares.remove(new_head)
        elif new_head in self.food_squares:
            events.append((Event.EAT, new_head))
            self.food_squares.remove(new_head)
            self.nb_food-=1
        else:
            tail = self.snake_squares.popleft()
            self.board[tail] = 0
            events.append((Event.REMOVE, tail))
        # Check if the new head is a valid spot AFTER making changes from
        # eating food.
        if self.board[new_head] == 1 or self.board[new_head] ==55:
            self.game_over = True
            #print("NEW HEAD",self.board[new_head])
            return [Event.DEAD]
        self.board[new_head] = 1
        self.snake_squares.append(new_head)
        if self.nb_food ==0:
            return [Event.WIN]
        events.append((Event.ADD, new_head))
        if len(self.snake_squares) > 1:
            old_head= self.snake_squares.popleft()
            events.append((Event.REMOVE, old_head))
            self.board[old_head] = 0
        #print(events)
        return events

I ran it on a local machine. I tried to run ppo2 with Atari and it works, so I guess there may be a problem with rendering.py from OpenAI?

It seems that after this call in cmd_util.py:

if num_env > 1:
    return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])

the created environments have some problems. When I call reset() or render(), I get the error messages.

pzhokhov commented 5 years ago

I tried running your code (basically, pasted that code in a file, and then added

if __name__ == '__main__':
    snake = SnakeEnv()
    while True:
        ac = snake.action_space.sample()
        o, r, d, _ = snake.step(ac)
        snake.render()
        if d:
            snake.reset()

at the end), and it works fine (and shows a little env with a dot wandering around). Next, I tried running it in a VecEnv (with the code above replaced by):

if __name__ == '__main__':
    from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
    snake_fn = lambda: SnakeEnv()
    venv = DummyVecEnv([snake_fn])

    while True:
        ac = venv.action_space.sample()
        venv.step(ac)
        venv.render()

and that failed with:

Traceback (most recent call last):
  File "nic_snake.py", line 374, in <module>
    venv.step(ac)
  File "/Users/peterz/dev/baselines/baselines/common/vec_env/__init__.py", line 100, in step
    return self.step_wait()
  File "/Users/peterz/dev/baselines/baselines/common/vec_env/dummy_vec_env.py", line 54, in step_wait
    self._save_obs(e, obs)
  File "/Users/peterz/dev/baselines/baselines/common/vec_env/dummy_vec_env.py", line 67, in _save_obs
    self.buf_obs[k][e] = obs
ValueError: could not broadcast input array from shape (400,400,3) into shape (12,12,1)
Exception ignored in: <bound method Viewer.__del__ of <gym.envs.classic_control.rendering.Viewer object at 0x110466e10>>
Traceback (most recent call last):
  File "/Users/peterz/venv/games/lib/python3.6/site-packages/gym/envs/classic_control/rendering.py", line 143, in __del__
  File "/Users/peterz/venv/games/lib/python3.6/site-packages/gym/envs/classic_control/rendering.py", line 62, in close
  File "/Users/peterz/venv/games/lib/python3.6/site-packages/pyglet/window/cocoa/__init__.py", line 281, in close
  File "/Users/peterz/venv/games/lib/python3.6/site-packages/pyglet/window/__init__.py", line 770, in close
ImportError: sys.meta_path is None, Python is likely shutting down

This happens because your env claims to have an observation space of 12x12x1, but in reality step() returns the rendered image, which is 400x400x3. If DummyVecEnv does not work, then SubprocVecEnv won't work either, but the error message there may be more arcane. Hope this helps!
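A minimal sketch of one way to bring the two into agreement, assuming the intent is to observe the board rather than the rendered pixels. The helper names `board_to_observation` and `check_observation` are hypothetical and are not part of gym or baselines. Note that the posted board also stores 99 for walls, which is above the declared `high=2`, so the bounds would need adjusting as well.

```python
import numpy as np


def board_to_observation(board):
    """Turn a 2-D board into an (H, W, 1) uint8 array."""
    obs = np.asarray(board, dtype=np.uint8)
    return obs[..., np.newaxis]  # append the trailing channel axis


def check_observation(obs, declared_shape):
    """Raise ValueError if obs does not have the declared shape."""
    actual = np.shape(obs)
    if actual != tuple(declared_shape):
        raise ValueError(
            f"observation has shape {actual}, "
            f"but the space declares {tuple(declared_shape)}"
        )
    return True


# Example: a 12x12 board matches a declared (12, 12, 1) space.
board = np.zeros((12, 12), dtype=np.uint8)
obs = board_to_observation(board)
check_observation(obs, (12, 12, 1))
```

In the env above, this would mean returning something like `board_to_observation(self.game.board)` from both `step()` and `reset()` instead of `self.render()`, and running the check once against `self.observation_space.shape` before wrapping the env in a VecEnv.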