k4ntz / OC_Atari

Object Centric Atari games
MIT License

Object-Centric Atari Environments

Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia

Inspired by the work of Anand et al., we present OCAtari, an improved, extended, and object-centric version of their AtariARI project. The Arcade Learning Environment allows us to read a game's RAM state at any time. This repository provides a wrapper for the well-known Gymnasium project that uses the RAM state and reverse engineering to provide an object-centric representation of the screen. It also provides code for benchmarking, testing, and generating object-centric representations of states.

:heavy_exclamation_mark: HERE IS A LINK TO THE DOCUMENTATION :bookmark_tabs:


Structure of this repository

This repository is structured into multiple folders:

Install

You can install OCAtari in multiple ways. The recommended one is to use the provided Dockerfile, which installs all requirements, such as the Atari ROMs and Gymnasium. You can also simply run:

pip install ocatari
pip install "gymnasium[atari, accept-rom-license]"

If you want to work from source, clone this repo and run:

python setup.py install

or, if you want to modify the code:

python setup.py develop

Usage

To use the OCAtari environments:

from ocatari.core import OCAtari
import random

env = OCAtari("Pong", mode="ram", hud=True, render_mode="rgb_array")
observation, info = env.reset()
action = random.randint(0, env.nb_actions-1)
obs, reward, terminated, truncated, info = env.step(action)

Cite OCAtari:

If you are using OCAtari for your scientific publications, please cite us:

@inproceedings{Delfosse2023OCAtariOA,
title={OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments},
author={Quentin Delfosse and Jannis Blüml and Bjarne Gregori and Sebastian Sztwiertnia and Kristian Kersting},
year={2023}
}

List of covered games

Alien Amidar Assault Asterix Asteroids Atlantis BattleZone BankHeist BeamR. Berzerk Bowling Boxing Breakout Carnival Centipede ChopperC. CrazyC. DemonA. DonkeyK. FishingD. Freeway Frostbite Gopher Hero IceHockey Jamesbond Kangaroo Krull Montezum. MsPacman Pacman Pitfall Pong PrivateE. Q*Bert RiverRaid RoadR. Seaquest Skiing SpaceInv. Tennis TimePilot UpNDown Venture VideoP. YarsR.

A list of all Gymnasium games can be found in the Gymnasium documentation.

The two modes of OCAtari

OCAtari supports two different modes to extract objects from the current state:

Vision Extraction Mode (VEM): Returns a list of objects currently on the screen with their X, Y, width, height, and R, G, B values, based on handwritten rules applied to the visual representation.

RAM Extraction Mode (REM): Uses the object values stored in the RAM to detect the objects currently on the screen.

Use these trained agents and the demo script:

A better example of how to run OCAtari is given in our demo files, which show you how to run each game with a provided agent.

Use the demo files in the scripts/demo folder to test it yourself. You can set the mode to 'raw', 'vision' or 'revised' in line 10 of the demo script (if 'revised' is not available, please use 'ram' mode instead). You can also run a demo file with an already trained agent, or with one you developed yourself. Use the -p flag to pass an agent and let it play the game. Here is an example:

python demo_pong.py -p models/Pong/model_50000000.gz

More information can be found in this ReadMe

Extract the objects from a state

With env.objects, one can access the list of objects found in the current state. Note that these lists can differ depending on the mode you used when initializing the environment.

Producing your own dataset

OCAtari can be used to generate datasets consisting of a representation of the current state in the form of an RGB array and a list of all objects within the state. More information can be found in the dataset_generation folder.

Models and additional Information

For trained agents, as well as to reproduce our results, we recommend using the agents of this repo.

Reproducing our results

In most of our scripts, we added the following line to make the environment deterministic and our results easier to reproduce: make_deterministic(env, 42). This line can and should be removed if determinism is not the desired behavior. As seeds, we used 0 for evaluating the metrics and 42 for generating the dataset.