openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

"clone_state()" for Atari games actually includes pseudorandomness. #1017

Closed YuhangSong closed 5 years ago

YuhangSong commented 6 years ago

I have been testing the stochasticity of the Atari games and found that both "clone_state()" and "clone_full_state()" actually include the pseudorandomness. However, the code in Gym says:

    def clone_state(self):
        """Clone emulator state w/o system state. Restoring this state will
        *not* give an identical environment. For complete cloning and restoring
        of the full state, see `{clone,restore}_full_state()`."""
        state_ref = self.ale.cloneState()
        state = self.ale.encodeState(state_ref)
        self.ale.deleteState(state_ref)
        return state

    def restore_state(self, state):
        """Restore emulator state w/o system state."""
        state_ref = self.ale.decodeState(state)
        self.ale.restoreState(state_ref)
        self.ale.deleteState(state_ref)

    def clone_full_state(self):
        """Clone emulator state w/ system state including pseudorandomness.
        Restoring this state will give an identical environment."""
        state_ref = self.ale.cloneSystemState()
        state = self.ale.encodeState(state_ref)
        self.ale.deleteState(state_ref)
        return state

    def restore_full_state(self, state):
        """Restore emulator state w/ system state including pseudorandomness."""
        state_ref = self.ale.decodeState(state)
        self.ale.restoreSystemState(state_ref)
        self.ale.deleteState(state_ref)
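
For context, the usage pattern under discussion looks roughly like this (a minimal sketch; the environment id and the action are placeholders):

    import gym

    env = gym.make('PongNoFrameskip-v4')
    env.reset()

    # Save the emulator state (per the docstring, *without* the system/RNG state).
    saved = env.unwrapped.clone_state()

    # Step with a fixed action...
    obs1, _, _, _ = env.step(0)

    # ...then restore and repeat it. Per the docstring this should *not*
    # reproduce an identical environment, but in practice the rollout is
    # deterministic.
    env.unwrapped.restore_state(saved)
    obs2, _, _, _ = env.step(0)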

Please note that we cannot call seed() after calling restore_state(), because seed() calls loadROM() after resetting the seed, which means the state set by restore_state() will be overwritten by loadROM(). The seed() method is pasted below:

    def _seed(self, seed=None):
        self.np_random, seed1 = seeding.np_random(seed)
        # Derive a random seed. This gets passed as a uint, but gets
        # checked as an int elsewhere, so we need to keep it below
        # 2**31.
        seed2 = seeding.hash_seed(seed1 + 1) % 2**31
        # Empirically, we need to seed before loading the ROM.
        self.ale.setInt(b'random_seed', seed2)
        self.ale.loadROM(self.game_path)
        return [seed1, seed2]

I have also tried calling self.ale.setInt(b'random_seed', seed2) directly after calling restore_state() to change the seed of the ale object, but the pseudorandomness remains fixed (I still get deterministic results).
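
For illustration, the two reseeding attempts described above look roughly like this (a sketch; the environment id and seed value are placeholders, and neither call restores stochasticity):

    import gym

    env = gym.make('PongNoFrameskip-v4')
    env.reset()
    saved = env.unwrapped.clone_state()

    # Attempt 1: reseed through seed(). As shown above, seed() ends with
    # loadROM(), so the state restored just before is thrown away.
    env.unwrapped.restore_state(saved)
    env.unwrapped.seed(123)

    # Attempt 2: set the ALE seed directly. The 'random_seed' option is only
    # consumed when the ROM is loaded, so changing it here has no effect.
    env.unwrapped.restore_state(saved)
    env.unwrapped.ale.setInt(b'random_seed', 123)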

Thus, it seems that the seed of an Atari environment is fixed once the ROM is loaded and cannot be changed between two reset()s. Does that mean the docstring for clone_state() is wrong?

I am pasting my code for testing the stochasticity of the Atari games below; feel free to run it and try resetting the seed after calling restore_state().

import gym
import numpy as np
from gym import logger
import logging
logger.setLevel(logging.WARNING)

bunch = 20     # number of independent rollouts per game
sequence = 50  # length of the shared action sequence

def main():
    result = {
        'name':[],
        'grouped_num':[],
        'distribution':[],
    }
    game_list = ['air_raid', 'alien', 'amidar', 'assault', 'asterix', 'asteroids', 'atlantis']
    # game_list = ['bank_heist', 'battle_zone', 'beam_rider', 'berzerk', 'bowling', 'boxing', 'breakout', 'carnival']
    # game_list = ['centipede', 'chopper_command', 'crazy_climber', 'demon_attack', 'double_dunk']
    # game_list = ['elevator_action', 'enduro', 'fishing_derby', 'freeway', 'frostbite', 'gopher', 'gravitar']
    # game_list = ['hero', 'ice_hockey', 'jamesbond', 'journey_escape', 'kangaroo', 'krull', 'kung_fu_master']
    # game_list = ['montezuma_revenge', 'ms_pacman', 'name_this_game', 'phoenix', 'pitfall', 'pong', 'pooyan']
    # game_list = ['private_eye', 'qbert', 'riverraid', 'road_runner', 'robotank', 'seaquest', 'skiing']
    # game_list = ['solaris', 'space_invaders', 'star_gunner', 'tennis', 'time_pilot', 'tutankham', 'up_n_down']
    # game_list = ['venture', 'video_pinball', 'wizard_of_wor', 'yars_revenge', 'zaxxon']

    for game in game_list:

        '''get the name of the game'''
        name = ''.join([g.capitalize() for g in game.split('_')])
        env_name = '{}NoFrameskip-v4'.format(name)

        env_father = gym.make(env_name)
        env_father.reset()
        state_after_reset = env_father.unwrapped.clone_state()

        '''generate a sequence of actions'''
        action_sequence = np.random.randint(
            env_father.action_space.n,
            size = sequence,
        )

        # Group final observations: bunch_obs keeps one representative
        # observation per distinct outcome, distribution counts how many
        # of the `bunch` rollouts fell into each group.
        bunch_obs = []
        distribution = []
        samples = []
        for bunch_i in range(bunch):

            env_temp = gym.make(env_name)
            env_temp.reset()
            env_temp.unwrapped.restore_state(
                state_after_reset
            )

            for sequence_i in range(sequence):
                obs, reward, done, info = env_temp.step(action_sequence[sequence_i])

            samples += [obs]
            found_at_bunch = -1
            if_has_identical_one = False
            max_value = 0
            for bunch_obs_i in range(len(bunch_obs)):
                obs_in_bunch = bunch_obs[bunch_obs_i]
                max_value = np.max(
                    np.abs(
                        obs-obs_in_bunch
                    )
                )
                if max_value < 1:
                    found_at_bunch = bunch_obs_i
                    if_has_identical_one = True
                    distribution[found_at_bunch] += 1
                    break

            if if_has_identical_one is False:
                bunch_obs += [obs]
                distribution += [1]

        grouped_num = len(bunch_obs)
        result['name'] += [name]
        result['grouped_num'] += [grouped_num]
        result['distribution'] += [distribution]
        print('game:{} grouped_num:{} distribution:{}'.format(
            name,
            grouped_num,
            distribution,
        ))

    print()
    for print_tmp in result['name']:
        print(print_tmp)
    print()
    for print_tmp in result['grouped_num']:
        print(print_tmp)

if __name__ == "__main__":
    main()

Many thanks!

AdrienLE commented 5 years ago

Contrary to what the ability to set a "random seed" seems to imply, Atari does not provide an API to get randomness from the hardware. Rather, the vast majority of games obtain their random seed from the user's early actions.

Another way for games to seed their randomness is to take advantage of the fact that the initial RAM and register state upon starting up the Atari 2600 is (somewhat) random. This seems to be used by games like Solaris. As far as I can tell, setting the initial random state of the RAM and registers is the primary effect of the random_seed attribute in the ALE. This means that the randomness is already entirely contained in the RAM and registers by the time you are able to call clone_state, and clone_state does clone the RAM and registers.
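
A rough way to see this (a sketch, assuming atari_py's ALEInterface exposes getRAM() on env.unwrapped.ale; the environment id is a placeholder):

    import gym
    import numpy as np

    def initial_ram(seed):
        env = gym.make('SolarisNoFrameskip-v4')
        env.seed(seed)   # seeds the ALE and reloads the ROM
        env.reset()
        return np.array(env.unwrapped.ale.getRAM())

    # Different ALE seeds should give (somewhat) different initial RAM
    # contents, and clone_state() captures that RAM.
    print(np.array_equal(initial_ram(1), initial_ram(2)))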

Note that Stella (the underlying emulator) does have a random number generator, and its state is included when calling clone_full_state but not when calling clone_state... The only problem is that by the time you can call clone_state it is nearly impossible that the RNG will be called again in a way that meaningfully impacts the game.

If you search for randGenerator in the Stella codebase (https://github.com/stella-emu/stella/search?p=1&q=randgenerator&unscoped_q=randgenerator), you'll find that almost all of its uses are to initialize the RAM and registers, and the uses outside of that seem highly unlikely to actually be called.
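
So if the goal is stochastic rollouts from a shared starting point, a common workaround is to inject the randomness yourself before cloning, e.g. with random no-op starts (a sketch; action 0 is NOOP in most Atari games, and the environment id is a placeholder):

    import gym
    import numpy as np

    env = gym.make('BreakoutNoFrameskip-v4')
    env.reset()

    # Advance the emulator by a random number of no-ops so the cloned state
    # itself differs between runs (the "no-op starts" trick from the DQN paper).
    for _ in range(np.random.randint(1, 31)):
        env.step(0)

    start_state = env.unwrapped.clone_state()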

YuhangSong commented 5 years ago

Thanks a lot. It's been very helpful!